- Add comprehensive MLX engine implementation with Python MLX bindings - Implement MLX model discovery, loading, and native inference pipeline - Add MLX feature flag compilation and Apple Silicon hardware detection - Create dedicated GitHub Actions workflow for MLX testing on macos-14 ARM64 - Add MLX documentation to README and wiki with capability descriptions - Implement pre-commit hooks enforcing cargo fmt, clippy, and test validation - Fix GPU backend tests to properly force specific backends instead of auto-detection - Resolve property test race conditions with serial test execution - Update release workflow validation and platform-specific test expectations - Add MLX implementation plan and cross-compilation toolchain support 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
4.6 KiB
Shimmy Constitutional Principles
These principles are immutable and govern all development decisions for the Shimmy inference engine.
Article I: The Lightweight Binary Mandate
IMMUTABLE: Shimmy's core binary shall remain lightweight and focused, never exceeding 5MB. This preserves our fundamental competitive advantage over bloated alternatives (Ollama: 680MB+). At 4.8MB, Shimmy remains 142x smaller than Ollama. Target: stay under 5MB for full-featured deployment. All features must be designed for minimal binary impact while delivering maximum value.
Article II: Library-First Architecture
REQUIREMENT: Every feature must begin as a standalone, reusable library before integration into Shimmy core. This ensures modularity, testability, and prevents architectural debt.
Article III: CLI Interface Mandate
REQUIREMENT: All functionality must be accessible via command-line interface. Shimmy serves as a universal shim - CLI access ensures programmatic integration and automation capabilities.
Article IV: Test-First Imperative
REQUIREMENT: No implementation shall proceed without comprehensive test specifications. Use cargo test --all-features as the validation standard. Integration tests must pass before any feature is considered complete.
Article V: Startup Speed Supremacy
IMMUTABLE: Shimmy must maintain sub-2-second startup time. This 2-5x speed advantage over alternatives is core to our value proposition. Any feature that degrades startup performance is rejected.
Article VI: Zero Python Dependencies
IMMUTABLE: Shimmy's core shall remain free of Python runtime dependencies. Native Rust implementations only. Python integrations may exist as optional, external components but never as required dependencies.
Article VII: API Compatibility Preservation
REQUIREMENT: OpenAI API compatibility must be maintained. Shimmy serves as a drop-in replacement - breaking this compatibility breaks our fundamental promise to users.
Article VIII: Integration-First Testing
REQUIREMENT: Testing must prioritize real-world scenarios over mocks. Use actual model files, real HTTP requests, and genuine client integrations wherever possible.
Article IX: Specification-Driven Development
REQUIREMENT: All new features must follow GitHub Spec-Kit methodology:
/specify- Create detailed specification/plan- Generate implementation plan/tasks- Break into actionable items- Implementation with continuous validation
Constitutional Enforcement
Version Control Integration
- All pull requests must reference constitutional compliance
- Breaking changes require constitutional amendment process
- Major version bumps require constitutional review
Feature Acceptance Criteria
- ✅ Preserves lightweight binary constraint (≤5MB)
- ✅ Maintains sub-2-second startup
- ✅ Zero new Python dependencies
- ✅ CLI interface provided
- ✅ Comprehensive test coverage
- ✅ OpenAI API compatibility maintained
- ✅ Specification-driven development followed
Emergency Constitutional Overrides
In exceptional circumstances, constitutional principles may be temporarily suspended only by:
- Explicit human approval from project maintainer
- Documented justification for the override
- Clear remediation timeline
- Constitutional compliance restoration plan
Architectural Principles
Performance Hierarchy
- Startup Speed (non-negotiable)
- Binary Efficiency (lightweight constraint ≤5MB)
- Inference Throughput (optimize within constraints)
- Feature Richness (only if compatible with above)
Technology Stack Constraints
- Core Language: Rust (immutable)
- HTTP Framework: Axum (current standard)
- Model Formats: SafeTensors (native), GGUF (via llama.cpp), HuggingFace (optional)
- GPU Support: Multiple vendors (NVIDIA, AMD, Intel)
- Platform Support: Cross-platform (Windows, Linux, macOS)
Development Workflow
- Methodology: GitHub Spec-Kit driven
- Testing:
cargo test --all-features - Documentation: Specification-first
- Integration: Library-first modular design
Constitutional violations will result in immediate development halt and architectural review.
Last Updated: September 18, 2025 Version: 1.1 - Developer Expansion Update
Amendment History
- v1.1 (Sept 18, 2025): Temporarily expanded binary size constraint to ≤20MB to support developer expansion features
- v1.2 (Oct 3, 2025): Returned to sub-5MB constraint after removing bloat; actual size 4.8MB maintains 142x competitive advantage over Ollama
- v1.0 (Sept 17, 2025): Initial constitutional framework