mirror of https://fastgit.cc/github.com/Michael-A-Kuykendall/shimmy synced 2026-04-21 13:23:05 +08:00

Michael A. Kuykendall 65298f767e feat(mlx): implement native Apple Silicon MLX support with pre-commit quality gates

- Add comprehensive MLX engine implementation with Python MLX bindings
- Implement MLX model discovery, loading, and native inference pipeline
- Add MLX feature flag compilation and Apple Silicon hardware detection
- Create dedicated GitHub Actions workflow for MLX testing on macos-14 ARM64
- Add MLX documentation to README and wiki with capability descriptions
- Implement pre-commit hooks enforcing cargo fmt, clippy, and test validation
- Fix GPU backend tests to properly force specific backends instead of auto-detection
- Resolve property test race conditions with serial test execution
- Update release workflow validation and platform-specific test expectations
- Add MLX implementation plan and cross-compilation toolchain support

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-09 20:11:32 -05:00


Shimmy Constitutional Principles

These principles are immutable and govern all development decisions for the Shimmy inference engine.

Article I: The Lightweight Binary Mandate

IMMUTABLE: Shimmy's core binary shall remain lightweight and focused, never exceeding 5MB for a full-featured deployment. This preserves our fundamental competitive advantage over bloated alternatives (Ollama: 680MB+): at 4.8MB, Shimmy is roughly 142x smaller than Ollama. All features must be designed for minimal binary impact while delivering maximum value.
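A size gate of this kind could be sketched as follows. This is an illustration and not part of Shimmy's codebase: the names `MAX_BINARY_BYTES`, `within_size_budget`, `check_binary`, and `size_ratio` are hypothetical, and reading "5MB" as 5 * 1024 * 1024 bytes is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Size ceiling from Article I.
/// Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
pub const MAX_BINARY_BYTES: u64 = 5 * 1024 * 1024;

/// Returns true when a binary of `size_bytes` fits the Article I budget.
pub fn within_size_budget(size_bytes: u64) -> bool {
    size_bytes <= MAX_BINARY_BYTES
}

/// Reads the on-disk size of the file at `path` and checks it against the budget.
/// Returns an I/O error if the file cannot be inspected.
pub fn check_binary(path: &Path) -> io::Result<bool> {
    let size = fs::metadata(path)?.len();
    Ok(within_size_budget(size))
}

/// How many times larger `other_mb` is than `ours_mb`, rounded to the nearest integer.
pub fn size_ratio(other_mb: f64, ours_mb: f64) -> u64 {
    (other_mb / ours_mb).round() as u64
}
```

With the figures quoted above, `size_ratio(680.0, 4.8)` rounds 141.67 to 142, which matches the stated 142x. A check like `check_binary` could run in CI against the release artifact, assuming the build pipeline exposes its path.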

Article II: Library-First Architecture

REQUIREMENT: Every feature must begin as a standalone, reusable library before integration into Shimmy core. This ensures modularity, testability, and prevents architectural debt.

Article III: CLI Interface Mandate

REQUIREMENT: All functionality must be accessible via command-line interface. Shimmy serves as a universal shim, and CLI access is what makes programmatic integration and automation possible.

Article IV: Test-First Imperative

REQUIREMENT: No implementation shall proceed without comprehensive test specifications. Use cargo test --all-features as the validation standard. Integration tests must pass before any feature is considered complete.

Article V: Startup Speed Supremacy

IMMUTABLE: Shimmy must maintain sub-2-second startup time. This 2-5x speed advantage over alternatives is core to our value proposition. Any feature that degrades startup performance is rejected.
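One way to turn the two-second ceiling into a testable check is sketched below. The names `MAX_STARTUP`, `time_startup`, and `meets_startup_budget` are hypothetical, and the closure stands in for whatever the real startup path is; how Shimmy actually measures startup is not stated here.

```rust
use std::time::{Duration, Instant};

/// Startup ceiling from Article V ("sub-2-second").
pub const MAX_STARTUP: Duration = Duration::from_secs(2);

/// Runs `startup` once and returns how long it took (wall-clock time).
pub fn time_startup<F: FnOnce()>(startup: F) -> Duration {
    let begin = Instant::now();
    startup();
    begin.elapsed()
}

/// True when the measured duration is strictly under the two-second ceiling.
pub fn meets_startup_budget(elapsed: Duration) -> bool {
    elapsed < MAX_STARTUP
}
```

The comparison is strict because the article says "sub-2-second", so a startup of exactly two seconds fails the check. A single measurement is noisy; a real gate would probably take several runs on a fixed machine.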

Article VI: Zero Python Dependencies

IMMUTABLE: Shimmy's core shall remain free of Python runtime dependencies. Native Rust implementations only. Python integrations may exist as optional, external components but never as required dependencies.

Article VII: API Compatibility Preservation

REQUIREMENT: OpenAI API compatibility must be maintained. Shimmy serves as a drop-in replacement, so breaking this compatibility breaks our fundamental promise to users.

Article VIII: Integration-First Testing

REQUIREMENT: Testing must prioritize real-world scenarios over mocks. Use actual model files, real HTTP requests, and genuine client integrations wherever possible.

Article IX: Specification-Driven Development

REQUIREMENT: All new features must follow GitHub Spec-Kit methodology:

/specify - Create detailed specification
/plan - Generate implementation plan
/tasks - Break into actionable items
Implementation with continuous validation

Constitutional Enforcement

Version Control Integration

All pull requests must reference constitutional compliance
Breaking changes require constitutional amendment process
Major version bumps require constitutional review

Feature Acceptance Criteria

✅ Preserves lightweight binary constraint (≤5MB)
✅ Maintains sub-2-second startup
✅ Zero new Python dependencies
✅ CLI interface provided
✅ Comprehensive test coverage
✅ OpenAI API compatibility maintained
✅ Specification-driven development followed
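The checklist above can be modelled as data, so that a review tool reports exactly which criteria a proposed feature has not yet met. This is a minimal sketch; the `Criterion` enum and the `unmet` and `accepted` functions are hypothetical names and do not come from the Shimmy codebase.

```rust
/// One entry per line of the Feature Acceptance Criteria checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Criterion {
    BinarySize,      // lightweight binary constraint (<= 5MB)
    StartupTime,     // sub-2-second startup
    NoNewPythonDeps, // zero new Python dependencies
    CliInterface,    // CLI interface provided
    TestCoverage,    // comprehensive test coverage
    OpenAiCompat,    // OpenAI API compatibility maintained
    SpecDriven,      // specification-driven development followed
}

/// Every criterion, in checklist order.
pub const ALL_CRITERIA: [Criterion; 7] = [
    Criterion::BinarySize,
    Criterion::StartupTime,
    Criterion::NoNewPythonDeps,
    Criterion::CliInterface,
    Criterion::TestCoverage,
    Criterion::OpenAiCompat,
    Criterion::SpecDriven,
];

/// Returns the criteria missing from `passed`, in checklist order.
pub fn unmet(passed: &[Criterion]) -> Vec<Criterion> {
    ALL_CRITERIA
        .iter()
        .copied()
        .filter(|c| !passed.contains(c))
        .collect()
}

/// A feature is accepted only when no criterion is unmet.
pub fn accepted(passed: &[Criterion]) -> bool {
    unmet(passed).is_empty()
}
```

Returning the list of unmet criteria, and not only a boolean, keeps a rejection actionable: the author sees which checklist line to address.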

Emergency Constitutional Overrides

In exceptional circumstances, constitutional principles may be temporarily suspended only by:

Explicit human approval from project maintainer
Documented justification for the override
Clear remediation timeline
Constitutional compliance restoration plan
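Read as a conjunction, the four conditions above could be validated like this. The sketch assumes that all four items are required together, which the wording suggests but does not state outright; `OverrideRequest` and its fields are hypothetical names.

```rust
/// A request to temporarily suspend a constitutional principle.
/// Assumption: all four listed items must be present for the override to be valid.
pub struct OverrideRequest {
    /// Explicit human approval from the project maintainer.
    pub maintainer_approved: bool,
    /// Documented justification for the override.
    pub justification: Option<String>,
    /// Clear remediation timeline.
    pub remediation_timeline: Option<String>,
    /// Plan for restoring constitutional compliance.
    pub restoration_plan: Option<String>,
}

/// True when the optional text is present and not just whitespace.
fn present(field: &Option<String>) -> bool {
    field.as_deref().map_or(false, |text| !text.trim().is_empty())
}

impl OverrideRequest {
    /// Valid only when approval is given and every document is non-empty.
    pub fn is_valid(&self) -> bool {
        self.maintainer_approved
            && present(&self.justification)
            && present(&self.remediation_timeline)
            && present(&self.restoration_plan)
    }
}
```

Under this reading, an approved request with a blank restoration plan is still invalid.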

Architectural Principles

Performance Hierarchy

Startup Speed (non-negotiable)
Binary Efficiency (lightweight constraint ≤5MB)
Inference Throughput (optimize within constraints)
Feature Richness (only if compatible with above)
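The ordering above maps naturally onto a Rust enum whose derived `Ord` follows declaration order, so an earlier variant compares as smaller and therefore as higher priority. This is only an illustration; `Priority` and `highest_priority_concern` are hypothetical names.

```rust
/// The Performance Hierarchy, declared from highest to lowest priority.
/// Derived `Ord` compares variants by declaration order, so
/// `StartupSpeed` is the smallest value and `FeatureRichness` the largest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    StartupSpeed,
    BinaryEfficiency,
    InferenceThroughput,
    FeatureRichness,
}

/// Given the areas a change affects, returns the one that ranks highest
/// in the hierarchy, or `None` when the slice is empty.
pub fn highest_priority_concern(affected: &[Priority]) -> Option<Priority> {
    affected.iter().copied().min()
}
```

For a change that touches both feature richness and binary efficiency, the function returns `Some(Priority::BinaryEfficiency)`, meaning the binary-size impact should be settled first.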

Technology Stack Constraints

Core Language: Rust (immutable)
HTTP Framework: Axum (current standard)
Model Formats: SafeTensors (native), GGUF (via llama.cpp), HuggingFace (optional)
GPU Support: Multiple vendors (NVIDIA, AMD, Intel)
Platform Support: Cross-platform (Windows, Linux, macOS)

Development Workflow

Methodology: GitHub Spec-Kit driven
Testing: cargo test --all-features
Documentation: Specification-first
Integration: Library-first modular design

Constitutional violations will result in immediate development halt and architectural review.

Last Updated: October 3, 2025
Version: 1.2

Amendment History

v1.2 (Oct 3, 2025): Returned to sub-5MB constraint after removing bloat; actual size 4.8MB maintains 142x competitive advantage over Ollama
v1.1 (Sept 18, 2025): Temporarily expanded binary size constraint to ≤20MB to support developer expansion features
v1.0 (Sept 17, 2025): Initial constitutional framework
