# Shimmy Cloud Deployment
One-click deployment configurations for popular cloud platforms.
## Quick Deploy Buttons
### Railway

[Deploy on Railway button]

### Render

[Deploy to Render button]

### Fly.io
```bash
# Install flyctl and deploy
curl -L https://fly.io/install.sh | sh
fly deploy
```
### Docker (Any Platform)
```bash
# Local development
docker-compose up

# Production with Nginx
docker-compose --profile production up
```
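The commands above assume the repository's `docker-compose.yml`. As a rough sketch of what such a file contains (the service name, port, and restart policy here are illustrative assumptions, not the repository's actual contents):

```yaml
services:
  shimmy:
    build: .                 # build from the repository's Dockerfile
    ports:
      - "11434:11434"        # expose Shimmy's default port
    environment:
      - RUST_LOG=info
    restart: unless-stopped
```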
## Platform-Specific Instructions
### Railway.app
- Click the "Deploy on Railway" button above
- Connect your GitHub account
- Fork this repository
- Railway will automatically build and deploy
- Your Shimmy instance will be available at `https://your-app.railway.app`
### Render.com
- Click the "Deploy to Render" button above
- Connect your GitHub repository
- Render will use the `render.yaml` configuration
- Your service will be available with automatic HTTPS
### Fly.io
- Install the Fly CLI: `curl -L https://fly.io/install.sh | sh`
- Clone this repository: `git clone https://github.com/Michael-A-Kuykendall/shimmy.git`
- Navigate to the project: `cd shimmy`
- Create and deploy: `fly deploy`
- Access your app: `fly open`
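`fly deploy` reads a `fly.toml` in the project root. A minimal sketch of one (the app name and port settings below are assumptions, not the repository's actual file):

```toml
app = "shimmy"

[http_service]
  internal_port = 11434
  force_https = true
```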
### Google Cloud Run
```bash
# Build and deploy to Cloud Run
gcloud builds submit --tag gcr.io/PROJECT-ID/shimmy
gcloud run deploy --image gcr.io/PROJECT-ID/shimmy --platform managed
```
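To pin the service name, region, port, and environment variables up front instead of answering interactive prompts, the deploy command can be extended. The service name and region here are placeholders:

```bash
gcloud run deploy shimmy \
  --image gcr.io/PROJECT-ID/shimmy \
  --platform managed \
  --region us-central1 \
  --port 11434 \
  --set-env-vars RUST_LOG=info \
  --allow-unauthenticated
```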
### AWS App Runner
- Create `apprunner.yaml` in your repository root:

```yaml
version: 1.0
runtime: docker
build:
  commands:
    build:
      - echo "Building Shimmy with Docker"
run:
  runtime-version: latest
  command: shimmy serve --bind 0.0.0.0:8080
  network:
    port: 8080
```
### DigitalOcean App Platform
- Create app via DigitalOcean control panel
- Connect your GitHub repository
- DigitalOcean will detect the Dockerfile automatically
- Set environment variables as needed
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `11434` | Port to bind the server |
| `RUST_LOG` | `info` | Log level (`error`, `warn`, `info`, `debug`, `trace`) |
| `SHIMMY_BIND` | `0.0.0.0:11434` | Full bind address |
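A container entrypoint can apply these defaults before launching the server. This is a sketch using standard POSIX parameter expansion, not Shimmy's actual entrypoint; the launch line is shown only for illustration:

```shell
#!/bin/sh
# Fall back to the documented defaults when a variable is unset.
PORT="${PORT:-11434}"
RUST_LOG="${RUST_LOG:-info}"
SHIMMY_BIND="${SHIMMY_BIND:-0.0.0.0:${PORT}}"

echo "bind=${SHIMMY_BIND} log=${RUST_LOG}"   # e.g. bind=0.0.0.0:11434 log=info
# exec shimmy serve --bind "$SHIMMY_BIND"    # illustrative launch line
```

Because `SHIMMY_BIND` is derived from `PORT`, overriding either variable keeps the two consistent.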
## Resource Requirements
### Minimal
- CPU: 0.5 vCPU
- Memory: 512MB RAM
- Storage: 100MB (binary only)
### Recommended
- CPU: 1 vCPU
- Memory: 1GB RAM
- Storage: 1GB+ (for model caching)
### High Performance
- CPU: 2+ vCPU
- Memory: 4GB+ RAM
- Storage: 10GB+ SSD
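With Docker Compose, these tiers can be enforced as resource limits. A sketch for the recommended tier (the service name is an assumption):

```yaml
services:
  shimmy:
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
```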
## Security Considerations
- **Authentication**: Shimmy doesn't include built-in authentication. Use a reverse proxy (Nginx, Cloudflare) for auth.
- **Rate Limiting**: The included Nginx configuration has basic rate limiting. Adjust as needed.
- **HTTPS**: Most cloud platforms provide automatic HTTPS. For self-hosted deployments, configure SSL certificates.
- **Firewall**: Only expose port 11434 (or your configured port) to the public internet.
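A reverse-proxy layer covering the first two points might look like the following Nginx sketch. The zone name, rate, and hostname are illustrative, not the repository's included configuration:

```nginx
# Rate limiting: 10 requests/second per client IP, with a small burst allowance.
limit_req_zone $binary_remote_addr zone=shimmy:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name shimmy.example.com;

    # Basic auth in front of Shimmy (htpasswd file managed separately).
    auth_basic "Shimmy";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        limit_req zone=shimmy burst=20;
        proxy_pass http://127.0.0.1:11434;
    }
}
```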
## Monitoring
### Health Checks
All configurations include health checks at the `/health` endpoint.
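For a custom container image, the same endpoint can back a Docker health check. A sketch, assuming the default port and that `curl` is available in the image:

```dockerfile
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -fsS http://localhost:11434/health || exit 1
```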
### Logs
Set `RUST_LOG=debug` for detailed logging. Most platforms provide log aggregation.
### Metrics
For production deployments, consider adding:
- Prometheus metrics
- Jaeger tracing
- Custom monitoring dashboards
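If you add a Prometheus exporter in front of or inside Shimmy, a scrape job could be configured like this. The target and metrics path are hypothetical; the text above suggests metrics as an addition, so Shimmy does not necessarily expose `/metrics` out of the box:

```yaml
scrape_configs:
  - job_name: "shimmy"
    metrics_path: /metrics        # hypothetical endpoint
    static_configs:
      - targets: ["shimmy:11434"]
```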
## Scaling
### Horizontal Scaling
Shimmy is stateless and can be horizontally scaled. Use a load balancer to distribute requests.
### Vertical Scaling
For better performance with large models:
- Increase memory for model caching
- Add more CPU cores for parallel processing
- Use SSD storage for faster model loading
## Troubleshooting
### Common Issues
- **Out of Memory**: Increase memory allocation or use memory-mapped loading
- **Slow Startup**: Enable model caching and use persistent storage
- **Connection Timeout**: Increase proxy timeout settings for large model inference
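For the timeout issue, raising the proxy read/send timeouts in Nginx usually helps with long-running inference; the values here are illustrative:

```nginx
location / {
    proxy_pass http://127.0.0.1:11434;
    # Allow long-running generations before the proxy gives up.
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```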
### Debug Mode
```bash
# Enable debug logging
RUST_LOG=debug shimmy serve
```
### Container Debugging
```bash
# Access running container
docker exec -it shimmy-container /bin/bash

# Check logs
docker logs shimmy-container
```