321 Commits

Author SHA1 Message Date
Michael A. Kuykendall
d5179f8f06 chore: allow release recreate in workflow v1.8.2 2025-12-09 11:33:14 -06:00
Michael A. Kuykendall
c931a4e7cb fix: allow release reruns when crates exist 2025-12-09 11:12:08 -06:00
Michael A. Kuykendall
6129cbca0e chore: allow GHCR push in release workflow 2025-12-09 10:26:36 -06:00
Michael A. Kuykendall
c46b9d3a4c fix: copy templates into deploy image 2025-12-09 10:05:48 -06:00
Michael A. Kuykendall
13fca83f5b chore: bump release docker toolchain 2025-12-09 09:44:20 -06:00
Michael A. Kuykendall
86043adc60 chore: prep 1.8.2 release 2025-12-09 09:13:44 -06:00
Michael A. Kuykendall
b4ab7cec2c ci: lowercase ghcr repo tags v1.8.1 2025-12-08 18:43:39 -06:00
Michael A. Kuykendall
2d56add224 chore: update Cargo.lock for 1.8.1 2025-12-08 17:37:28 -06:00
Michael A. Kuykendall
3b6a5de8f5 Release workflow: tolerate Cargo.lock changes; always run Docker publish step (avoid skipped GHCR push)
- Use --allow-dirty for the crates.io publish to avoid transient Cargo.lock changes blocking the release
- Add git status/diff debug logs for easier triage
- Run Docker publish step even if crates.io publish fails (if: always())
2025-12-08 16:19:59 -06:00
Michael A. Kuykendall
387cadf836 Release v1.8.1: Fix Docker publishing pipeline
- Added Docker image publishing to GHCR in release workflow
- Fixed issue #146: Docker images now published automatically
- Added proper GHCR authentication and multi-tag publishing
- Both versioned and latest Docker tags now available
2025-12-08 15:47:27 -06:00
Michael A. Kuykendall
7ab708be89 Release v1.8.0: Docker publishing pipeline fix
- Fixed Issue #146: Docker image publishing pipeline failures
- Added automated Docker Hub publishing to release workflow
- Enhanced release gates with Docker build validation
- Improved containerized deployment reliability
v1.8.0
2025-12-08 15:25:40 -06:00
Mike Kuykendall
3697ab21ec Fix Issue #139: Unicode character handling in token streaming (#159)
- Fixed stop token truncation to respect UTF-8 character boundaries
- Prevents FromUtf8Error when multi-byte characters are split during truncation
- Added regression tests for Unicode handling in token streaming
- Ensures callbacks receive valid UTF-8 strings

The issue was that stop token removal could truncate strings in the middle
of multi-byte Unicode characters (like emojis), causing UTF-8 decoding errors.
Now we find the proper character boundary before the stop token.
2025-12-08 08:59:34 -06:00
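
A minimal sketch of the boundary-safe truncation this commit describes, operating on the raw byte stream as it arrives; the function name and signature are illustrative rather than shimmy's actual internals, and the stop token is assumed non-empty:

```rust
fn strip_stop_token(buf: &[u8], stop: &[u8]) -> String {
    // rfind-style search: byte offset of the last occurrence of the
    // stop token, or the end of the buffer if it never appears.
    let cut = buf
        .windows(stop.len())
        .rposition(|w| w == stop)
        .unwrap_or(buf.len());
    // The prefix can end mid-character when a multi-byte code point
    // (e.g. an emoji) straddles the cut. Rather than let UTF-8 decoding
    // fail with FromUtf8Error, keep only the longest valid prefix:
    // the "proper character boundary" mentioned above.
    match std::str::from_utf8(&buf[..cut]) {
        Ok(s) => s.to_string(),
        Err(e) => std::str::from_utf8(&buf[..e.valid_up_to()])
            .unwrap() // the slice up to valid_up_to() is always valid
            .to_string(),
    }
}
```
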
Mike Kuykendall
70bdc0611d style: format code with cargo fmt (#158) 2025-12-08 08:23:23 -06:00
Mike Kuykendall
1cc95c071e fix: group sharded model files in auto-discovery (#147) (#157)
* fix: group sharded model files in auto-discovery (#147)

- Add regex-based detection for sharded model patterns (model-XXXXX-of-XXXXX.ext)
- Group multiple sharded files into single model entries with aggregated size
- Display sharded models with count of additional files (e.g., '+3 more files')
- Use directory name as model name for grouped sharded models
- Add comprehensive regression tests for sharded model grouping scenarios
- Clean up unused imports and ensure zero warnings

This resolves the issue where sharded SafeTensors models were listed as
individual entries instead of being grouped together as a single model.

* style: format code with cargo fmt

* fix: remove unused variable prefixes in AMD GPU detection test

- Fixed compilation error where variables were prefixed with underscore but used without underscore
- This was causing test failures during release gate validation
2025-12-07 19:59:29 -06:00
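
A sketch of the grouping logic this PR describes, assuming a regex over file names and a (count, size) aggregate per model; the real auto-discovery uses the directory name as the grouped model name, which is simplified here to the shared file prefix:

```rust
use regex::Regex;
use std::collections::HashMap;

/// Groups sharded files like "model-00001-of-00004.safetensors" into a
/// single entry, returning (file count, aggregated size) per model.
fn group_shards(files: &[(String, u64)]) -> HashMap<String, (usize, u64)> {
    let shard = Regex::new(r"^(.+)-\d{5}-of-\d{5}\.(safetensors|gguf|bin)$").unwrap();
    let mut groups: HashMap<String, (usize, u64)> = HashMap::new();
    for (name, size) in files {
        // Sharded files collapse onto their common prefix; everything
        // else stays an individual entry under its own name.
        let key = shard
            .captures(name)
            .map(|c| c[1].to_string())
            .unwrap_or_else(|| name.clone());
        let entry = groups.entry(key).or_insert((0, 0));
        entry.0 += 1;     // file count (extras shown as "+N more files")
        entry.1 += *size; // aggregated size across shards
    }
    groups
}
```
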
Mike Kuykendall
18bff586e1 Fix #142: AMD GPU detection on Windows (#156)
* Fix #142: AMD GPU detection on Windows

- Add configure_gpu_environment() method to set GGML_* environment variables before backend initialization
- Fix GPU layer assignment by ensuring OpenCL/Vulkan/CUDA backends have proper environment setup
- Add comprehensive regression tests for all GPU backend environment configuration
- Fix compiler warnings with documented #[allow(dead_code)] attributes for conditionally-used MoeConfig fields

Root cause: GPU backends require environment variables set before llama.cpp initialization.
AMD GPUs were detected by clinfo, but layers were assigned to CPU because the GGML_OPENCL/GGML_VULKAN variables were missing.

Tests: All 6 release gates pass with zero warnings. Regression tests added for GPU backend validation.

* Fix compiler warnings in issue 142 regression test

- Prefix unused variables with underscores to suppress warnings
- Variables are conditionally used based on feature flags
2025-12-07 18:18:18 -06:00
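
An illustrative sketch of the configure_gpu_environment() idea described above. GGML_OPENCL and GGML_VULKAN come from the commit message; the GGML_CUDA name and the enum shape are assumptions:

```rust
use std::env;

enum GpuBackend { Cuda, Vulkan, OpenCl, Cpu }

fn configure_gpu_environment(backend: &GpuBackend) {
    // Must run before llama.cpp backend initialization; otherwise the
    // GPU is detected (e.g. by clinfo) but layers stay on the CPU.
    match backend {
        GpuBackend::OpenCl => env::set_var("GGML_OPENCL", "1"),
        GpuBackend::Vulkan => env::set_var("GGML_VULKAN", "1"),
        GpuBackend::Cuda => env::set_var("GGML_CUDA", "1"),
        GpuBackend::Cpu => {} // nothing to configure
    }
}
```
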
Mike Kuykendall
784e56ea03 Fix OpenAI API compatibility for frontend integration (Issue #113) (#155)
- Remove extra fields (root, parent, permission) from /v1/models response
- Create separate ListModel struct for models endpoint that matches OpenAI spec exactly
- Update created timestamp to use proper system time instead of 0
- Update all tests to use the new ListModel structure

Root cause: Frontend applications like Open WebUI and AnythingLLM expect strict OpenAI API compliance. The extra fields in shimmy's Model struct were causing them to reject the API as incompatible.

Fix ensures /v1/models returns only: id, object, created, owned_by - matching OpenAI specification exactly.
2025-12-07 17:27:23 -06:00
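
A sketch of the spec-exact ListModel this commit describes; the field set matches the commit message, while the struct layout and the owned_by value are assumptions:

```rust
use serde::Serialize;
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Serialize)]
struct ListModel {
    id: String,
    object: String, // always "model"
    created: u64,   // real unix timestamp, no longer hardcoded to 0
    owned_by: String,
}

fn list_model(id: &str) -> ListModel {
    let created = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    ListModel {
        id: id.to_string(),
        object: "model".into(),
        created,
        owned_by: "shimmy".into(),
    }
}
```
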
Mike Kuykendall
3423a1cbb2 Fix Docker build failure (Issue #152) (#154)
- Add missing COPY benches/ ./benches/ to Dockerfile
- Update Rust version from 1.75-slim to 1.85-slim for lock file compatibility
- Add libclang-dev and cmake build dependencies
- Add Docker build regression test to release gates

Root cause: the Dockerfile was missing the benches/ directory copy, causing Cargo manifest parsing to fail, and lacked the build dependencies needed for llama.cpp-sys compilation.
2025-12-07 16:05:27 -06:00
Michael A. Kuykendall
79b2c158e4 fix(release): auto-update Cargo.lock to prevent crates.io failures
CRITICAL FIX - Prevents recurring release blocker:
- Added pre-commit hook to auto-update Cargo.lock when Cargo.toml changes
- Added pre-flight check to dry-run-release.sh to catch uncommitted Cargo.lock
- Committed current Cargo.lock with version 1.7.4

This has killed multiple release attempts. Never again.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
v1.7.4
2025-10-23 10:16:25 -05:00
Michael A. Kuykendall
3bf20db2f5 fix(ci): remove unused macOS config from Cross.toml
- Cross.toml only used for ARM64 Linux builds
- Removed aarch64-apple-darwin config causing warnings
- macOS builds use native cargo, not cross
- Fixes ARM64 Linux release builds

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-23 09:50:49 -05:00
Michael A. Kuykendall
85093eb93f fix(ci): silence cross compilation warnings for ARM64 builds
- Add CROSS_NO_WARNINGS=1 to build environment
- Prevents unused Cross.toml key warnings from failing builds
- Cross.toml has macOS-specific config that warns on Linux ARM64 builds
- Fixes Issue #131 ARM64 release builds

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-23 09:34:27 -05:00
Michael A. Kuykendall
1235a7b345 chore: bump version to 1.7.4
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-23 08:20:21 -05:00
Michael A. Kuykendall
5e715e51a0 fix(release): Gate 4 now validates huggingface binary size (2.6MB) not CUDA binary (26MB)
- CUDA binaries include static CUDA runtime libraries (~26MB)
- Huggingface binaries are much smaller (~2.6MB)
- Gate 4 now rebuilds huggingface binary after Gate 2 CUDA build
- This ensures we validate the correct binary size for releases
- Fixes false-positive size limit failures in local validation

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 20:03:18 -05:00
Michael A. Kuykendall
9458111349 Merge feat/issue-131-arm64-ci-support: ARM64 Linux CI/CD support (Issue #131)
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 18:10:10 -05:00
Michael A. Kuykendall
a7ddf725b9 feat: add pre-commit hooks for fast quality checks
Added .pre-commit-config.yaml with 3 fast hooks:
1. cargo fmt (auto-formatting)
2. cargo clippy (code quality)
3. cargo check (compile validation)

Updated scripts/dry-run-release.sh:
- Added pre-flight format check (catches formatting before gates)
- This catches issues like the one we just hit

Removed duplicate scripts/validate-release.sh:
- Unnecessary duplication of dry-run-release.sh

Workflow separation:
- Pre-commit: Fast checks only (~5-10s)
- PR/Release: Full validation including 115 regression tests (~60-100s)

Regression tests run on:
- Pull requests (GitHub Actions CI)
- Release workflow (dry-run-release.sh Gate 5)
- Manual: cargo test --all-features

Setup: ./scripts/setup-precommit.sh

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 18:08:42 -05:00
Michael A. Kuykendall
9ea9a56db9 feat: add local release validation script matching CI
Created scripts/validate-release.sh to catch CI failures locally BEFORE push.

Mirrors CI checks exactly:
1. cargo fmt -- --check (formatting)
2. cargo clippy (code quality)
3. cargo test --all-features (all tests + 115 regression tests)
4. cargo deny check (security audit)
5. cargo build --release (release build)

Run before every push to avoid CI surprises.

This would have caught the formatting issue we just hit.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 17:56:10 -05:00
Michael A. Kuykendall
75302e3f28 style: apply cargo fmt (auto-indentation after module rename)
Cargo fmt automatically indented test functions after renaming
mod issue_132_auto_stop_tokens to mod tests. This was not caught
locally because we didn't run 'cargo fmt -- --check' before push.

Root cause: Incomplete local validation workflow missing format check.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 17:47:40 -05:00
Michael A. Kuykendall
b5813e5144 fix: resolve all clippy warnings for release quality
- Fix module_inception: Renamed inner modules from issue_XXX to tests
  * tests/regression/issue_130_gpu_layer_offloading.rs
  * tests/regression/issue_129_precompiled_gpu_support.rs
  * tests/regression/issue_132_auto_stop_tokens.rs
- Remove unnecessary mut modifiers (field_reassign_with_default)
  * tests/regression/issue_132_auto_stop_tokens.rs (2 locations)
- Fix module structure: Move doc comments outside mod block

All clippy checks now pass with -D warnings.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 17:07:07 -05:00
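
As a before/after illustration of the module_inception rename (file contents abbreviated; the test body is hypothetical):

```rust
// tests/regression/issue_132_auto_stop_tokens.rs

//! Regression tests for Issue #132; the doc comment now sits outside
//! the mod block, per the last bullet above.

// Before: `mod issue_132_auto_stop_tokens { ... }` repeated the file
// name, triggering clippy::module_inception. After the rename:
mod tests {
    #[test]
    fn stop_tokens_do_not_leak() { /* ... */ }
}
```
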
Michael A. Kuykendall
c9802d2632 fix: resolve all test failures from Issue #132 merge conflict
Issue #132 added stop field to ChatCompletionRequest and stop_tokens to GenOptions.
Test code wasn't updated, causing CI PPT Contract Tests to fail.

Changes:
1. Added stop: None to 13 ChatCompletionRequest test initializers:
   - src/openai_compat.rs (7 locations)
   - tests/openai_api_real_tests.rs (6 locations)
   - tests/api_error_handling_test.rs (1 location)

2. Added stop_tokens: Vec::new() to 2 GenOptions test initializers:
   - src/engine/universal.rs (1 location)
   - src/engine/huggingface.rs (1 location)

3. Fixed release.yml gate numbering inconsistency:
   - Changed GATE 6/8 -> GATE 6/7
   - Changed GATE 7/8 -> GATE 7/7
   (There are only 7 gates total, not 8)

4. Fixed workflow.rs to validate output steps exist:
   - Now checks that all requested outputs reference actual steps
   - Sets success=false when output steps are missing
   - Provides error message listing missing steps

All 445 tests now passing (3 ignored).
Release gate validation: PASS (10/10)

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 16:39:44 -05:00
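
A hypothetical sketch of the workflow.rs validation in item 4; the Workflow and RunResult shapes here are assumptions, not shimmy's actual types:

```rust
use std::collections::HashSet;

struct Workflow {
    step_ids: HashSet<String>,
    requested_outputs: Vec<String>,
}

struct RunResult {
    success: bool,
    error: Option<String>,
}

fn validate_outputs(wf: &Workflow) -> RunResult {
    // Every requested output must reference a step that actually exists.
    let missing: Vec<&str> = wf
        .requested_outputs
        .iter()
        .filter(|o| !wf.step_ids.contains(*o))
        .map(|s| s.as_str())
        .collect();
    if missing.is_empty() {
        RunResult { success: true, error: None }
    } else {
        RunResult {
            success: false,
            error: Some(format!(
                "outputs reference missing steps: {}",
                missing.join(", ")
            )),
        }
    }
}
```
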
Michael A. Kuykendall
126030b61d feat: add ARM64 Linux support to CI/CD (Issue #131)
Problem:
- User @Slach requested ARM64 Linux support for the NVIDIA DGX Spark platform
- Release workflow only built x86_64 binaries
- ARM64 users had to build from source manually

Solution:
- Added aarch64-unknown-linux-gnu target to release workflow matrix
- Configured cross-rs for ARM64 cross-compilation on x86_64 runners
- Added shimmy-linux-aarch64 binary to release artifacts
- Used huggingface,llama features (CPU-only) for ARM64 builds

Implementation Details:
- Matrix entry: os=ubuntu-latest, target=aarch64-unknown-linux-gnu, use-cross=true
- Install cross tool conditionally when use-cross flag is set
- Build command checks use-cross flag and uses 'cross' instead of 'cargo'
- Release artifacts now include shimmy-linux-aarch64 alongside existing platforms

Testing:
- Added 11 regression tests in tests/regression/issue_131_arm64_ci_support.rs
- Tests verify: ARM64 target, cross-compilation config, artifact upload, naming
- All 115 regression tests passing
- Build: ✅ Clippy: ✅ Format: ✅

Platforms Now Supported:
- Linux x86_64 (existing)
- Linux ARM64 (NEW - Issue #131)
- Windows x86_64 (existing)
- macOS Intel (existing)
- macOS ARM64/Apple Silicon (existing)

Addresses #131

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-22 09:11:20 -05:00
Michael A. Kuykendall
fb1cea186d Merge branch 'fix/issue-132-auto-stop-tokens' into main
Auto-configure stop tokens for chat templates to prevent template marker leakage.
Fixes Issue #132 reported by @3588.
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 21:07:29 -05:00
Michael A. Kuykendall
a9009ac05a fix: auto-configure stop tokens for chat templates (Issue #132)
Problem:
- Models using chat templates (ChatML, Llama3, etc.) were outputting raw template tokens
- User @3588 reported gpt-oss-20b outputting <|im_end|> and <|im_start|> in responses
- Templates were applied to format prompts but stop tokens were not configured
- Generated text leaked template markers instead of stopping at them

Solution:
- Added stop_tokens field to GenOptions struct (Vec<String>)
- Added stop_tokens() method to TemplateFamily enum:
  * ChatML → ["<|im_end|>", "<|im_start|>"]
  * Llama3 → ["<|eot_id|>", "<|end_of_text|>"]
  * OpenChat → [] (no special tokens)
- Modified OpenAI compatibility layer to auto-configure stop tokens based on template
- Added support for user-provided stop tokens that merge with template defaults
- Updated llama.rs generation loop to check for stop tokens and truncate output
- Stop token detection uses rfind() to find last occurrence and truncates there

Testing:
- Added 14 regression tests in tests/regression/issue_132_auto_stop_tokens.rs
- Tests cover template stop token configuration, user-provided tokens, truncation logic
- All 104 regression tests passing
- Build: ✅ Clippy: ✅ (no warnings) Format: ✅

Closes #132

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 21:07:23 -05:00
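
A sketch of the stop_tokens() mapping and user-token merge this commit describes; the token lists come straight from the commit message, while the variant and function names are assumed:

```rust
enum TemplateFamily { ChatMl, Llama3, OpenChat }

impl TemplateFamily {
    /// Template-specific stop tokens, auto-configured per the fix above.
    fn stop_tokens(&self) -> Vec<String> {
        match self {
            TemplateFamily::ChatMl => vec!["<|im_end|>".into(), "<|im_start|>".into()],
            TemplateFamily::Llama3 => vec!["<|eot_id|>".into(), "<|end_of_text|>".into()],
            TemplateFamily::OpenChat => Vec::new(), // no special tokens
        }
    }
}

/// User-provided stop tokens merge with the template defaults.
fn effective_stop_tokens(family: &TemplateFamily, user: &[String]) -> Vec<String> {
    let mut tokens = family.stop_tokens();
    for t in user {
        if !tokens.contains(t) {
            tokens.push(t.clone());
        }
    }
    tokens
}
```
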
Michael A. Kuykendall
65b084f8a7 style: apply cargo fmt
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 18:36:22 -05:00
Michael A. Kuykendall
08825fadf9 Merge fix/issue-129-precompiled-gpu-support: GPU support in release binaries
Validated with 4 new regression tests - all 88 tests passing.
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 18:15:14 -05:00
Michael A. Kuykendall
d59124dd60 fix: Issue #129 - Add GPU support to precompiled binaries
User Report: @D0wn10ad - downloaded the Windows release binary, but gpu-info
showed all GPU features (CUDA/Vulkan/OpenCL) disabled.

Root Cause:
- Release workflow built binaries WITHOUT GPU features
- All platforms used default cargo build (CPU only)
- Users forced to compile from source to get GPU support

Fix:
- Windows builds: Added llama-vulkan for broad GPU compatibility
- macOS builds: Added mlx for Apple Silicon GPU acceleration
- Linux musl: Kept huggingface-only (avoids llama.cpp C++ issues)
- Platform-specific feature detection in release workflow

Testing:
- Created tests/regression/issue_129_precompiled_gpu_support.rs
- Validates release workflow YAML contains GPU features
- Tests platform-specific conditional logic
- All 88 regression tests passing

Impact:
- Future releases will have GPU support built-in for Windows/macOS
- Users can download and use GPU acceleration without compiling
- Closes gap between source code capabilities and distributed binaries

Closes #129

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 18:15:06 -05:00
Michael A. Kuykendall
3b21c577b8 test: add regression test for Issue #130 GPU layer offloading
- Created tests/regression/issue_130_gpu_layer_offloading.rs
- Validates CPU backend returns 0 layers (no offload)
- Validates GPU backends (CUDA/Vulkan/OpenCL) return 999 layers (full offload)
- Made GpuBackend and gpu_layers() public for testing
- All 84 regression tests passing

This prevents regression of the fix where gpu_layers() incorrectly
returned 999 for ALL backends including CPU, which kept layers from
offloading properly to the GPU even when compiled with GPU features.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 17:55:43 -05:00
Michael A. Kuykendall
f71abc1d9e Merge fix/issue-130-gpu-layer-offloading: Fix GPU layer offloading
Validated on RTX 3060 - all 33 layers correctly assigned to CUDA device.
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 17:32:02 -05:00
Michael A. Kuykendall
5cc26d6bfa fix: GPU layer offloading not working (Issues #130, #126, #129)
The gpu_layers() method was returning 999 (offload all layers) for ALL
backends including CPU. This caused confusion where Vulkan would be
detected but layers would still be assigned to CPU in llama.cpp.

Root cause: GpuBackend::gpu_layers() didn't check which backend was active.

Changes:
- Match on backend type: CPU returns 0, GPU backends return 999
- Removed incorrect #[allow(dead_code)] from actively used methods
- new_with_backend(), with_moe_config(), get_backend_info() all in use

This fixes:
- Issue #130: Vulkan compiled but layers not offloaded
- Issue #126: MoE models not detecting GPU (same root cause)
- Issue #129: Partial fix - still need release workflow update

Users building with --features llama-vulkan will now actually get
GPU layer offloading. Layers will show as assigned to VULKAN device
instead of CPU in llama.cpp logs.

Testing:
- cargo build --features llama (compiles clean)
- cargo clippy --all-features (no warnings)
- GPU backends properly return 999 layers
- CPU backend properly returns 0 layers

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 17:05:44 -05:00
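
The corrected gpu_layers() logic, sketched from the commit description (the exact variant set is an assumption):

```rust
pub enum GpuBackend { Cpu, Cuda, Vulkan, OpenCl }

impl GpuBackend {
    /// Layers to offload to the GPU: 0 for CPU, 999 ("all") for GPU
    /// backends. Previously this returned 999 unconditionally.
    pub fn gpu_layers(&self) -> u32 {
        match self {
            GpuBackend::Cpu => 0,
            GpuBackend::Cuda | GpuBackend::Vulkan | GpuBackend::OpenCl => 999,
        }
    }
}
```
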
Michael A. Kuykendall
3ce5a116eb chore: remove MLX placeholder test - feature postponed
MLX support is waiting for new library implementation.
Issue #127 will remain open but feature is on hold.
Issue #128 (BackendAlreadyInitialized) was unrelated to MLX - Windows-only backend singleton issue already fixed.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 16:22:00 -05:00
Michael A. Kuykendall
d3f82321c7 Merge fix/issue-128-backend-already-initialized: Backend singleton + clippy fixes
- Resolves Issue #128: BackendAlreadyInitialized error
- OnceLock singleton pattern for LlamaBackend
- Fixed 60+ clippy errors across codebase:
  * Boolean logic bugs in GPU detection tests
  * Obsolete API usage in benchmarks (Registry rewrite)
  * Dead code warnings in test helpers
  * Needless borrows (54 instances)
  * Unnecessary literal unwrap patterns
  * Field reassignment after default
  * expect_fun_call patterns
  * empty_line_after_doc_comments
  * single_match patterns
  * needless_range_loop
- All tests passing with -D warnings (strict mode)

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 15:25:40 -05:00
Michael A. Kuykendall
b0a9d2b048 style: apply cargo fmt after Issue #128 merge
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 13:52:10 -05:00
Michael A. Kuykendall
d232c99701 Merge fix/issue-128: Fix BackendAlreadyInitialized error
- Implements OnceLock singleton for LlamaBackend initialization
- Ensures backend is initialized only once per process
- Removes owned backend from LlamaLoaded struct
- Adds regression test for backend reinitialization
- Fixes #128

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 13:42:35 -05:00
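
A minimal sketch of the OnceLock singleton pattern described here; LlamaBackend stands in for the real llama.cpp binding type, and initialization details are elided:

```rust
use std::sync::OnceLock;

struct LlamaBackend;

impl LlamaBackend {
    fn init() -> Result<Self, String> {
        Ok(LlamaBackend) // real init touches llama.cpp exactly once
    }
}

static BACKEND: OnceLock<LlamaBackend> = OnceLock::new();

/// Every model load borrows the shared backend, so LlamaBackend::init()
/// runs at most once per process, eliminating BackendAlreadyInitialized.
fn backend() -> &'static LlamaBackend {
    BACKEND.get_or_init(|| LlamaBackend::init().expect("backend init failed"))
}
```
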
Michael A. Kuykendall
4f0e16849a ci: temporarily disable MLX workflow
- MLX is being replaced with new library
- Changed to workflow_dispatch (manual trigger only)
- Will re-enable after new library integration is complete

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 12:58:33 -05:00
Michael A. Kuykendall
c9c13349d4 style: apply cargo fmt to fix CI formatting checks
- Fixed formatting in src/openai_compat.rs
- Fixed formatting in regression test files
- Ensures CI Code Quality checks pass

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 12:13:35 -05:00
Michael A. Kuykendall
a6c69ab637 Merge feat/regression-test-infrastructure: Comprehensive regression test infrastructure
- 82 regression tests organized into tests/regression/ structure
- Simplified CI/CD integration for regression testing
- Auto-discovery script for individual test execution
- Fixed gitignore to exclude console/ and **/target/ directories
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 12:04:06 -05:00
Michael A. Kuykendall
5cafc36839 chore(gitignore): exclude console development subproject
- Adds console/ directory exclusion
- Prevents console/target/ build artifacts from appearing as tracked files
- Complements existing **/target/ pattern

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 12:02:37 -05:00
Michael A. Kuykendall
ae286d4467 refactor(ci): simplify regression test execution
- Remove redundant regression-tests job (tests already run in main suite)
- Add explicit regression test step to main test job
- Runs cargo test --test regression (82 tests covering Issues #12-#128)
- Runs cargo test --test regression_tests (14 additional tests)
- Regression tests now part of normal CI flow, not separate script
- Cleaner workflow, same coverage, zero tolerance for regressions

Previously had dedicated job that ran scripts/run-regression-tests-auto.sh.
Now regression tests run alongside unit/integration tests for efficiency.

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 10:19:29 -05:00
Michael A. Kuykendall
04fafb3612 feat(tests): comprehensive regression test infrastructure with CI/CD integration
- Organized 20+ regression test files into tests/regression/ directory
- Each test file covers specific GitHub issues to prevent regressions
- Created tests/regression.rs to include all organized test modules
- Added automated test runner: scripts/run-regression-tests-auto.sh
- Updated .github/workflows/ci.yml to run regression tests before main suite
- Fixed .gitignore: **/target/ pattern + console/ directory exclusion
- Moved old scattered test files into organized structure (git detected renames)
- 82 regression tests now auto-discovered and run in CI/CD

**Zero Tolerance Policy**: All regression tests MUST pass before PR/release.

Tests cover Issues: #12, #13, #51, #53, #63, #64, #68, #72, #101, #106, #108,
#110, #111, #112, #113, #114, #127, #128 + general packaging/versioning

**Testing:**
- cargo test --test regression --features llama: 82 passed
- CI/CD integration: Runs automatically on every PR
- Release gates: Blocks releases if any regression fails

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-21 10:09:03 -05:00
Michael A. Kuykendall
fa15500ffd fix(llama): resolve Issue #128 BackendAlreadyInitialized error
- Root cause: LlamaBackend::init() called on every model load
- Solution: Global OnceLock singleton ensures single initialization
- Backend now shared across all model loads per process
- Removed owned backend from LlamaLoaded struct
- Added regression test: tests/regression/issue_128_backend_reinitialization.rs
- Fixed .gitignore to properly exclude **/target/ directories

Fixes #128
Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-20 18:28:23 -05:00
Michael A. Kuykendall
cd73bcea0d fix(tests): complete Model struct initialization for issue #113
- Add missing optional fields (permission, root, parent) to all Model test initializations
- Fixes compilation errors in test_models_response_structure and test_openai_response_structures
- All 24 openai_compat tests now passing
- Ensures regression test for Open WebUI/AnythingLLM compatibility works correctly

Completes fix for Issue #113

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-19 13:50:58 -05:00
Mike Kuykendall
37f67acbbf fix(api): enhance OpenAI API compatibility for frontend integration (Issue #113) (#123)
- Add optional OpenAI-standard fields to Model struct: permission, root, parent
- Use skip_serializing_if to omit null fields for cleaner JSON responses
- Populate root field with model name for standard OpenAI compatibility
- Update regression tests to verify enhanced Model structure
- Improves compatibility with frontend tools that expect full OpenAI API format

Resolves Issue #113: OpenAI API compatibility for frontends

Signed-off-by: Michael A. Kuykendall <michaelallenkuykendall@gmail.com>
2025-10-13 10:29:39 -05:00
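
A sketch of the enhanced Model struct this PR describes, using the skip_serializing_if behavior from the commit message; field types are simplified assumptions (these extra fields were later removed again in the Issue #113 follow-up, #155, above):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Model {
    id: String,
    object: String,
    created: u64,
    owned_by: String,
    // Optional OpenAI-standard fields, omitted from the JSON entirely
    // when None rather than serialized as null.
    #[serde(skip_serializing_if = "Option::is_none")]
    permission: Option<Vec<String>>, // type simplified for the sketch
    #[serde(skip_serializing_if = "Option::is_none")]
    root: Option<String>, // populated with the model name
    #[serde(skip_serializing_if = "Option::is_none")]
    parent: Option<String>,
}
```
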