Merge pull request #102 from HKUDS/codex/ollama-harness

feat: add ollama agent harness
This commit is contained in:
Yuhao
2026-03-19 23:47:11 +08:00
committed by GitHub
22 changed files with 3056 additions and 10 deletions

.gitignore vendored
View File

@@ -39,6 +39,7 @@
!/drawio/
!/mermaid/
!/adguardhome/
!/ollama/
# Step 5: Inside each software dir, ignore everything (including dotfiles)
/gimp/*
@@ -69,6 +70,8 @@
/mermaid/.*
/adguardhome/*
/adguardhome/.*
/ollama/*
/ollama/.*
# Step 6: ...except agent-harness/
!/gimp/agent-harness/
@@ -85,6 +88,7 @@
!/drawio/agent-harness/
!/mermaid/agent-harness/
!/adguardhome/agent-harness/
!/ollama/agent-harness/
# Step 7: Ignore build artifacts within allowed dirs
**/__pycache__/

View File

@@ -9,7 +9,7 @@ CLI-Anything: Bridging the Gap Between AI Agents and the World's Software</stron
<a href="#-quick-start"><img src="https://img.shields.io/badge/Quick_Start-5_min-blue?style=for-the-badge" alt="Quick Start"></a>
<a href="https://hkuds.github.io/CLI-Anything/hub/"><img src="https://img.shields.io/badge/CLI_Hub-Browse_%26_Install-ff69b4?style=for-the-badge" alt="CLI Hub"></a>
<a href="#-demonstrations"><img src="https://img.shields.io/badge/Demos-16_Apps-green?style=for-the-badge" alt="Demos"></a>
<a href="#-test-results"><img src="https://img.shields.io/badge/Tests-1%2C720_Passing-brightgreen?style=for-the-badge" alt="Tests"></a>
<a href="#-test-results"><img src="https://img.shields.io/badge/Tests-1%2C839_Passing-brightgreen?style=for-the-badge" alt="Tests"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge" alt="License"></a>
</p>
@@ -375,7 +375,7 @@ Each installed CLI ships with a [`SKILL.md`](#-skillmd-generation) inside the Py
| Category | How to be Agent-native | Notable Examples |
|----------|----------------------|----------|
| **📂 GitHub Repositories** | Transform any open-source project into agent-controllable tools through automatic CLI generation | VSCodium, WordPress, Calibre, Zotero, Joplin, Logseq, Penpot, Super Productivity |
| **🤖 AI/ML Platforms** | Automate model training, inference pipelines, and hyperparameter tuning through structured commands | Stable Diffusion WebUI, ComfyUI, InvokeAI, Text-generation-webui, Open WebUI, Fooocus, Kohya_ss, AnythingLLM, SillyTavern |
| **🤖 AI/ML Platforms** | Automate model training, inference pipelines, and hyperparameter tuning through structured commands | Stable Diffusion WebUI, ComfyUI, Ollama, InvokeAI, Text-generation-webui, Open WebUI, Fooocus, Kohya_ss, AnythingLLM, SillyTavern |
| **📊 Data & Analytics** | Enable programmatic data processing, visualization, and statistical analysis workflows | JupyterLab, Apache Superset, Metabase, Redash, DBeaver, KNIME, Orange, OpenSearch Dashboards, Lightdash |
| **💻 Development Tools** | Streamline code editing, building, testing, and deployment processes via command interfaces | Jenkins, Gitea, Hoppscotch, Portainer, pgAdmin, SonarQube, ArgoCD, OpenLens, Insomnia, Beekeeper Studio |
| **🎨 Creative & Media** | Control content creation, editing, and rendering workflows programmatically | Blender, GIMP, OBS Studio, Audacity, Krita, Kdenlive, Shotcut, Inkscape, Darktable, LMMS, Ardour |
@@ -401,7 +401,7 @@ AI agents are great at reasoning but terrible at using real professional softwar
| 💸 "UI automation breaks constantly" | No screenshots, no clicking, no RPA fragility. Pure command-line reliability with structured interfaces |
| 📊 "Agents need structured data" | Built-in JSON output for seamless agent consumption + human-readable formats for debugging |
| 🔧 "Custom integrations are expensive" | One Claude plugin auto-generates CLIs for ANY codebase through proven 7-phase pipeline |
| ⚡ "Prototype vs Production gap" | 1,720 tests with real software validation. Battle-tested across 16 major applications |
| ⚡ "Prototype vs Production gap" | 1,839+ tests with real software validation. Battle-tested across 16 major applications |
---
@@ -502,7 +502,7 @@ SKILL.md files are auto-generated during Phase 6.5 of the pipeline using `skill_
CLI-Anything works on any software with a codebase — no domain restrictions or architectural limitations.
### 🏭 Professional-Grade Testing
Tested across 14 diverse, complex applications spanning creative, productivity, communication, diagramming, AI image generation, and AI content generation domains previously inaccessible to AI agents.
Tested across 16 diverse, complex applications spanning creative, productivity, communication, diagramming, AI image generation, AI content generation, network ad blocking, and local LLM inference domains previously inaccessible to AI agents.
### 🎨 Diverse Domain Coverage
From creative workflows (image editing, 3D modeling, vector graphics) to production tools (audio, office, live streaming, video editing).
@@ -610,6 +610,13 @@ Each application received complete, production-ready CLI interfaces — not demo
<td align="center">✅ 50</td>
</tr>
<tr>
<td align="center"><strong>🧠 NotebookLM</strong></td>
<td>AI Research Assistant</td>
<td><code>cli-anything-notebooklm</code></td>
<td>NotebookLM CLI wrapper (experimental)</td>
<td align="center">✅ 21</td>
</tr>
<tr>
<td align="center"><strong>🖼️ ComfyUI</strong></td>
<td>AI Image Generation</td>
<td><code>cli-anything-comfyui</code></td>
@@ -617,12 +624,26 @@ Each application received complete, production-ready CLI interfaces — not demo
<td align="center">✅ 70</td>
</tr>
<tr>
<td align="center"><strong>🛡️ AdGuard Home</strong></td>
<td>Network-wide Ad Blocking</td>
<td><code>cli-anything-adguardhome</code></td>
<td>AdGuard Home REST API</td>
<td align="center">✅ 36</td>
</tr>
<tr>
<td align="center"><strong>🦙 Ollama</strong></td>
<td>Local LLM Inference</td>
<td><code>cli-anything-ollama</code></td>
<td>Ollama REST API</td>
<td align="center">✅ 98</td>
</tr>
<tr>
<td align="center" colspan="4"><strong>Total</strong></td>
<td align="center"><strong>✅ 1,720</strong></td>
<td align="center"><strong>✅ 1,839</strong></td>
</tr>
</table>
> **100% pass rate** across all 1,720 tests — 1,247 unit tests + 473 end-to-end tests.
> **100% pass rate** across all 1,839 tests — 1,355 unit tests + 484 end-to-end tests.
---
@@ -652,10 +673,12 @@ zoom 22 passed ✅ (22 unit + 0 e2e)
drawio 138 passed ✅ (116 unit + 22 e2e)
mermaid 10 passed ✅ (5 unit + 5 e2e)
anygen 50 passed ✅ (40 unit + 10 e2e)
notebooklm 21 passed ✅ (21 unit + 0 e2e)
comfyui 70 passed ✅ (60 unit + 10 e2e)
adguardhome 36 passed ✅ (24 unit + 12 e2e)
ollama 98 passed ✅ (87 unit + 11 e2e)
──────────────────────────────────────────────────────────────────────────────
TOTAL 1,720 passed ✅ 100% pass rate
TOTAL 1,839 passed ✅ 100% pass rate
```
---
@@ -720,7 +743,8 @@ cli-anything/
├── 🖼️ comfyui/agent-harness/ # ComfyUI CLI (70 tests)
├── 🧠 notebooklm/agent-harness/ # NotebookLM CLI (experimental, 21 tests)
├── 🖼️ comfyui/agent-harness/ # ComfyUI CLI (70 tests)
├── 🛡️ adguardhome/agent-harness/ # AdGuardHome CLI (36 tests)
├── 🛡️ adguardhome/agent-harness/ # AdGuard Home CLI (36 tests)
└── 🦙 ollama/agent-harness/ # Ollama CLI (98 tests)
```
Each `agent-harness/` contains an installable Python package under `cli_anything.<software>/` with Click CLI, core modules, utils (including `repl_skin.py` and backend wrapper), and comprehensive tests.
@@ -821,7 +845,7 @@ HARNESS.md is our definitive SOP for making any software agent-accessible via au
It encodes proven patterns and methodologies refined through automated generation processes.
The playbook distills key insights from successfully building all 14 diverse, production-ready harnesses.
The playbook distills key insights from successfully building all 16 diverse, production-ready harnesses.
### Critical Lessons
@@ -946,7 +970,7 @@ MIT License — free to use, modify, and distribute.
**CLI-Anything**
*Make any software with a codebase Agent-native.*
<sub>A methodology for the age of AI agents | 16 professional software demos | 1,720 passing tests</sub>
<sub>A methodology for the age of AI agents | 16 professional software demos | 1,839 passing tests</sub>
<br>

View File

@@ -0,0 +1,99 @@
# Ollama: Project-Specific Analysis & SOP
## Architecture Summary
Ollama is a local LLM runtime that serves models via a REST API on `localhost:11434`.
It handles model downloading, quantization, GPU/CPU inference, and memory management.
```
┌──────────────────────────────────────────────┐
│ Ollama Server │
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Model │ │ Generate │ │ Embeddings │ │
│ │ Manager │ │ Engine │ │ Engine │ │
│ └────┬──────┘ └────┬─────┘ └──────┬──────┘ │
│ │ │ │ │
│ ┌────┴─────────────┴──────────────┴───────┐ │
│ │ REST API (port 11434) │ │
│ │ /api/tags /api/generate /api/embed │ │
│ │ /api/pull /api/chat /api/show │ │
│ │ /api/delete /api/copy /api/ps │ │
│ └─────────────────┬───────────────────────┘ │
└────────────────────┼─────────────────────────┘
┌───────────┴──────────┐
│ llama.cpp backend │
│ GGUF model format │
│ GPU/CPU inference │
└──────────────────────┘
```
## CLI Strategy: REST API Wrapper
Ollama already provides a clean REST API. Our CLI wraps it with:
1. **requests** — HTTP client for all API calls
2. **Streaming NDJSON** — For progressive output during generation and model pulls
3. **Click CLI** — Structured command groups matching the API surface
4. **REPL** — Interactive mode for exploratory use
### API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/` | GET | Server status check |
| `/api/tags` | GET | List local models |
| `/api/show` | POST | Model details |
| `/api/pull` | POST | Download model (streaming) |
| `/api/delete` | DELETE | Remove model |
| `/api/copy` | POST | Copy/rename model |
| `/api/ps` | GET | Running models |
| `/api/generate` | POST | Text generation (streaming) |
| `/api/chat` | POST | Chat completion (streaming) |
| `/api/embed` | POST | Generate embeddings |
| `/api/version` | GET | Server version |
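Below is a minimal sketch of the streaming-NDJSON pattern the wrapper relies on for endpoints such as `/api/generate` and `/api/pull` (illustrative only, assuming just `requests` and the default port; the helper name `stream_ndjson` is not the actual backend function):
```python
import json
import requests

def stream_ndjson(base_url: str, path: str, payload: dict):
    """Yield one parsed JSON object per line from a streaming Ollama endpoint."""
    url = base_url.rstrip("/") + path
    with requests.post(url, json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # skip keep-alive blank lines
                yield json.loads(line)

# Print tokens from /api/generate as they arrive, stopping at the final chunk.
for chunk in stream_ndjson("http://localhost:11434", "/api/generate",
                           {"model": "llama3.2", "prompt": "Say hello", "stream": True}):
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
```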
## Command Map: Ollama Native CLI → CLI-Anything
| Ollama CLI | CLI-Anything |
|-----------|-------------|
| `ollama list` | `model list` |
| `ollama show <name>` | `model show <name>` |
| `ollama pull <name>` | `model pull <name>` |
| `ollama rm <name>` | `model rm <name>` |
| `ollama cp <src> <dst>` | `model copy <src> <dst>` |
| `ollama ps` | `model ps` |
| `ollama run <model> <prompt>` | `generate text --model <name> --prompt "..."` |
| (no equivalent) | `generate chat --model <name> --message "..."` |
| (no equivalent) | `embed text --model <name> --input "..."` |
| `ollama serve` | (external — must be running) |
## Model Parameters (options)
| Parameter | Type | Description |
|-----------|------|-------------|
| `temperature` | float | Sampling temperature (0.0-2.0) |
| `top_p` | float | Nucleus sampling threshold |
| `top_k` | int | Top-k sampling |
| `num_predict` | int | Max tokens to generate |
| `repeat_penalty` | float | Repetition penalty |
| `seed` | int | Random seed for reproducibility |
| `stop` | list[str] | Stop sequences |
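In the Ollama API these parameters are nested under an `options` object in the request body. A hedged sketch of the payload the CLI assembles (field names match the table above; values are illustrative):
```python
# Illustrative request body for POST /api/generate with sampling options.
payload = {
    "model": "llama3.2",
    "prompt": "Write a haiku about coding",
    "stream": False,
    "options": {
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "num_predict": 256,
        "repeat_penalty": 1.1,
        "seed": 42,
        "stop": ["\n\n"],
    },
}
```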
## Test Coverage Plan
1. **Unit tests** (`test_core.py`): No Ollama server needed
- URL construction in backend
- Output formatting
- CLI argument parsing via Click test runner
- Session state management
- Error handling paths
2. **E2E tests** (`test_full_e2e.py`): Requires Ollama running
- List models
- Pull a small model
- Generate text
- Chat completion
- Show model info
- Embeddings
- Delete model
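A minimal sketch of the unit-test approach (mock the backend call and drive the CLI through Click's test runner, so no Ollama server is needed):
```python
import json
from unittest.mock import patch
from click.testing import CliRunner
from cli_anything.ollama.ollama_cli import cli

@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_json(mock_api):
    # The HTTP layer is replaced with a canned response.
    mock_api.return_value = {"models": [{"name": "llama3.2", "size": 2_000_000_000}]}
    result = CliRunner().invoke(cli, ["--json", "model", "list"])
    assert result.exit_code == 0
    assert json.loads(result.output)["models"][0]["name"] == "llama3.2"
```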

View File

@@ -0,0 +1,180 @@
# Ollama CLI
A command-line interface for local LLM inference and model management via the Ollama REST API.
Designed for AI agents and power users who need to manage models, generate text, and chat without a GUI.
## Prerequisites
- Python 3.10+
- [Ollama](https://ollama.com) installed and running (`ollama serve`)
- `click` (CLI framework)
- `requests` (HTTP client)
Optional (for interactive REPL):
- `prompt_toolkit`
## Install Dependencies
```bash
pip install click requests prompt_toolkit
```
## How to Run
All commands are run from the `agent-harness/` directory, or via the installed entry point.
### One-shot commands
```bash
# Show help
cli-anything-ollama --help
# List models
cli-anything-ollama model list
# Pull a model
cli-anything-ollama model pull llama3.2
# Generate text
cli-anything-ollama generate text --model llama3.2 --prompt "Explain quantum computing"
# Chat
cli-anything-ollama generate chat --model llama3.2 --message "user:Hello!"
# JSON output (for agent consumption)
cli-anything-ollama --json server status
```
### Interactive REPL
```bash
cli-anything-ollama
# Enter commands interactively with tab-completion and history
```
Inside the REPL, type `help` for all available commands.
## Command Reference
### Model
```bash
model list # List locally available models
model show <name> # Show model details
model pull <name> [--no-stream] # Download a model
model rm <name> # Delete a model
model copy <source> <destination> # Copy a model
model ps # List currently loaded models
```
### Generate
```bash
generate text --model <name> --prompt "..." [--system "..."] [--no-stream]
[--temperature 0.7] [--top-p 0.9] [--num-predict 256]
generate chat --model <name> --message "user:Hello" [--message "assistant:Hi"]
[--file messages.json] [--no-stream] [--continue-chat]
```
### Embed
```bash
embed text --model <name> --input "Text to embed"
```
### Server
```bash
server status # Check if Ollama is running
server version # Show Ollama version
```
### Session
```bash
session status # Show session state
session history # Show chat history
```
## JSON Mode
Add `--json` before the subcommand for machine-readable output:
```bash
cli-anything-ollama --json model list
cli-anything-ollama --json generate text --model llama3.2 --prompt "Hello"
```
## Custom Host
Connect to a remote Ollama instance:
```bash
cli-anything-ollama --host http://192.168.1.100:11434 model list
```
## Example Workflow
```bash
# Check server
cli-anything-ollama server status
# Pull a model
cli-anything-ollama model pull llama3.2
# Generate text
cli-anything-ollama generate text --model llama3.2 --prompt "Write a haiku about coding"
# Multi-turn chat
cli-anything-ollama generate chat --model llama3.2 \
--message "user:What is Python?" \
--message "user:How does it compare to JavaScript?"
# Generate embeddings
cli-anything-ollama embed text --model nomic-embed-text --input "Hello world"
# Check loaded models
cli-anything-ollama model ps
# Clean up
cli-anything-ollama model rm llama3.2
```
## Output Formats
All commands support dual output modes:
- **Human-readable** (default): Tables, colors, formatted text
- **Machine-readable** (`--json` flag): Structured JSON for agent consumption
```bash
# Human output
cli-anything-ollama model list
# JSON output for agents
cli-anything-ollama --json model list
```
## For AI Agents
When using this CLI programmatically:
1. **Always use `--json` flag** for parseable output
2. **Check return codes** - 0 for success, non-zero for errors
3. **Parse stderr** for error messages on failure
4. **Use `--no-stream`** for generate/chat to get complete responses
5. **Verify Ollama is running** with `server status` before other commands
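A minimal sketch of that pattern from an agent's side (assuming the `cli-anything-ollama` entry point is on PATH; error handling is kept deliberately simple):
```python
import json
import subprocess

def ollama_json(*args: str) -> dict:
    """Run the CLI with --json and return parsed output; raise on non-zero exit."""
    proc = subprocess.run(["cli-anything-ollama", "--json", *args],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or proc.stdout.strip())
    return json.loads(proc.stdout)

ollama_json("server", "status")                       # verify the server first
models = ollama_json("model", "list")["models"]
reply = ollama_json("generate", "text", "--model", "llama3.2",
                    "--prompt", "Hello", "--no-stream")
```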
## Running Tests
```bash
cd agent-harness
python -m pytest cli_anything/ollama/tests/test_core.py -v # Unit tests (no Ollama needed)
python -m pytest cli_anything/ollama/tests/test_full_e2e.py -v # E2E tests (requires Ollama)
python -m pytest cli_anything/ollama/tests/ -v # All tests
```
## Version
1.0.1

View File

@@ -0,0 +1 @@
"""Ollama CLI - Local LLM inference and model management."""

View File

@@ -0,0 +1,3 @@
"""Allow running as python -m cli_anything.ollama"""
from cli_anything.ollama.ollama_cli import main
main()

View File

@@ -0,0 +1,18 @@
"""Ollama embeddings — generate vector embeddings from text."""
from cli_anything.ollama.utils.ollama_backend import api_post
def embed(base_url: str, model: str, input_text: str | list[str]) -> dict:
"""Generate embeddings for input text.
Args:
base_url: Ollama server URL.
model: Model name (must support embeddings, e.g., 'nomic-embed-text').
input_text: Text string or list of strings to embed.
Returns:
Dict with 'embeddings' key containing list of embedding vectors.
"""
data = {"model": model, "input": input_text}
return api_post(base_url, "/api/embed", data, timeout=60)

View File

@@ -0,0 +1,88 @@
"""Ollama text generation and chat — streaming and non-streaming inference."""
import sys
from cli_anything.ollama.utils.ollama_backend import api_post, api_post_stream
def generate(base_url: str, model: str, prompt: str,
system: str | None = None, template: str | None = None,
context: list | None = None, options: dict | None = None,
stream: bool = True):
"""Generate a text completion.
Args:
base_url: Ollama server URL.
model: Model name.
prompt: Input prompt.
system: Optional system message.
template: Optional prompt template override.
context: Optional context from previous generate call.
options: Optional model parameters (temperature, top_p, etc.).
stream: If True, yields response chunks. If False, returns complete response.
Returns/Yields:
Response dicts with 'response' text and metadata.
"""
data = {"model": model, "prompt": prompt, "stream": stream}
if system is not None:
data["system"] = system
if template is not None:
data["template"] = template
if context is not None:
data["context"] = context
if options is not None:
data["options"] = options
if stream:
return api_post_stream(base_url, "/api/generate", data)
else:
return api_post(base_url, "/api/generate", data, timeout=300)
def chat(base_url: str, model: str, messages: list[dict],
options: dict | None = None, stream: bool = True):
"""Send a chat completion request.
Args:
base_url: Ollama server URL.
model: Model name.
messages: List of message dicts with 'role' and 'content' keys.
options: Optional model parameters.
stream: If True, yields response chunks. If False, returns complete response.
Returns/Yields:
Response dicts with 'message' containing assistant reply.
"""
data = {"model": model, "messages": messages, "stream": stream}
if options is not None:
data["options"] = options
if stream:
return api_post_stream(base_url, "/api/chat", data)
else:
return api_post(base_url, "/api/chat", data, timeout=300)
def stream_to_stdout(chunks) -> dict:
"""Print streaming tokens to stdout and return the final response.
Args:
chunks: Generator of response chunks from generate() or chat().
Returns:
The final chunk (contains metadata like total_duration, etc.).
"""
final = {}
for chunk in chunks:
# generate endpoint uses 'response', chat endpoint uses 'message.content'
if "response" in chunk:
sys.stdout.write(chunk["response"])
sys.stdout.flush()
elif "message" in chunk and "content" in chunk["message"]:
sys.stdout.write(chunk["message"]["content"])
sys.stdout.flush()
if chunk.get("done", False):
final = chunk
sys.stdout.write("\n")
sys.stdout.flush()
return final
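# Usage sketch (illustrative only; assumes the model is already pulled and the
# server is running on the default port):
#
#   from cli_anything.ollama.core.generate import generate, chat, stream_to_stdout
#
#   # Streaming: tokens print as they arrive; the final chunk carries metadata.
#   final = stream_to_stdout(generate("http://localhost:11434", "llama3.2", "Say hello"))
#
#   # Non-streaming chat returns a single complete response dict.
#   reply = chat("http://localhost:11434", "llama3.2",
#                [{"role": "user", "content": "What is Python?"}], stream=False)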

View File

@@ -0,0 +1,80 @@
"""Ollama model management — list, pull, show, delete, copy, running models."""
from cli_anything.ollama.utils.ollama_backend import (
api_get, api_post, api_post_stream, api_delete,
)
def list_models(base_url: str) -> dict:
"""List all locally available models.
Returns:
Dict with 'models' key containing list of model info dicts.
"""
return api_get(base_url, "/api/tags")
def show_model(base_url: str, name: str) -> dict:
"""Show details about a model (parameters, template, license, etc.).
Args:
name: Model name (e.g., 'llama3.2', 'mistral:latest').
Returns:
Dict with model details.
"""
return api_post(base_url, "/api/show", {"name": name})
def pull_model(base_url: str, name: str, stream: bool = True):
"""Download a model from the Ollama library.
Args:
name: Model name to pull.
stream: If True, yields progress dicts. If False, returns final result.
Returns/Yields:
Progress dicts with 'status', 'digest', 'total', 'completed' keys.
"""
data = {"name": name, "stream": stream}
if stream:
return api_post_stream(base_url, "/api/pull", data)
else:
return api_post(base_url, "/api/pull", data, timeout=600)
def delete_model(base_url: str, name: str) -> dict:
"""Delete a model from local storage.
Args:
name: Model name to delete.
Returns:
Status dict.
"""
return api_delete(base_url, "/api/delete", {"name": name})
def copy_model(base_url: str, source: str, destination: str) -> dict:
"""Copy a model to a new name.
Args:
source: Source model name.
destination: New model name.
Returns:
Status dict.
"""
return api_post(base_url, "/api/copy", {
"source": source,
"destination": destination,
})
def running_models(base_url: str) -> dict:
"""List models currently loaded in memory.
Returns:
Dict with 'models' key containing currently running model info.
"""
return api_get(base_url, "/api/ps")

View File

@@ -0,0 +1,21 @@
"""Ollama server info — status, version, running models."""
from cli_anything.ollama.utils.ollama_backend import api_get
def server_status(base_url: str) -> dict:
"""Check if Ollama server is running.
Returns:
Dict with server status message.
"""
return api_get(base_url, "/")
def version(base_url: str) -> dict:
"""Get Ollama server version.
Returns:
Dict with 'version' key.
"""
return api_get(base_url, "/api/version")

View File

@@ -0,0 +1,501 @@
#!/usr/bin/env python3
"""Ollama CLI — A command-line interface for local LLM inference and model management.
This CLI provides full access to the Ollama REST API for managing models,
generating text, chatting, and creating embeddings.
Usage:
# One-shot commands
cli-anything-ollama model list
cli-anything-ollama generate text --model llama3.2 --prompt "Hello"
cli-anything-ollama --json server status
# Interactive REPL
cli-anything-ollama
"""
import sys
import os
import json
import click
from typing import Optional
from cli_anything.ollama.utils.ollama_backend import DEFAULT_BASE_URL
from cli_anything.ollama.core import models as models_mod
from cli_anything.ollama.core import generate as gen_mod
from cli_anything.ollama.core import embeddings as embed_mod
from cli_anything.ollama.core import server as server_mod
# Global state
_json_output = False
_repl_mode = False
_host = DEFAULT_BASE_URL
_chat_history: list[dict] = []
_last_model: str = ""
def output(data, message: str = ""):
if _json_output:
click.echo(json.dumps(data, indent=2, default=str))
else:
if message:
click.echo(message)
if isinstance(data, dict):
_print_dict(data)
elif isinstance(data, list):
_print_list(data)
else:
click.echo(str(data))
def _print_dict(d: dict, indent: int = 0):
prefix = " " * indent
for k, v in d.items():
if isinstance(v, dict):
click.echo(f"{prefix}{k}:")
_print_dict(v, indent + 1)
elif isinstance(v, list):
click.echo(f"{prefix}{k}:")
_print_list(v, indent + 1)
else:
click.echo(f"{prefix}{k}: {v}")
def _print_list(items: list, indent: int = 0):
prefix = " " * indent
for i, item in enumerate(items):
if isinstance(item, dict):
click.echo(f"{prefix}[{i}]")
_print_dict(item, indent + 1)
else:
click.echo(f"{prefix}- {item}")
def handle_error(func):
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except RuntimeError as e:
if _json_output:
click.echo(json.dumps({"error": str(e), "type": "runtime_error"}))
else:
click.echo(f"Error: {e}", err=True)
if not _repl_mode:
sys.exit(1)
except (ValueError, IndexError) as e:
if _json_output:
click.echo(json.dumps({"error": str(e), "type": type(e).__name__}))
else:
click.echo(f"Error: {e}", err=True)
if not _repl_mode:
sys.exit(1)
wrapper.__name__ = func.__name__
wrapper.__doc__ = func.__doc__
return wrapper
# ── Main CLI Group ──────────────────────────────────────────────
@click.group(invoke_without_command=True)
@click.option("--json", "use_json", is_flag=True, help="Output as JSON")
@click.option("--host", type=str, default=None,
help=f"Ollama server URL (default: {DEFAULT_BASE_URL})")
@click.pass_context
def cli(ctx, use_json, host):
"""Ollama CLI — Local LLM inference and model management.
Run without a subcommand to enter interactive REPL mode.
"""
global _json_output, _host
_json_output = use_json
if host:
_host = host
if ctx.invoked_subcommand is None:
ctx.invoke(repl)
# ── Model Commands ───────────────────────────────────────────────
@cli.group()
def model():
"""Model management commands."""
pass
@model.command("list")
@handle_error
def model_list():
"""List locally available models."""
result = models_mod.list_models(_host)
models = result.get("models", [])
if _json_output:
output(result)
else:
if not models:
click.echo("No models installed. Pull one with: model pull <name>")
return
click.echo(f"{'NAME':<40} {'SIZE':<12} {'MODIFIED'}")
click.echo("" * 70)
for m in models:
name = m.get("name", "")
size = m.get("size", 0)
modified = m.get("modified_at", "")[:19]
size_str = _format_size(size)
click.echo(f"{name:<40} {size_str:<12} {modified}")
@model.command("show")
@click.argument("name")
@handle_error
def model_show(name):
"""Show model details (parameters, template, license)."""
result = models_mod.show_model(_host, name)
output(result, f"Model: {name}")
@model.command("pull")
@click.argument("name")
@click.option("--no-stream", is_flag=True, help="Wait for completion without progress")
@handle_error
def model_pull(name, no_stream):
"""Download a model from the Ollama library."""
if no_stream or _json_output:
result = models_mod.pull_model(_host, name, stream=False)
output(result, f"Pulled: {name}")
else:
click.echo(f"Pulling {name}...")
last_status = ""
for chunk in models_mod.pull_model(_host, name, stream=True):
if "error" in chunk:
raise RuntimeError(chunk["error"])
status = chunk.get("status", "")
if status != last_status:
click.echo(f" {status}")
last_status = status
completed = chunk.get("completed", 0)
total = chunk.get("total", 0)
if total > 0:
pct = int(completed / total * 100)
bar_w = 30
filled = int(bar_w * completed / total)
bar = "" * filled + "" * (bar_w - filled)
click.echo(f"\r {bar} {pct:3d}% ({_format_size(completed)}/{_format_size(total)})", nl=False)
click.echo(f"\nDone: {name}")
@model.command("rm")
@click.argument("name")
@handle_error
def model_rm(name):
"""Delete a model from local storage."""
result = models_mod.delete_model(_host, name)
output(result, f"Deleted: {name}")
@model.command("copy")
@click.argument("source")
@click.argument("destination")
@handle_error
def model_copy(source, destination):
"""Copy a model to a new name."""
result = models_mod.copy_model(_host, source, destination)
output(result, f"Copied {source}{destination}")
@model.command("ps")
@handle_error
def model_ps():
"""List models currently loaded in memory."""
result = models_mod.running_models(_host)
models = result.get("models", [])
if _json_output:
output(result)
else:
if not models:
click.echo("No models currently loaded.")
return
click.echo(f"{'NAME':<40} {'SIZE':<12} {'PROCESSOR':<15} {'UNTIL'}")
click.echo("" * 80)
for m in models:
name = m.get("name", "")
size = m.get("size", 0)
proc = m.get("size_vram", 0)
until = m.get("expires_at", "")[:19]
click.echo(f"{name:<40} {_format_size(size):<12} {_format_size(proc):<15} {until}")
# ── Generate Commands ────────────────────────────────────────────
@cli.group()
def generate():
"""Text generation and chat commands."""
pass
@generate.command("text")
@click.option("--model", "-m", "model_name", required=True, help="Model name")
@click.option("--prompt", "-p", required=True, help="Input prompt")
@click.option("--system", "-s", default=None, help="System message")
@click.option("--no-stream", is_flag=True, help="Return complete response instead of streaming")
@click.option("--temperature", type=float, default=None, help="Sampling temperature")
@click.option("--top-p", type=float, default=None, help="Top-p sampling")
@click.option("--num-predict", type=int, default=None, help="Max tokens to generate")
@handle_error
def generate_text(model_name, prompt, system, no_stream, temperature, top_p, num_predict):
"""Generate text from a prompt."""
global _last_model
_last_model = model_name
options = {}
if temperature is not None:
options["temperature"] = temperature
if top_p is not None:
options["top_p"] = top_p
if num_predict is not None:
options["num_predict"] = num_predict
if no_stream or _json_output:
result = gen_mod.generate(
_host, model_name, prompt, system=system,
options=options or None, stream=False,
)
output(result)
else:
chunks = gen_mod.generate(
_host, model_name, prompt, system=system,
options=options or None, stream=True,
)
final = gen_mod.stream_to_stdout(chunks)
@generate.command("chat")
@click.option("--model", "-m", "model_name", required=True, help="Model name")
@click.option("--message", "messages_input", multiple=True,
help="Messages as role:content (repeatable)")
@click.option("--file", "messages_file", type=click.Path(exists=True), default=None,
help="JSON file with messages array")
@click.option("--no-stream", is_flag=True, help="Return complete response instead of streaming")
@click.option("--temperature", type=float, default=None, help="Sampling temperature")
@click.option("--continue-chat", is_flag=True, help="Continue previous chat session")
@handle_error
def generate_chat(model_name, messages_input, messages_file, no_stream, temperature, continue_chat):
"""Send a chat completion request."""
global _last_model, _chat_history
_last_model = model_name
options = {}
if temperature is not None:
options["temperature"] = temperature
# Build messages list
if messages_file:
with open(messages_file, "r") as f:
messages = json.load(f)
elif messages_input:
messages = []
for msg in messages_input:
if ":" not in msg:
raise ValueError(f"Invalid message format: '{msg}'. Use role:content")
role, content = msg.split(":", 1)
messages.append({"role": role.strip(), "content": content.strip()})
else:
raise ValueError("Provide messages via --message or --file")
if continue_chat:
messages = _chat_history + messages
if no_stream or _json_output:
result = gen_mod.chat(
_host, model_name, messages,
options=options or None, stream=False,
)
if not _json_output and "message" in result:
_chat_history = messages + [result["message"]]
output(result)
else:
chunks = gen_mod.chat(
_host, model_name, messages,
options=options or None, stream=True,
)
# Collect streamed content for history
collected = []
for chunk in chunks:
if "error" in chunk:
raise RuntimeError(chunk["error"])
if "message" in chunk and "content" in chunk["message"]:
token = chunk["message"]["content"]
collected.append(token)
sys.stdout.write(token)
sys.stdout.flush()
sys.stdout.write("\n")
sys.stdout.flush()
full_response = "".join(collected)
_chat_history = messages + [{"role": "assistant", "content": full_response}]
# ── Embed Commands ───────────────────────────────────────────────
@cli.group()
def embed():
"""Embedding generation commands."""
pass
@embed.command("text")
@click.option("--model", "-m", "model_name", required=True, help="Model name")
@click.option("--input", "-i", "input_text", required=True, help="Text to embed")
@handle_error
def embed_text(model_name, input_text):
"""Generate embeddings for text."""
result = embed_mod.embed(_host, model_name, input_text)
if _json_output:
output(result)
else:
embeddings = result.get("embeddings", [])
if embeddings:
dims = len(embeddings[0]) if embeddings else 0
click.echo(f"Model: {model_name}")
click.echo(f"Dimensions: {dims}")
click.echo(f"Vectors: {len(embeddings)}")
# Show first few values
if embeddings:
preview = embeddings[0][:5]
click.echo(f"Preview: [{', '.join(f'{v:.6f}' for v in preview)}, ...]")
else:
output(result)
# ── Server Commands ──────────────────────────────────────────────
@cli.group()
def server():
"""Server status and info commands."""
pass
@server.command("status")
@handle_error
def server_status():
"""Check if Ollama server is running."""
result = server_mod.server_status(_host)
output(result, f"Ollama server at {_host}: running")
@server.command("version")
@handle_error
def server_version():
"""Show Ollama server version."""
result = server_mod.version(_host)
output(result)
# ── Session Commands ─────────────────────────────────────────────
@cli.group()
def session():
"""Session state commands."""
pass
@session.command("status")
@handle_error
def session_status():
"""Show current session state."""
data = {
"host": _host,
"last_model": _last_model or "(none)",
"chat_history_length": len(_chat_history),
"json_output": _json_output,
}
output(data, "Session Status")
@session.command("history")
@handle_error
def session_history():
"""Show chat history for current session."""
if not _chat_history:
output({"messages": []}, "No chat history.")
return
if _json_output:
output({"messages": _chat_history})
else:
for msg in _chat_history:
role = msg.get("role", "unknown")
content = msg.get("content", "")
# Truncate long messages for display
if len(content) > 200:
content = content[:200] + "..."
click.echo(f"[{role}] {content}")
# ── REPL ─────────────────────────────────────────────────────────
@cli.command()
@handle_error
def repl():
"""Start interactive REPL session."""
from cli_anything.ollama.utils.repl_skin import ReplSkin
global _repl_mode
_repl_mode = True
skin = ReplSkin("ollama", version="1.0.1")
skin.print_banner()
pt_session = skin.create_prompt_session()
_repl_commands = {
"model": "list|show|pull|rm|copy|ps",
"generate": "text|chat",
"embed": "text",
"server": "status|version",
"session": "status|history",
"help": "Show this help",
"quit": "Exit REPL",
}
while True:
try:
context = _last_model if _last_model else ""
line = skin.get_input(pt_session, project_name=context, modified=False)
if not line:
continue
if line.lower() in ("quit", "exit", "q"):
skin.print_goodbye()
break
if line.lower() == "help":
skin.help(_repl_commands)
continue
# Parse and execute command
args = line.split()
try:
cli.main(args, standalone_mode=False)
except SystemExit:
pass
except click.exceptions.UsageError as e:
skin.warning(f"Usage error: {e}")
except Exception as e:
skin.error(f"{e}")
except (EOFError, KeyboardInterrupt):
skin.print_goodbye()
break
_repl_mode = False
# ── Helpers ──────────────────────────────────────────────────────
def _format_size(size_bytes: int) -> str:
"""Format byte count as human-readable string."""
if size_bytes == 0:
return "0 B"
for unit in ["B", "KB", "MB", "GB", "TB"]:
if abs(size_bytes) < 1024:
return f"{size_bytes:.1f} {unit}"
size_bytes /= 1024
return f"{size_bytes:.1f} PB"
# ── Entry Point ──────────────────────────────────────────────────
def main():
cli()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,220 @@
---
name: >-
cli-anything-ollama
description: >-
Command-line interface for Ollama - Local LLM inference and model management via Ollama REST API. Designed for AI agents and power users who need to manage models, generate text, chat, and create embeddings without a GUI.
---
# cli-anything-ollama
Local LLM inference and model management via the Ollama REST API. Designed for AI agents and power users who need to manage models, generate text, chat, and create embeddings without a GUI.
## Installation
This CLI is installed as part of the cli-anything-ollama package:
```bash
pip install cli-anything-ollama
```
**Prerequisites:**
- Python 3.10+
- Ollama must be installed and running (`ollama serve`)
## Usage
### Basic Commands
```bash
# Show help
cli-anything-ollama --help
# Start interactive REPL mode
cli-anything-ollama
# List available models
cli-anything-ollama model list
# Run with JSON output (for agent consumption)
cli-anything-ollama --json model list
```
### REPL Mode
When invoked without a subcommand, the CLI enters an interactive REPL session:
```bash
cli-anything-ollama
# Enter commands interactively with tab-completion and history
```
## Command Groups
### Model
Model management commands.
| Command | Description |
|---------|-------------|
| `list` | List locally available models |
| `show` | Show model details (parameters, template, license) |
| `pull` | Download a model from the Ollama library |
| `rm` | Delete a model from local storage |
| `copy` | Copy a model to a new name |
| `ps` | List models currently loaded in memory |
### Generate
Text generation and chat commands.
| Command | Description |
|---------|-------------|
| `text` | Generate text from a prompt |
| `chat` | Send a chat completion request |
### Embed
Embedding generation commands.
| Command | Description |
|---------|-------------|
| `text` | Generate embeddings for text |
### Server
Server status and info commands.
| Command | Description |
|---------|-------------|
| `status` | Check if Ollama server is running |
| `version` | Show Ollama server version |
### Session
Session state commands.
| Command | Description |
|---------|-------------|
| `status` | Show current session state |
| `history` | Show chat history for current session |
## Examples
### List and Pull Models
```bash
# List available models
cli-anything-ollama model list
# Pull a model
cli-anything-ollama model pull llama3.2
# Show model details
cli-anything-ollama model show llama3.2
```
### Generate Text
```bash
# Stream text (default)
cli-anything-ollama generate text --model llama3.2 --prompt "Explain quantum computing in one sentence"
# Non-streaming with JSON output (for agents)
cli-anything-ollama --json generate text --model llama3.2 --prompt "Hello" --no-stream
```
### Chat
```bash
# Single-turn chat
cli-anything-ollama generate chat --model llama3.2 --message "user:What is Python?"
# Multi-turn chat
cli-anything-ollama generate chat --model llama3.2 \
--message "user:What is Python?" \
--message "user:How does it compare to JavaScript?"
# Chat from JSON file
cli-anything-ollama generate chat --model llama3.2 --file messages.json
```
### Embeddings
```bash
cli-anything-ollama embed text --model nomic-embed-text --input "Hello world"
```
### Interactive REPL Session
Start an interactive session for exploratory use.
```bash
cli-anything-ollama
# Enter commands interactively
# Use 'help' to see available commands
```
### Connect to Remote Host
```bash
cli-anything-ollama --host http://192.168.1.100:11434 model list
```
## State Management
The CLI maintains lightweight session state:
- **Current host URL**: Configurable via `--host`
- **Chat history**: Tracked for multi-turn conversations in REPL
- **Last used model**: Shown in REPL prompt
## Output Formats
All commands support dual output modes:
- **Human-readable** (default): Tables, colors, formatted text
- **Machine-readable** (`--json` flag): Structured JSON for agent consumption
```bash
# Human output
cli-anything-ollama model list
# JSON output for agents
cli-anything-ollama --json model list
```
## For AI Agents
When using this CLI programmatically:
1. **Always use `--json` flag** for parseable output
2. **Check return codes** - 0 for success, non-zero for errors
3. **Parse stderr** for error messages on failure
4. **Use `--no-stream`** for generate/chat to get complete responses
5. **Verify Ollama is running** with `server status` before other commands
## More Information
- Full documentation: See README.md in the package
- Test coverage: See TEST.md in the package
- Methodology: See HARNESS.md in the cli-anything-plugin
## Version
1.0.1

View File

@@ -0,0 +1,43 @@
# Ollama CLI — Test Plan & Results
## Test Strategy
### Unit Tests (`test_core.py`)
These tests do NOT require Ollama to be running. They test:
- URL construction in the backend module
- Output formatting helpers
- CLI argument parsing via Click's test runner
- Session state management (host, last model, chat history)
- Error handling for connection failures
### E2E Tests (`test_full_e2e.py`)
These tests REQUIRE Ollama running at `http://localhost:11434`. They test:
- Listing models
- Pulling a small test model
- Generating text completions
- Chat completions
- Model info display
- Embedding generation
- Model deletion
## Running Tests
```bash
cd ollama/agent-harness
# Unit tests only (no Ollama needed)
python -m pytest cli_anything/ollama/tests/test_core.py -v
# E2E tests (requires Ollama running)
python -m pytest cli_anything/ollama/tests/test_full_e2e.py -v
# All tests
python -m pytest cli_anything/ollama/tests/ -v
```
## Test Results
| Test Suite | Status | Notes |
|-----------|--------|-------|
| test_core.py | Passed | 87/87 (run 2026-03-18) |
| test_full_e2e.py | Passed | 10 passed, 1 skipped (embed model), run 2026-03-19 with `tinyllama` |

View File

@@ -0,0 +1,888 @@
"""Unit tests for cli-anything-ollama — no Ollama server required."""
import json
import pytest
from unittest.mock import patch, MagicMock
from click.testing import CliRunner
from cli_anything.ollama.ollama_cli import cli, _format_size
from cli_anything.ollama.utils.ollama_backend import DEFAULT_BASE_URL
@pytest.fixture
def runner():
return CliRunner()
# ── Backend URL construction ─────────────────────────────────────
class TestBackend:
def test_default_base_url(self):
assert DEFAULT_BASE_URL == "http://localhost:11434"
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_is_available_true(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import is_available
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_get.return_value = mock_resp
assert is_available() is True
mock_get.assert_called_once_with("http://localhost:11434/", timeout=5)
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_is_available_false(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import is_available
import requests
mock_get.side_effect = requests.exceptions.ConnectionError()
assert is_available() is False
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_connection_error(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
import requests
mock_get.side_effect = requests.exceptions.ConnectionError()
with pytest.raises(RuntimeError, match="Cannot connect to Ollama"):
api_get("http://localhost:11434", "/api/tags")
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_connection_error(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post
import requests
mock_post.side_effect = requests.exceptions.ConnectionError()
with pytest.raises(RuntimeError, match="Cannot connect to Ollama"):
api_post("http://localhost:11434", "/api/show", {"name": "test"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.delete")
def test_api_delete_connection_error(self, mock_delete):
from cli_anything.ollama.utils.ollama_backend import api_delete
import requests
mock_delete.side_effect = requests.exceptions.ConnectionError()
with pytest.raises(RuntimeError, match="Cannot connect to Ollama"):
api_delete("http://localhost:11434", "/api/delete", {"name": "test"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_success(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.content = b'{"models": []}'
mock_resp.headers = {"content-type": "application/json"}
mock_resp.json.return_value = {"models": []}
mock_resp.raise_for_status.return_value = None
mock_get.return_value = mock_resp
result = api_get("http://localhost:11434", "/api/tags")
assert result == {"models": []}
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_trailing_slash_stripped(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.content = b'{"models": []}'
mock_resp.headers = {"content-type": "application/json"}
mock_resp.json.return_value = {"models": []}
mock_resp.raise_for_status.return_value = None
mock_get.return_value = mock_resp
api_get("http://localhost:11434/", "/api/tags")
mock_get.assert_called_once_with(
"http://localhost:11434/api/tags", params=None, timeout=30
)
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_timeout(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
import requests
mock_get.side_effect = requests.exceptions.Timeout()
with pytest.raises(RuntimeError, match="timed out"):
api_get("http://localhost:11434", "/api/tags")
# ── Output formatting ────────────────────────────────────────────
class TestFormatSize:
def test_zero(self):
assert _format_size(0) == "0 B"
def test_bytes(self):
assert _format_size(512) == "512.0 B"
def test_kilobytes(self):
assert _format_size(2048) == "2.0 KB"
def test_megabytes(self):
result = _format_size(5 * 1024 * 1024)
assert "MB" in result
def test_gigabytes(self):
result = _format_size(3 * 1024 * 1024 * 1024)
assert "GB" in result
# ── CLI argument parsing ─────────────────────────────────────────
class TestCLIParsing:
def test_help(self, runner):
result = runner.invoke(cli, ["--help"])
assert result.exit_code == 0
assert "Ollama CLI" in result.output
def test_model_help(self, runner):
result = runner.invoke(cli, ["model", "--help"])
assert result.exit_code == 0
assert "list" in result.output
assert "show" in result.output
assert "pull" in result.output
assert "rm" in result.output
assert "copy" in result.output
assert "ps" in result.output
def test_generate_help(self, runner):
result = runner.invoke(cli, ["generate", "--help"])
assert result.exit_code == 0
assert "text" in result.output
assert "chat" in result.output
def test_embed_help(self, runner):
result = runner.invoke(cli, ["embed", "--help"])
assert result.exit_code == 0
assert "text" in result.output
def test_server_help(self, runner):
result = runner.invoke(cli, ["server", "--help"])
assert result.exit_code == 0
assert "status" in result.output
assert "version" in result.output
def test_session_help(self, runner):
result = runner.invoke(cli, ["session", "--help"])
assert result.exit_code == 0
assert "status" in result.output
assert "history" in result.output
def test_json_flag(self, runner):
result = runner.invoke(cli, ["--json", "session", "status"])
assert result.exit_code == 0
data = json.loads(result.output)
assert "host" in data
def test_host_flag(self, runner):
result = runner.invoke(cli, ["--host", "http://example:1234", "--json", "session", "status"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["host"] == "http://example:1234"
# ── Session state ────────────────────────────────────────────────
class TestSessionState:
def test_session_status_defaults(self, runner):
result = runner.invoke(cli, ["--json", "session", "status"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["chat_history_length"] == 0
def test_session_history_empty(self, runner):
result = runner.invoke(cli, ["--json", "session", "history"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["messages"] == []
def test_session_history_human(self, runner):
result = runner.invoke(cli, ["session", "history"])
assert result.exit_code == 0
assert "No chat history" in result.output
# ── Error handling ───────────────────────────────────────────────
class TestErrorHandling:
@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_connection_error(self, mock_api, runner):
mock_api.side_effect = RuntimeError(
"Cannot connect to Ollama at http://localhost:11434. "
"Is Ollama running? Start it with: ollama serve"
)
result = runner.invoke(cli, ["model", "list"])
assert result.exit_code == 1
@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_connection_error_json(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["--json", "model", "list"])
assert result.exit_code == 1
data = json.loads(result.output)
assert "error" in data
@patch("cli_anything.ollama.core.server.api_get")
def test_server_status_error(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["server", "status"])
assert result.exit_code == 1
def test_generate_chat_no_messages(self, runner):
result = runner.invoke(cli, ["generate", "chat", "--model", "test"])
assert result.exit_code == 1
def test_generate_chat_bad_format(self, runner):
result = runner.invoke(cli, ["generate", "chat", "--model", "test",
"--message", "no-colon-here"])
assert result.exit_code == 1
# ── Model commands with mocked API ──────────────────────────────
class TestModelCommands:
@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_empty(self, mock_api, runner):
mock_api.return_value = {"models": []}
result = runner.invoke(cli, ["model", "list"])
assert result.exit_code == 0
assert "No models" in result.output
@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_json(self, mock_api, runner):
mock_api.return_value = {"models": [{"name": "llama3.2", "size": 2000000000}]}
result = runner.invoke(cli, ["--json", "model", "list"])
assert result.exit_code == 0
data = json.loads(result.output)
assert len(data["models"]) == 1
@patch("cli_anything.ollama.core.models.api_get")
def test_model_list_formatted(self, mock_api, runner):
mock_api.return_value = {
"models": [{"name": "llama3.2:latest", "size": 2000000000, "modified_at": "2024-01-01T00:00:00Z"}]
}
result = runner.invoke(cli, ["model", "list"])
assert result.exit_code == 0
assert "llama3.2:latest" in result.output
@patch("cli_anything.ollama.core.models.api_post")
def test_model_show(self, mock_api, runner):
mock_api.return_value = {"modelfile": "FROM llama3.2", "parameters": "temperature 0.7"}
result = runner.invoke(cli, ["--json", "model", "show", "llama3.2"])
assert result.exit_code == 0
@patch("cli_anything.ollama.core.models.api_delete")
def test_model_rm(self, mock_api, runner):
mock_api.return_value = {"status": "ok"}
result = runner.invoke(cli, ["model", "rm", "test-model"])
assert result.exit_code == 0
assert "Deleted" in result.output
@patch("cli_anything.ollama.core.models.api_post")
def test_model_copy(self, mock_api, runner):
mock_api.return_value = {"status": "ok"}
result = runner.invoke(cli, ["model", "copy", "src", "dst"])
assert result.exit_code == 0
assert "Copied" in result.output
@patch("cli_anything.ollama.core.models.api_get")
def test_model_ps_empty(self, mock_api, runner):
mock_api.return_value = {"models": []}
result = runner.invoke(cli, ["model", "ps"])
assert result.exit_code == 0
assert "No models" in result.output
# ── Server commands with mocked API ──────────────────────────────
class TestServerCommands:
@patch("cli_anything.ollama.core.server.api_get")
def test_server_status(self, mock_api, runner):
mock_api.return_value = {"status": "ok", "message": "Ollama is running"}
result = runner.invoke(cli, ["server", "status"])
assert result.exit_code == 0
@patch("cli_anything.ollama.core.server.api_get")
def test_server_version(self, mock_api, runner):
mock_api.return_value = {"version": "0.1.30"}
result = runner.invoke(cli, ["--json", "server", "version"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["version"] == "0.1.30"
# ── Embed command with mocked API ────────────────────────────────
class TestEmbedCommands:
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_text_json(self, mock_api, runner):
mock_api.return_value = {"embeddings": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]]}
result = runner.invoke(cli, ["--json", "embed", "text",
"--model", "nomic-embed-text",
"--input", "Hello world"])
assert result.exit_code == 0
data = json.loads(result.output)
assert "embeddings" in data
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_text_human(self, mock_api, runner):
mock_api.return_value = {"embeddings": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]]}
result = runner.invoke(cli, ["embed", "text",
"--model", "nomic-embed-text",
"--input", "Hello"])
assert result.exit_code == 0
assert "Dimensions: 6" in result.output
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_text_preview_values(self, mock_api, runner):
mock_api.return_value = {"embeddings": [[0.123456, 0.234567, 0.345678, 0.456789, 0.567890, 0.6]]}
result = runner.invoke(cli, ["embed", "text",
"--model", "nomic-embed-text",
"--input", "Hello"])
assert result.exit_code == 0
assert "Preview:" in result.output
assert "0.123456" in result.output
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_text_empty_embeddings(self, mock_api, runner):
mock_api.return_value = {"embeddings": []}
result = runner.invoke(cli, ["embed", "text",
"--model", "nomic-embed-text",
"--input", "Hello"])
assert result.exit_code == 0
# ── Generate text with mocked API ────────────────────────────────
class TestGenerateTextCommands:
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_no_stream_json(self, mock_api, runner):
mock_api.return_value = {
"model": "llama3.2",
"response": "Hello! How can I help you?",
"done": True,
"total_duration": 1234567890,
"eval_count": 7,
}
result = runner.invoke(cli, ["--json", "generate", "text",
"--model", "llama3.2",
"--prompt", "Say hello",
"--no-stream"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["response"] == "Hello! How can I help you?"
assert data["done"] is True
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_no_stream_human(self, mock_api, runner):
mock_api.return_value = {
"model": "llama3.2",
"response": "The sky is blue.",
"done": True,
}
result = runner.invoke(cli, ["generate", "text",
"--model", "llama3.2",
"--prompt", "Why is the sky blue?",
"--no-stream"])
assert result.exit_code == 0
assert "The sky is blue." in result.output
@patch("cli_anything.ollama.core.generate.api_post_stream")
def test_generate_text_streaming(self, mock_stream, runner):
mock_stream.return_value = iter([
{"response": "Hello", "done": False},
{"response": " world", "done": False},
{"response": "!", "done": True, "total_duration": 100000},
])
result = runner.invoke(cli, ["generate", "text",
"--model", "llama3.2",
"--prompt", "Say hello"])
assert result.exit_code == 0
assert "Hello world!" in result.output
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_with_system(self, mock_api, runner):
mock_api.return_value = {
"model": "llama3.2",
"response": "Ahoy!",
"done": True,
}
result = runner.invoke(cli, ["--json", "generate", "text",
"--model", "llama3.2",
"--prompt", "Say hello",
"--system", "You are a pirate",
"--no-stream"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["response"] == "Ahoy!"
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_with_options(self, mock_api, runner):
mock_api.return_value = {"model": "llama3.2", "response": "Hi", "done": True}
result = runner.invoke(cli, ["--json", "generate", "text",
"--model", "llama3.2",
"--prompt", "Hello",
"--temperature", "0.5",
"--top-p", "0.9",
"--num-predict", "50",
"--no-stream"])
assert result.exit_code == 0
# Verify options were passed
call_args = mock_api.call_args
assert call_args is not None
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_connection_error(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["generate", "text",
"--model", "llama3.2",
"--prompt", "Hello",
"--no-stream"])
assert result.exit_code == 1
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_text_connection_error_json(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["--json", "generate", "text",
"--model", "llama3.2",
"--prompt", "Hello",
"--no-stream"])
assert result.exit_code == 1
data = json.loads(result.output)
assert "error" in data
assert "runtime_error" in data["type"]
# ── Generate chat with mocked API ────────────────────────────────
class TestGenerateChatCommands:
@patch("cli_anything.ollama.core.generate.api_post")
def test_chat_no_stream_json(self, mock_api, runner):
mock_api.return_value = {
"model": "llama3.2",
"message": {"role": "assistant", "content": "Hello! How can I help?"},
"done": True,
"total_duration": 1234567890,
}
result = runner.invoke(cli, ["--json", "generate", "chat",
"--model", "llama3.2",
"--message", "user:Hi there",
"--no-stream"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["message"]["role"] == "assistant"
assert "Hello" in data["message"]["content"]
@patch("cli_anything.ollama.core.generate.api_post")
def test_chat_multi_message(self, mock_api, runner):
mock_api.return_value = {
"model": "llama3.2",
"message": {"role": "assistant", "content": "Python is great!"},
"done": True,
}
result = runner.invoke(cli, ["--json", "generate", "chat",
"--model", "llama3.2",
"--message", "user:What is Python?",
"--message", "assistant:It's a programming language",
"--message", "user:Tell me more",
"--no-stream"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["message"]["content"] == "Python is great!"
@patch("cli_anything.ollama.core.generate.api_post_stream")
def test_chat_streaming(self, mock_stream, runner):
mock_stream.return_value = iter([
{"message": {"role": "assistant", "content": "I'm"}, "done": False},
{"message": {"role": "assistant", "content": " doing"}, "done": False},
{"message": {"role": "assistant", "content": " well!"}, "done": True},
])
result = runner.invoke(cli, ["generate", "chat",
"--model", "llama3.2",
"--message", "user:How are you?"])
assert result.exit_code == 0
assert "I'm doing well!" in result.output
@patch("cli_anything.ollama.core.generate.api_post_stream")
def test_chat_streaming_error(self, mock_stream, runner):
mock_stream.return_value = iter([
{"message": {"role": "assistant", "content": "partial"}, "done": False},
{"error": "stream failed"},
])
result = runner.invoke(cli, ["generate", "chat",
"--model", "llama3.2",
"--message", "user:Hello"])
assert result.exit_code == 1
assert "Error: stream failed" in result.output
@patch("cli_anything.ollama.core.generate.api_post")
def test_chat_from_file(self, mock_api, runner, tmp_path):
messages_file = tmp_path / "messages.json"
messages_file.write_text(json.dumps([
{"role": "user", "content": "What is 2+2?"},
]))
mock_api.return_value = {
"model": "llama3.2",
"message": {"role": "assistant", "content": "4"},
"done": True,
}
result = runner.invoke(cli, ["--json", "generate", "chat",
"--model", "llama3.2",
"--file", str(messages_file),
"--no-stream"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["message"]["content"] == "4"
def test_chat_missing_model(self, runner):
result = runner.invoke(cli, ["generate", "chat",
"--message", "user:Hello"])
assert result.exit_code != 0
@patch("cli_anything.ollama.core.generate.api_post")
def test_chat_connection_error(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["generate", "chat",
"--model", "llama3.2",
"--message", "user:Hello",
"--no-stream"])
assert result.exit_code == 1
# ── Model pull with mocked API ───────────────────────────────────
class TestModelPullCommands:
@patch("cli_anything.ollama.core.models.api_post")
def test_pull_no_stream(self, mock_api, runner):
mock_api.return_value = {"status": "success"}
result = runner.invoke(cli, ["model", "pull", "llama3.2", "--no-stream"])
assert result.exit_code == 0
assert "Pulled" in result.output
@patch("cli_anything.ollama.core.models.api_post")
def test_pull_no_stream_json(self, mock_api, runner):
mock_api.return_value = {"status": "success"}
result = runner.invoke(cli, ["--json", "model", "pull", "llama3.2"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["status"] == "success"
@patch("cli_anything.ollama.core.models.api_post_stream")
def test_pull_streaming(self, mock_stream, runner):
mock_stream.return_value = iter([
{"status": "pulling manifest"},
{"status": "downloading", "digest": "sha256:abc123", "total": 1000, "completed": 500},
{"status": "downloading", "digest": "sha256:abc123", "total": 1000, "completed": 1000},
{"status": "verifying sha256 digest"},
{"status": "writing manifest"},
{"status": "success"},
])
result = runner.invoke(cli, ["model", "pull", "llama3.2"])
assert result.exit_code == 0
assert "Done" in result.output
@patch("cli_anything.ollama.core.models.api_post_stream")
def test_pull_streaming_error(self, mock_stream, runner):
mock_stream.return_value = iter([
{"status": "downloading"},
{"error": "disk full"},
])
result = runner.invoke(cli, ["model", "pull", "llama3.2"])
assert result.exit_code == 1
assert "Error: disk full" in result.output
@patch("cli_anything.ollama.core.models.api_post")
def test_pull_connection_error(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Cannot connect to Ollama")
result = runner.invoke(cli, ["model", "pull", "llama3.2", "--no-stream"])
assert result.exit_code == 1
# ── Model ps with loaded models ──────────────────────────────────
class TestModelPsCommands:
@patch("cli_anything.ollama.core.models.api_get")
def test_ps_with_models(self, mock_api, runner):
mock_api.return_value = {
"models": [{
"name": "llama3.2:latest",
"size": 3825819519,
"size_vram": 3825819519,
"expires_at": "2024-06-04T14:38:31.83753-07:00",
}]
}
result = runner.invoke(cli, ["model", "ps"])
assert result.exit_code == 0
assert "llama3.2:latest" in result.output
@patch("cli_anything.ollama.core.models.api_get")
def test_ps_with_models_json(self, mock_api, runner):
mock_api.return_value = {
"models": [{
"name": "llama3.2:latest",
"size": 3825819519,
"size_vram": 3825819519,
}]
}
result = runner.invoke(cli, ["--json", "model", "ps"])
assert result.exit_code == 0
data = json.loads(result.output)
assert len(data["models"]) == 1
assert data["models"][0]["name"] == "llama3.2:latest"
# ── Model show with full response ────────────────────────────────
class TestModelShowCommands:
@patch("cli_anything.ollama.core.models.api_post")
def test_show_human_output(self, mock_api, runner):
mock_api.return_value = {
"modelfile": "FROM llama3.2\nPARAMETER temperature 0.7",
"parameters": "temperature 0.7\ntop_p 0.9",
"template": "{{ .Prompt }}",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"parameter_size": "3.2B",
"quantization_level": "Q4_0",
},
}
result = runner.invoke(cli, ["model", "show", "llama3.2"])
assert result.exit_code == 0
assert "llama3.2" in result.output
@patch("cli_anything.ollama.core.models.api_post")
def test_show_json_output(self, mock_api, runner):
mock_api.return_value = {
"modelfile": "FROM llama3.2",
"details": {"family": "llama", "parameter_size": "3.2B"},
}
result = runner.invoke(cli, ["--json", "model", "show", "llama3.2"])
assert result.exit_code == 0
data = json.loads(result.output)
assert data["details"]["family"] == "llama"
@patch("cli_anything.ollama.core.models.api_post")
def test_show_nonexistent_model(self, mock_api, runner):
mock_api.side_effect = RuntimeError("Ollama API error 404 on POST /api/show: model not found")
result = runner.invoke(cli, ["model", "show", "nonexistent"])
assert result.exit_code == 1
# ── Backend streaming ────────────────────────────────────────────
class TestBackendStreaming:
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_stream_success(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post_stream
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.raise_for_status.return_value = None
mock_resp.iter_lines.return_value = [
b'{"response": "Hello", "done": false}',
b'{"response": " world", "done": true}',
]
mock_post.return_value = mock_resp
chunks = list(api_post_stream("http://localhost:11434", "/api/generate", {"model": "test"}))
assert len(chunks) == 2
assert chunks[0]["response"] == "Hello"
assert chunks[1]["done"] is True
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_stream_connection_error(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post_stream
import requests
mock_post.side_effect = requests.exceptions.ConnectionError()
with pytest.raises(RuntimeError, match="Cannot connect to Ollama"):
list(api_post_stream("http://localhost:11434", "/api/generate", {}))
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_stream_skips_empty_lines(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post_stream
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.raise_for_status.return_value = None
mock_resp.iter_lines.return_value = [
b'{"response": "Hi", "done": false}',
b'',
b'{"response": "!", "done": true}',
]
mock_post.return_value = mock_resp
chunks = list(api_post_stream("http://localhost:11434", "/api/generate", {}))
assert len(chunks) == 2
# ── Backend HTTP error handling ──────────────────────────────────
class TestBackendHTTPErrors:
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_http_error(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post
import requests
mock_resp = MagicMock()
mock_resp.status_code = 404
mock_resp.text = "model not found"
mock_resp.raise_for_status.side_effect = requests.exceptions.HTTPError()
mock_post.return_value = mock_resp
with pytest.raises(RuntimeError, match="Ollama API error 404"):
api_post("http://localhost:11434", "/api/show", {"name": "bad"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_timeout(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post
import requests
mock_post.side_effect = requests.exceptions.Timeout()
with pytest.raises(RuntimeError, match="timed out"):
api_post("http://localhost:11434", "/api/show", {"name": "test"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.delete")
def test_api_delete_http_error(self, mock_delete):
from cli_anything.ollama.utils.ollama_backend import api_delete
import requests
mock_resp = MagicMock()
mock_resp.status_code = 404
mock_resp.text = "model not found"
mock_resp.raise_for_status.side_effect = requests.exceptions.HTTPError()
mock_delete.return_value = mock_resp
with pytest.raises(RuntimeError, match="Ollama API error 404"):
api_delete("http://localhost:11434", "/api/delete", {"name": "bad"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.delete")
def test_api_delete_timeout(self, mock_delete):
from cli_anything.ollama.utils.ollama_backend import api_delete
import requests
mock_delete.side_effect = requests.exceptions.Timeout()
with pytest.raises(RuntimeError, match="timed out"):
api_delete("http://localhost:11434", "/api/delete", {"name": "test"})
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_http_error(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
import requests
mock_resp = MagicMock()
mock_resp.status_code = 500
mock_resp.text = "internal server error"
mock_resp.raise_for_status.side_effect = requests.exceptions.HTTPError()
mock_get.return_value = mock_resp
with pytest.raises(RuntimeError, match="Ollama API error 500"):
api_get("http://localhost:11434", "/api/tags")
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_api_get_plain_text_response(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import api_get
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.content = b"Ollama is running"
mock_resp.headers = {"content-type": "text/plain"}
mock_resp.text = "Ollama is running"
mock_resp.raise_for_status.return_value = None
mock_get.return_value = mock_resp
result = api_get("http://localhost:11434", "/")
assert result["message"] == "Ollama is running"
@patch("cli_anything.ollama.utils.ollama_backend.requests.post")
def test_api_post_204_no_content(self, mock_post):
from cli_anything.ollama.utils.ollama_backend import api_post
mock_resp = MagicMock()
mock_resp.status_code = 204
mock_resp.content = b""
mock_resp.raise_for_status.return_value = None
mock_post.return_value = mock_resp
result = api_post("http://localhost:11434", "/api/copy", {"source": "a", "destination": "b"})
assert result == {"status": "ok"}
@patch("cli_anything.ollama.utils.ollama_backend.requests.get")
def test_is_available_timeout(self, mock_get):
from cli_anything.ollama.utils.ollama_backend import is_available
import requests
mock_get.side_effect = requests.exceptions.Timeout()
assert is_available() is False
# ── Core module direct tests ─────────────────────────────────────
class TestCoreModules:
@patch("cli_anything.ollama.core.generate.api_post")
def test_generate_builds_correct_payload(self, mock_api):
from cli_anything.ollama.core.generate import generate
mock_api.return_value = {"response": "hi", "done": True}
generate("http://localhost:11434", "llama3.2", "Hello",
system="Be helpful", options={"temperature": 0.5}, stream=False)
call_data = mock_api.call_args[0][2]
assert call_data["model"] == "llama3.2"
assert call_data["prompt"] == "Hello"
assert call_data["system"] == "Be helpful"
assert call_data["options"]["temperature"] == 0.5
assert call_data["stream"] is False
@patch("cli_anything.ollama.core.generate.api_post")
def test_chat_builds_correct_payload(self, mock_api):
from cli_anything.ollama.core.generate import chat
mock_api.return_value = {"message": {"role": "assistant", "content": "hi"}, "done": True}
messages = [{"role": "user", "content": "Hello"}]
chat("http://localhost:11434", "llama3.2", messages,
options={"temperature": 0.8}, stream=False)
call_data = mock_api.call_args[0][2]
assert call_data["model"] == "llama3.2"
assert call_data["messages"] == messages
assert call_data["options"]["temperature"] == 0.8
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_builds_correct_payload(self, mock_api):
from cli_anything.ollama.core.embeddings import embed
mock_api.return_value = {"embeddings": [[0.1, 0.2]]}
embed("http://localhost:11434", "nomic-embed-text", "test input")
call_data = mock_api.call_args[0][2]
assert call_data["model"] == "nomic-embed-text"
assert call_data["input"] == "test input"
@patch("cli_anything.ollama.core.embeddings.api_post")
def test_embed_list_input(self, mock_api):
from cli_anything.ollama.core.embeddings import embed
mock_api.return_value = {"embeddings": [[0.1], [0.2]]}
embed("http://localhost:11434", "nomic-embed-text", ["hello", "world"])
call_data = mock_api.call_args[0][2]
assert call_data["input"] == ["hello", "world"]
@patch("cli_anything.ollama.core.models.api_post")
def test_copy_model_payload(self, mock_api):
from cli_anything.ollama.core.models import copy_model
mock_api.return_value = {"status": "ok"}
copy_model("http://localhost:11434", "llama3.2", "my-llama")
call_data = mock_api.call_args[0][2]
assert call_data["source"] == "llama3.2"
assert call_data["destination"] == "my-llama"
@patch("cli_anything.ollama.core.models.api_delete")
def test_delete_model_payload(self, mock_api):
from cli_anything.ollama.core.models import delete_model
mock_api.return_value = {"status": "ok"}
delete_model("http://localhost:11434", "old-model")
call_data = mock_api.call_args[0][2]
assert call_data["name"] == "old-model"
# ── Stream to stdout helper ──────────────────────────────────────
class TestStreamToStdout:
def test_stream_to_stdout_generate(self, capsys):
from cli_anything.ollama.core.generate import stream_to_stdout
chunks = iter([
{"response": "Hello", "done": False},
{"response": " there", "done": False},
{"response": "!", "done": True, "total_duration": 999},
])
final = stream_to_stdout(chunks)
captured = capsys.readouterr()
assert "Hello there!" in captured.out
assert final["done"] is True
assert final["total_duration"] == 999
def test_stream_to_stdout_chat(self, capsys):
from cli_anything.ollama.core.generate import stream_to_stdout
chunks = iter([
{"message": {"role": "assistant", "content": "Yes"}, "done": False},
{"message": {"role": "assistant", "content": "!"}, "done": True},
])
final = stream_to_stdout(chunks)
captured = capsys.readouterr()
assert "Yes!" in captured.out
def test_stream_to_stdout_empty(self, capsys):
from cli_anything.ollama.core.generate import stream_to_stdout
chunks = iter([{"done": True}])
final = stream_to_stdout(chunks)
captured = capsys.readouterr()
assert final["done"] is True

View File

@@ -0,0 +1,120 @@
"""E2E tests for cli-anything-ollama — requires Ollama running at localhost:11434.
These tests interact with a real Ollama server and are skipped automatically if it is not reachable.
Usage:
python -m pytest cli_anything/ollama/tests/test_full_e2e.py -v
"""
import pytest
from click.testing import CliRunner
from cli_anything.ollama.utils.ollama_backend import is_available, DEFAULT_BASE_URL
from cli_anything.ollama.ollama_cli import cli
# Skip all tests if Ollama is not running
pytestmark = pytest.mark.skipif(
not is_available(DEFAULT_BASE_URL),
reason="Ollama server not available at localhost:11434"
)
# Small model for testing — tinyllama is ~637MB
TEST_MODEL = "tinyllama"
@pytest.fixture
def runner():
return CliRunner()
class TestServerE2E:
def test_server_status(self, runner):
result = runner.invoke(cli, ["server", "status"])
assert result.exit_code == 0
def test_server_version(self, runner):
result = runner.invoke(cli, ["--json", "server", "version"])
assert result.exit_code == 0
import json
data = json.loads(result.output)
assert "version" in data
class TestModelE2E:
def test_model_list(self, runner):
result = runner.invoke(cli, ["--json", "model", "list"])
assert result.exit_code == 0
import json
data = json.loads(result.output)
assert "models" in data
def test_model_pull(self, runner):
result = runner.invoke(cli, ["model", "pull", TEST_MODEL, "--no-stream"])
assert result.exit_code == 0
def test_model_show(self, runner):
# Ensure model is pulled first
runner.invoke(cli, ["model", "pull", TEST_MODEL, "--no-stream"])
result = runner.invoke(cli, ["--json", "model", "show", TEST_MODEL])
assert result.exit_code == 0
def test_model_ps(self, runner):
result = runner.invoke(cli, ["--json", "model", "ps"])
assert result.exit_code == 0
def test_model_copy_and_delete(self, runner):
# Ensure source exists
runner.invoke(cli, ["model", "pull", TEST_MODEL, "--no-stream"])
# Copy
result = runner.invoke(cli, ["model", "copy", TEST_MODEL, f"{TEST_MODEL}-test-copy"])
assert result.exit_code == 0
# Delete copy
result = runner.invoke(cli, ["model", "rm", f"{TEST_MODEL}-test-copy"])
assert result.exit_code == 0
class TestGenerateE2E:
def test_generate_text(self, runner):
# Ensure model exists
runner.invoke(cli, ["model", "pull", TEST_MODEL, "--no-stream"])
result = runner.invoke(cli, ["--json", "generate", "text",
"--model", TEST_MODEL,
"--prompt", "Say hello in one word",
"--no-stream", "--num-predict", "10"])
assert result.exit_code == 0
import json
data = json.loads(result.output)
assert "response" in data
def test_generate_chat(self, runner):
# Ensure model exists
runner.invoke(cli, ["model", "pull", TEST_MODEL, "--no-stream"])
result = runner.invoke(cli, ["--json", "generate", "chat",
"--model", TEST_MODEL,
"--message", "user:Say hi",
"--no-stream"])
assert result.exit_code == 0
import json
data = json.loads(result.output)
assert "message" in data
class TestEmbedE2E:
"""Embedding tests — requires a model that supports embeddings."""
@pytest.mark.skip(reason="Requires an embedding model like nomic-embed-text")
def test_embed_text(self, runner):
result = runner.invoke(cli, ["--json", "embed", "text",
"--model", "nomic-embed-text",
"--input", "Hello world"])
assert result.exit_code == 0
import json
data = json.loads(result.output)
assert "embeddings" in data
class TestCleanup:
def test_delete_test_model(self, runner):
"""Clean up test model after all tests."""
result = runner.invoke(cli, ["model", "rm", TEST_MODEL])
# Don't assert exit_code — model might not exist

View File

@@ -0,0 +1,187 @@
"""Ollama REST API wrapper — the single module that makes network requests.
Ollama runs a local HTTP server (default: http://localhost:11434).
No authentication is required by default.
"""
import requests
from typing import Any, Generator
# Default Ollama server URL
DEFAULT_BASE_URL = "http://localhost:11434"
def api_get(base_url: str, endpoint: str, params: dict | None = None,
timeout: int = 30) -> Any:
"""Perform a GET request against the Ollama API.
Args:
base_url: Ollama server base URL (e.g., 'http://localhost:11434').
endpoint: API endpoint path (e.g., '/api/tags').
params: Optional query parameters.
timeout: Request timeout in seconds.
Returns:
Parsed JSON response as a dict or list.
Raises:
RuntimeError: On HTTP error or connection failure.
"""
url = f"{base_url.rstrip('/')}{endpoint}"
try:
resp = requests.get(url, params=params, timeout=timeout)
resp.raise_for_status()
if resp.status_code == 204 or not resp.content:
return {"status": "ok"}
# Some endpoints (like /) return plain text
content_type = resp.headers.get("content-type", "")
if "application/json" in content_type:
return resp.json()
return {"status": "ok", "message": resp.text.strip()}
except requests.exceptions.ConnectionError as e:
raise RuntimeError(
f"Cannot connect to Ollama at {base_url}. "
"Is Ollama running? Start it with: ollama serve"
) from e
except requests.exceptions.HTTPError as e:
raise RuntimeError(
f"Ollama API error {resp.status_code} on GET {endpoint}: {resp.text}"
) from e
except requests.exceptions.Timeout as e:
raise RuntimeError(
f"Request to Ollama timed out: GET {endpoint}"
) from e
def api_post(base_url: str, endpoint: str, data: dict | None = None,
timeout: int = 30) -> Any:
"""Perform a POST request against the Ollama API.
Args:
base_url: Ollama server base URL.
endpoint: API endpoint path.
data: JSON request body.
timeout: Request timeout in seconds.
Returns:
Parsed JSON response.
Raises:
RuntimeError: On HTTP error or connection failure.
"""
url = f"{base_url.rstrip('/')}{endpoint}"
try:
resp = requests.post(url, json=data, timeout=timeout)
resp.raise_for_status()
if resp.status_code == 204 or not resp.content:
return {"status": "ok"}
return resp.json()
except requests.exceptions.ConnectionError as e:
raise RuntimeError(
f"Cannot connect to Ollama at {base_url}. "
"Is Ollama running? Start it with: ollama serve"
) from e
except requests.exceptions.HTTPError as e:
raise RuntimeError(
f"Ollama API error {resp.status_code} on POST {endpoint}: {resp.text}"
) from e
except requests.exceptions.Timeout as e:
raise RuntimeError(
f"Request to Ollama timed out: POST {endpoint}"
) from e
def api_delete(base_url: str, endpoint: str, data: dict | None = None,
timeout: int = 30) -> Any:
"""Perform a DELETE request against the Ollama API.
Args:
base_url: Ollama server base URL.
endpoint: API endpoint path.
data: Optional JSON request body.
timeout: Request timeout in seconds.
Returns:
Parsed JSON response or status dict.
Raises:
RuntimeError: On HTTP error or connection failure.
"""
url = f"{base_url.rstrip('/')}{endpoint}"
try:
resp = requests.delete(url, json=data, timeout=timeout)
resp.raise_for_status()
if resp.status_code == 204 or not resp.content:
return {"status": "ok"}
return resp.json()
except requests.exceptions.ConnectionError as e:
raise RuntimeError(
f"Cannot connect to Ollama at {base_url}. "
"Is Ollama running? Start it with: ollama serve"
) from e
except requests.exceptions.HTTPError as e:
raise RuntimeError(
f"Ollama API error {resp.status_code} on DELETE {endpoint}: {resp.text}"
) from e
except requests.exceptions.Timeout as e:
raise RuntimeError(
f"Request to Ollama timed out: DELETE {endpoint}"
) from e
def api_post_stream(base_url: str, endpoint: str, data: dict | None = None,
timeout: int = 300) -> Generator[dict, None, None]:
"""Perform a POST request with streaming NDJSON response.
Used for generate, chat, and pull endpoints that stream progress.
Args:
base_url: Ollama server base URL.
endpoint: API endpoint path.
data: JSON request body.
timeout: Request timeout in seconds (longer default for generation).
Yields:
Parsed JSON objects from the NDJSON stream.
Raises:
RuntimeError: On HTTP error or connection failure.
"""
import json as json_mod
url = f"{base_url.rstrip('/')}{endpoint}"
try:
resp = requests.post(url, json=data, stream=True, timeout=timeout)
resp.raise_for_status()
for line in resp.iter_lines():
if line:
yield json_mod.loads(line)
except requests.exceptions.ConnectionError as e:
raise RuntimeError(
f"Cannot connect to Ollama at {base_url}. "
"Is Ollama running? Start it with: ollama serve"
) from e
except requests.exceptions.HTTPError as e:
raise RuntimeError(
f"Ollama API error {resp.status_code} on POST {endpoint}: {resp.text}"
) from e
except requests.exceptions.Timeout as e:
raise RuntimeError(
f"Request to Ollama timed out: POST {endpoint}"
) from e
def is_available(base_url: str = DEFAULT_BASE_URL) -> bool:
"""Check if Ollama server is reachable.
Args:
base_url: Ollama server base URL.
Returns:
True if the server responds, False otherwise.
"""
try:
resp = requests.get(f"{base_url.rstrip('/')}/", timeout=5)
return resp.status_code == 200
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
return False
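# ── Usage sketch (illustrative only, not part of the shipped module) ──
# A minimal example of how these wrappers compose: check the server, list
# installed models via /api/tags, then stream a generation from /api/generate.
# The model name "llama3.2" is an assumption; substitute any locally pulled model.
if __name__ == "__main__":
    if not is_available(DEFAULT_BASE_URL):
        raise SystemExit("Ollama is not running; start it with: ollama serve")
    # /api/tags returns {"models": [{"name": ...}, ...]}
    tags = api_get(DEFAULT_BASE_URL, "/api/tags")
    print("installed models:", [m["name"] for m in tags.get("models", [])])
    # Stream NDJSON chunks; each carries a partial "response" and a "done" flag.
    for chunk in api_post_stream(DEFAULT_BASE_URL, "/api/generate",
                                 {"model": "llama3.2", "prompt": "Say hello"}):
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()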

View File

@@ -0,0 +1,500 @@
"""cli-anything REPL Skin — Unified terminal interface for all CLI harnesses.
Copy this file into your CLI package at:
cli_anything/<software>/utils/repl_skin.py
Usage:
from cli_anything.<software>.utils.repl_skin import ReplSkin
skin = ReplSkin("ollama", version="1.0.0")
skin.print_banner()
prompt_text = skin.prompt(project_name="llama3.2", modified=False)
skin.success("Model pulled")
skin.error("Connection failed")
skin.warning("No models loaded")
skin.info("Generating...")
skin.status("Model", "llama3.2:latest")
skin.table(headers, rows)
skin.print_goodbye()
"""
import os
import sys
# ── ANSI color codes (no external deps for core styling) ──────────────
_RESET = "\033[0m"
_BOLD = "\033[1m"
_DIM = "\033[2m"
_ITALIC = "\033[3m"
_UNDERLINE = "\033[4m"
# Brand colors
_CYAN = "\033[38;5;80m" # cli-anything brand cyan
_CYAN_BG = "\033[48;5;80m"
_WHITE = "\033[97m"
_GRAY = "\033[38;5;245m"
_DARK_GRAY = "\033[38;5;240m"
_LIGHT_GRAY = "\033[38;5;250m"
# Software accent colors — each software gets a unique accent
_ACCENT_COLORS = {
"gimp": "\033[38;5;214m", # warm orange
"blender": "\033[38;5;208m", # deep orange
"inkscape": "\033[38;5;39m", # bright blue
"audacity": "\033[38;5;33m", # navy blue
"libreoffice": "\033[38;5;40m", # green
"obs_studio": "\033[38;5;55m", # purple
"kdenlive": "\033[38;5;69m", # slate blue
"shotcut": "\033[38;5;35m", # teal green
"ollama": "\033[38;5;255m", # white (Ollama branding)
}
_DEFAULT_ACCENT = "\033[38;5;75m" # default sky blue
# Status colors
_GREEN = "\033[38;5;78m"
_YELLOW = "\033[38;5;220m"
_RED = "\033[38;5;196m"
_BLUE = "\033[38;5;75m"
_MAGENTA = "\033[38;5;176m"
# ── Brand icon ────────────────────────────────────────────────────────
# The cli-anything icon: a small colored diamond/chevron mark
_ICON = f"{_CYAN}{_BOLD}◆{_RESET}"
_ICON_SMALL = f"{_CYAN}◆{_RESET}"
# ── Box drawing characters ────────────────────────────────────────────
_H_LINE = "─"
_V_LINE = "│"
_TL = "╭"
_TR = "╮"
_BL = "╰"
_BR = "╯"
_T_DOWN = "┬"
_T_UP = "┴"
_T_RIGHT = "├"
_T_LEFT = "┤"
_CROSS = "┼"
def _strip_ansi(text: str) -> str:
"""Remove ANSI escape codes for length calculation."""
import re
return re.sub(r"\033\[[^m]*m", "", text)
def _visible_len(text: str) -> int:
"""Get visible length of text (excluding ANSI codes)."""
return len(_strip_ansi(text))
class ReplSkin:
"""Unified REPL skin for cli-anything CLIs.
Provides consistent branding, prompts, and message formatting
across all CLI harnesses built with the cli-anything methodology.
"""
def __init__(self, software: str, version: str = "1.0.0",
history_file: str | None = None):
"""Initialize the REPL skin.
Args:
software: Software name (e.g., "gimp", "shotcut", "ollama").
version: CLI version string.
history_file: Path for persistent command history.
Defaults to ~/.cli-anything-<software>/history
"""
self.software = software.lower().replace("-", "_")
self.display_name = software.replace("_", " ").title()
self.version = version
self.accent = _ACCENT_COLORS.get(self.software, _DEFAULT_ACCENT)
# History file
if history_file is None:
from pathlib import Path
hist_dir = Path.home() / f".cli-anything-{self.software}"
hist_dir.mkdir(parents=True, exist_ok=True)
self.history_file = str(hist_dir / "history")
else:
self.history_file = history_file
# Detect terminal capabilities
self._color = self._detect_color_support()
def _detect_color_support(self) -> bool:
"""Check if terminal supports color."""
if os.environ.get("NO_COLOR"):
return False
if os.environ.get("CLI_ANYTHING_NO_COLOR"):
return False
if not hasattr(sys.stdout, "isatty"):
return False
return sys.stdout.isatty()
def _c(self, code: str, text: str) -> str:
"""Apply color code if colors are supported."""
if not self._color:
return text
return f"{code}{text}{_RESET}"
# ── Banner ────────────────────────────────────────────────────────
def print_banner(self):
"""Print the startup banner with branding."""
inner = 54
def _box_line(content: str) -> str:
"""Wrap content in box drawing, padding to inner width."""
pad = inner - _visible_len(content)
vl = self._c(_DARK_GRAY, _V_LINE)
return f"{vl}{content}{' ' * max(0, pad)}{vl}"
top = self._c(_DARK_GRAY, f"{_TL}{_H_LINE * inner}{_TR}")
bot = self._c(_DARK_GRAY, f"{_BL}{_H_LINE * inner}{_BR}")
# Title: ◆ cli-anything · Ollama
icon = self._c(_CYAN + _BOLD, "◆")
brand = self._c(_CYAN + _BOLD, "cli-anything")
dot = self._c(_DARK_GRAY, "·")
name = self._c(self.accent + _BOLD, self.display_name)
title = f" {icon} {brand} {dot} {name}"
ver = f" {self._c(_DARK_GRAY, f' v{self.version}')}"
tip = f" {self._c(_DARK_GRAY, ' Type help for commands, quit to exit')}"
empty = ""
print(top)
print(_box_line(title))
print(_box_line(ver))
print(_box_line(empty))
print(_box_line(tip))
print(bot)
print()
# ── Prompt ────────────────────────────────────────────────────────
def prompt(self, project_name: str = "", modified: bool = False,
context: str = "") -> str:
"""Build a styled prompt string for prompt_toolkit or input().
Args:
project_name: Current project name (empty if none open).
modified: Whether the project has unsaved changes.
context: Optional extra context to show in prompt.
Returns:
Formatted prompt string.
"""
parts = []
# Icon
if self._color:
parts.append(f"{_CYAN}◆{_RESET} ")
else:
parts.append("> ")
# Software name
parts.append(self._c(self.accent + _BOLD, self.software))
# Project context
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
parts.append(f" {self._c(_DARK_GRAY, '[')}")
parts.append(self._c(_LIGHT_GRAY, f"{ctx}{mod}"))
parts.append(self._c(_DARK_GRAY, ']'))
parts.append(self._c(_GRAY, " ❯ "))
return "".join(parts)
def prompt_tokens(self, project_name: str = "", modified: bool = False,
context: str = ""):
"""Build prompt_toolkit formatted text tokens for the prompt.
Use with prompt_toolkit's FormattedText for proper ANSI handling.
Returns:
list of (style, text) tuples for prompt_toolkit.
"""
accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
tokens = []
tokens.append(("class:icon", ""))
tokens.append(("class:software", self.software))
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
tokens.append(("class:bracket", " ["))
tokens.append(("class:context", f"{ctx}{mod}"))
tokens.append(("class:bracket", "]"))
tokens.append(("class:arrow", " "))
return tokens
def get_prompt_style(self):
"""Get a prompt_toolkit Style object matching the skin.
Returns:
prompt_toolkit.styles.Style
"""
try:
from prompt_toolkit.styles import Style
except ImportError:
return None
accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
return Style.from_dict({
"icon": "#5fdfdf bold", # cyan brand color
"software": f"{accent_hex} bold",
"bracket": "#585858",
"context": "#bcbcbc",
"arrow": "#808080",
# Completion menu
"completion-menu.completion": "bg:#303030 #bcbcbc",
"completion-menu.completion.current": f"bg:{accent_hex} #000000",
"completion-menu.meta.completion": "bg:#303030 #808080",
"completion-menu.meta.completion.current": f"bg:{accent_hex} #000000",
# Auto-suggest
"auto-suggest": "#585858",
# Bottom toolbar
"bottom-toolbar": "bg:#1c1c1c #808080",
"bottom-toolbar.text": "#808080",
})
# ── Messages ──────────────────────────────────────────────────────
def success(self, message: str):
"""Print a success message with green checkmark."""
icon = self._c(_GREEN + _BOLD, "✓")
print(f" {icon} {self._c(_GREEN, message)}")
def error(self, message: str):
"""Print an error message with red cross."""
icon = self._c(_RED + _BOLD, "✗")
print(f" {icon} {self._c(_RED, message)}", file=sys.stderr)
def warning(self, message: str):
"""Print a warning message with yellow triangle."""
icon = self._c(_YELLOW + _BOLD, "⚠")
print(f" {icon} {self._c(_YELLOW, message)}")
def info(self, message: str):
"""Print an info message with blue dot."""
icon = self._c(_BLUE, "•")
print(f" {icon} {self._c(_LIGHT_GRAY, message)}")
def hint(self, message: str):
"""Print a subtle hint message."""
print(f" {self._c(_DARK_GRAY, message)}")
def section(self, title: str):
"""Print a section header."""
print()
print(f" {self._c(self.accent + _BOLD, title)}")
print(f" {self._c(_DARK_GRAY, _H_LINE * len(title))}")
# ── Status display ────────────────────────────────────────────────
def status(self, label: str, value: str):
"""Print a key-value status line."""
lbl = self._c(_GRAY, f" {label}:")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def status_block(self, items: dict[str, str], title: str = ""):
"""Print a block of status key-value pairs.
Args:
items: Dict of label -> value pairs.
title: Optional title for the block.
"""
if title:
self.section(title)
max_key = max(len(k) for k in items) if items else 0
for label, value in items.items():
lbl = self._c(_GRAY, f" {label:<{max_key}}")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def progress(self, current: int, total: int, label: str = ""):
"""Print a simple progress indicator.
Args:
current: Current step number.
total: Total number of steps.
label: Optional label for the progress.
"""
pct = int(current / total * 100) if total > 0 else 0
bar_width = 20
filled = int(bar_width * current / total) if total > 0 else 0
bar = "█" * filled + "░" * (bar_width - filled)
text = f" {self._c(_CYAN, bar)} {self._c(_GRAY, f'{pct:3d}%')}"
if label:
text += f" {self._c(_LIGHT_GRAY, label)}"
print(text)
# ── Table display ─────────────────────────────────────────────────
def table(self, headers: list[str], rows: list[list[str]],
max_col_width: int = 40):
"""Print a formatted table with box-drawing characters.
Args:
headers: Column header strings.
rows: List of rows, each a list of cell strings.
max_col_width: Maximum column width before truncation.
"""
if not headers:
return
# Calculate column widths
col_widths = [min(len(h), max_col_width) for h in headers]
for row in rows:
for i, cell in enumerate(row):
if i < len(col_widths):
col_widths[i] = min(
max(col_widths[i], len(str(cell))), max_col_width
)
def pad(text: str, width: int) -> str:
t = str(text)[:width]
return t + " " * (width - len(t))
# Header
header_cells = [
self._c(_CYAN + _BOLD, pad(h, col_widths[i]))
for i, h in enumerate(headers)
]
sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
header_line = f" {sep.join(header_cells)}"
print(header_line)
# Separator
sep_line = self._c(_DARK_GRAY, f" {'───'.join([_H_LINE * w for w in col_widths])}")
print(sep_line)
# Rows
for row in rows:
cells = []
for i, cell in enumerate(row):
if i < len(col_widths):
cells.append(self._c(_LIGHT_GRAY, pad(str(cell), col_widths[i])))
row_sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
print(f" {row_sep.join(cells)}")
# ── Help display ──────────────────────────────────────────────────
def help(self, commands: dict[str, str]):
"""Print a formatted help listing.
Args:
commands: Dict of command -> description pairs.
"""
self.section("Commands")
max_cmd = max(len(c) for c in commands) if commands else 0
for cmd, desc in commands.items():
cmd_styled = self._c(self.accent, f" {cmd:<{max_cmd}}")
desc_styled = self._c(_GRAY, f" {desc}")
print(f"{cmd_styled}{desc_styled}")
print()
# ── Goodbye ───────────────────────────────────────────────────────
def print_goodbye(self):
"""Print a styled goodbye message."""
print(f"\n {_ICON_SMALL} {self._c(_GRAY, 'Goodbye!')}\n")
# ── Prompt toolkit session factory ────────────────────────────────
def create_prompt_session(self):
"""Create a prompt_toolkit PromptSession with skin styling.
Returns:
A configured PromptSession, or None if prompt_toolkit unavailable.
"""
try:
from prompt_toolkit import PromptSession
from prompt_toolkit.history import FileHistory
from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
from prompt_toolkit.formatted_text import FormattedText
style = self.get_prompt_style()
session = PromptSession(
history=FileHistory(self.history_file),
auto_suggest=AutoSuggestFromHistory(),
style=style,
enable_history_search=True,
)
return session
except ImportError:
return None
def get_input(self, pt_session, project_name: str = "",
modified: bool = False, context: str = "") -> str:
"""Get input from user using prompt_toolkit or fallback.
Args:
pt_session: A prompt_toolkit PromptSession (or None).
project_name: Current project name.
modified: Whether project has unsaved changes.
context: Optional context string.
Returns:
User input string (stripped).
"""
if pt_session is not None:
from prompt_toolkit.formatted_text import FormattedText
tokens = self.prompt_tokens(project_name, modified, context)
return pt_session.prompt(FormattedText(tokens)).strip()
else:
raw_prompt = self.prompt(project_name, modified, context)
return input(raw_prompt).strip()
# ── Toolbar builder ───────────────────────────────────────────────
def bottom_toolbar(self, items: dict[str, str]):
"""Create a bottom toolbar callback for prompt_toolkit.
Args:
items: Dict of label -> value pairs to show in toolbar.
Returns:
A callable that returns FormattedText for the toolbar.
"""
def toolbar():
from prompt_toolkit.formatted_text import FormattedText
parts = []
for i, (k, v) in enumerate(items.items()):
if i > 0:
parts.append(("class:bottom-toolbar.text", ""))
parts.append(("class:bottom-toolbar.text", f" {k}: "))
parts.append(("class:bottom-toolbar", v))
return FormattedText(parts)
return toolbar
# ── ANSI 256-color to hex mapping (for prompt_toolkit styles) ─────────
_ANSI_256_TO_HEX = {
"\033[38;5;33m": "#0087ff", # audacity navy blue
"\033[38;5;35m": "#00af5f", # shotcut teal
"\033[38;5;39m": "#00afff", # inkscape bright blue
"\033[38;5;40m": "#00d700", # libreoffice green
"\033[38;5;55m": "#5f00af", # obs purple
"\033[38;5;69m": "#5f87ff", # kdenlive slate blue
"\033[38;5;75m": "#5fafff", # default sky blue
"\033[38;5;80m": "#5fd7d7", # brand cyan
"\033[38;5;208m": "#ff8700", # blender deep orange
"\033[38;5;214m": "#ffaf00", # gimp warm orange
"\033[38;5;255m": "#eeeeee", # ollama white
}
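# ── Wiring sketch (illustrative only, not part of the shipped skin) ──
# A minimal REPL loop showing how a harness is expected to drive ReplSkin:
# banner, a prompt_toolkit session (with a plain input() fallback), and the
# styled message helpers. The command names below are placeholders, not real
# ollama-cli commands.
if __name__ == "__main__":
    skin = ReplSkin("ollama", version="1.0.1")
    skin.print_banner()
    session = skin.create_prompt_session()  # None if prompt_toolkit is absent
    while True:
        try:
            line = skin.get_input(session, context="llama3.2")
        except (EOFError, KeyboardInterrupt):
            break
        if line in ("quit", "exit"):
            break
        if line == "help":
            skin.help({"help": "Show this listing", "quit": "Exit the REPL"})
        elif line:
            skin.warning(f"Unknown command: {line}")
    skin.print_goodbye()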

View File

@@ -0,0 +1,57 @@
#!/usr/bin/env python3
"""
setup.py for cli-anything-ollama
Install with: pip install -e .
Or publish to PyPI: python -m build && twine upload dist/*
"""
from setuptools import setup, find_namespace_packages
with open("cli_anything/ollama/README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()
setup(
name="cli-anything-ollama",
version="1.0.1",
author="cli-anything contributors",
author_email="",
description="CLI harness for Ollama - Local LLM inference and model management via Ollama REST API. Recommended: Ollama running at http://localhost:11434",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/HKUDS/CLI-Anything",
packages=find_namespace_packages(include=["cli_anything.*"]),
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
],
python_requires=">=3.10",
install_requires=[
"click>=8.0.0",
"prompt-toolkit>=3.0.0",
"requests>=2.28.0",
],
extras_require={
"dev": [
"pytest>=7.0.0",
"pytest-cov>=4.0.0",
],
},
entry_points={
"console_scripts": [
"cli-anything-ollama=cli_anything.ollama.ollama_cli:main",
],
},
package_data={
"cli_anything.ollama": ["skills/*.md"],
},
include_package_data=True,
zip_safe=False,
)

View File

@@ -187,6 +187,18 @@
"contributor": "Haimbeau1o",
"contributor_url": "https://github.com/Haimbeau1o"
},
{
"name": "ollama",
"display_name": "Ollama",
"version": "1.0.1",
"description": "Local LLM inference and model management via Ollama REST API",
"requires": "Ollama running at http://localhost:11434",
"homepage": "https://ollama.com",
"install_cmd": "pip install git+https://github.com/HKUDS/CLI-Anything.git#subdirectory=ollama/agent-harness",
"entry_point": "cli-anything-ollama",
"skill_md": "ollama/agent-harness/cli_anything/ollama/skills/SKILL.md",
"category": "ai"
},
{
"name": "obs-studio",
"display_name": "OBS Studio",