feat: Add VideoCaptioner CLI harness — AI-powered video captioning

Add agent-harness for VideoCaptioner, an AI-powered video captioning tool.

Pipeline: Speech transcription → Subtitle optimization → Translation → Video synthesis with styled subtitles.

Key features:
- 4 ASR engines (bijian/jianying free, whisper-api, whisper-cpp)
- 3 translation services (LLM, Bing free, Google free), 38 languages
- Beautiful subtitle styles (ASS outline + rounded background)
- Full pipeline in one command
- 26 tests (14 unit + 12 e2e), all passing

This commit is contained in:
liangweifeng
2026-03-29 21:09:01 +08:00
parent 790a186a4c
commit 333ae5986f
18 changed files with 1743 additions and 0 deletions

View File

@@ -0,0 +1,83 @@
# VideoCaptioner: Project-Specific Analysis & SOP
## Architecture Summary
VideoCaptioner is an AI-powered video captioning tool that provides a complete
pipeline from speech recognition to styled subtitle synthesis. It ships as a
standalone CLI (`pip install videocaptioner`) with a well-defined command interface.
```
+----------------------------------------------------------+
| VideoCaptioner CLI |
| +------------+ +----------+ +-----------+ +-----------+ |
| | Transcribe | | Subtitle | | Synthesize| | Process | |
| | (ASR) | | (NLP) | | (FFmpeg) | | (Pipeline)| |
| +-----+------+ +----+-----+ +-----+-----+ +-----+-----+ |
| | | | | |
| +-----+--------------+-------------+-------------+-----+ |
| | Core Engine | |
| | ASR engines, LLM optimization, Translation, | |
| | Subtitle rendering (ASS + Rounded), FFmpeg | |
| +-----------------------------------------------------+ |
+----------------------------------------------------------+
```
## CLI Strategy: Subprocess Wrapper
Unlike applications whose internal formats must be reverse-engineered,
VideoCaptioner already ships a production CLI. Our harness:
1. **Click wrapper** provides the CLI-Anything standard interface
2. **Subprocess backend** delegates to `videocaptioner` CLI commands
3. **JSON mode** (`--json`) returns structured output for agents
4. **REPL mode** provides interactive session with tab-completion
### Why Subprocess?
VideoCaptioner's CLI is:
- **Production-tested** with 50+ unit tests and 200+ QA test cases
- **Feature-complete** with 7 subcommands covering the full pipeline
- **Well-documented** with clear `--help` text and exit codes
- **Actively maintained** on PyPI with automated releases
Wrapping via subprocess preserves all these qualities without reimplementation.
## Coverage
### Transcription (4 ASR engines)
- `bijian` — Free, Chinese & English, no setup needed
- `jianying` — Free, Chinese & English, no setup needed
- `whisper-api` — All languages, OpenAI-compatible API
- `whisper-cpp` — All languages, local model
### Subtitle Processing
- **Split** — Semantic re-segmentation via LLM
- **Optimize** — Fix ASR errors, punctuation, formatting via LLM
- **Translate** — 38 languages, 3 translators (LLM, Bing free, Google free)
- **Layout** — target-above, source-above, target-only, source-only
### Video Synthesis
- **Soft subtitles** — Embedded subtitle track (switchable)
- **Hard subtitles** — Burned into video frames
- **ASS style** — Traditional outline/shadow with presets (default, anime, vertical)
- **Rounded style** — Modern rounded background boxes
- **Customizable** — Inline JSON override for any style parameter
- **Quality levels** — ultra (CRF 18), high (CRF 23), medium (CRF 28), low (CRF 32)
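As a rough illustration, the documented quality tiers could be wired to FFmpeg like this (the exact flags VideoCaptioner passes are an assumption; `ffmpeg_burn_args` is a hypothetical helper, not part of the harness):

```python
# Assumed mapping from the documented quality tiers to FFmpeg CRF values.
QUALITY_CRF = {"ultra": 18, "high": 23, "medium": 28, "low": 32}


def ffmpeg_burn_args(video: str, subtitle: str, output: str,
                     quality: str = "medium") -> list[str]:
    """Build a hypothetical ffmpeg command that hard-burns subtitles.

    Lower CRF means higher quality and larger files; 18 is near-lossless.
    """
    crf = QUALITY_CRF[quality]
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={subtitle}",  # burn the subtitle file into frames
        "-crf", str(crf),
        output,
    ]
```

For example, `ffmpeg_burn_args("in.mp4", "sub.ass", "out.mp4", "ultra")` yields a command containing `-crf 18`.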
### Utilities
- Configuration management (TOML config + env vars)
- Style preset listing with full parameters
- Online video download (YouTube, Bilibili, etc.)
## Testing Strategy
- **Unit tests**: Mock subprocess calls, verify argument construction
- **End-to-end tests**: Real videocaptioner CLI with test media files
- **Prerequisite**: `videocaptioner` and `ffmpeg` must be installed
## Limitations
- Requires `videocaptioner` package to be installed separately
- Free ASR engines (bijian/jianying) only support Chinese & English
- LLM features require an OpenAI-compatible API key
- Hard subtitle styles require FFmpeg

View File

@@ -0,0 +1,71 @@
# VideoCaptioner CLI
AI-powered video captioning tool with beautiful customizable subtitle styles.
## Architecture
- **Subprocess backend** delegates to the production `videocaptioner` CLI (`pip install videocaptioner`)
- **Click** provides the CLI framework with subcommand groups and REPL
- **JSON output mode** (`--json`) for agent consumption
- **Free features included**: bijian ASR (Chinese/English), Bing/Google translation
## Pipeline
```
Audio/Video → ASR Transcription → Subtitle Splitting → LLM Optimization → Translation → Video Synthesis
              (bijian/whisper)    (semantic)           (fix errors)       (38 languages) (styled subtitles)
```
## Install
```bash
pip install videocaptioner click prompt-toolkit
```
## Run
```bash
# One-shot: transcribe a Chinese video and add English subtitles
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language en --subtitle-mode hard
# Transcribe only
cli-anything-videocaptioner transcribe video.mp4 --asr bijian -o output.srt
# Translate existing subtitles
cli-anything-videocaptioner subtitle input.srt --translator google --target-language ja
# Burn subtitles with anime style
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard --style anime
# Custom style (red outline, large font)
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard \
--style-override '{"outline_color": "#ff0000", "font_size": 48}'
# JSON output mode (for agent consumption)
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
# Interactive REPL
cli-anything-videocaptioner
```
## Subtitle Styles
Two rendering modes for beautiful subtitles:
**ASS mode** — traditional outline/shadow:
- Presets: `default` (white+black), `anime` (warm+orange), `vertical` (portrait videos)
**Rounded mode** — modern rounded background boxes:
- Preset: `rounded` (dark text on semi-transparent background)
Fully customizable via `--style-override` with inline JSON.
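The override semantics can be sketched as a field-by-field merge onto a preset (the preset fields below are illustrative, not the actual style schema):

```python
import json

# Hypothetical preset fields — the real preset schema may differ.
DEFAULT_STYLE = {"font_size": 36, "outline_color": "#000000", "font_name": "Arial"}


def apply_style_override(preset: dict, override_json: str) -> dict:
    """Merge an inline-JSON override onto a style preset, field by field.

    The preset is left untouched; only the overridden keys change.
    """
    merged = dict(preset)
    merged.update(json.loads(override_json))
    return merged
```

So `--style-override '{"outline_color": "#ff0000", "font_size": 48}'` would change only those two fields and keep every other preset value.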
## Coverage
| Feature | Commands |
|---------|----------|
| Transcription | 4 ASR engines, auto language detection, word timestamps |
| Subtitle Processing | Split + optimize + translate, 3 translators, 38 languages |
| Video Synthesis | Soft/hard subtitles, 4 quality levels, 5 style presets |
| Styles | ASS outline + rounded background, inline JSON customization |
| Utilities | Config management, style listing, video download |

View File

@@ -0,0 +1,91 @@
"""Full pipeline — transcribe → optimize → translate → synthesize in one command."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def process(
input_path: str,
output_path: str | None = None,
asr: str = "bijian",
language: str = "auto",
translator: str | None = None,
target_language: str | None = None,
subtitle_mode: str = "soft",
quality: str = "medium",
layout: str | None = None,
style: str | None = None,
style_override: str | None = None,
render_mode: str | None = None,
no_optimize: bool = False,
no_translate: bool = False,
no_split: bool = False,
no_synthesize: bool = False,
reflect: bool = False,
prompt: str | None = None,
api_key: str | None = None,
api_base: str | None = None,
model: str | None = None,
) -> str:
"""Run the complete captioning pipeline.
Args:
input_path: Video or audio file path.
output_path: Output file or directory path.
asr: ASR engine.
language: Source language.
translator: Translation service.
target_language: Target language.
subtitle_mode: soft or hard.
quality: Video quality.
layout: Bilingual layout.
style: Style preset name.
style_override: Inline JSON style override.
render_mode: ass or rounded.
no_optimize: Skip optimization.
no_translate: Skip translation.
no_split: Skip re-segmentation.
no_synthesize: Skip video synthesis.
reflect: Reflective translation.
prompt: Custom LLM prompt.
api_key: LLM API key.
api_base: LLM API base URL.
model: LLM model name.
Returns:
Output file path.
"""
args = ["process", input_path, "--asr", asr, "--language", language,
"--subtitle-mode", subtitle_mode, "--quality", quality]
if output_path:
args += ["-o", output_path]
if translator:
args += ["--translator", translator]
if target_language:
args += ["--target-language", target_language]
if layout:
args += ["--layout", layout]
if style:
args += ["--style", style]
if style_override:
args += ["--style-override", style_override]
if render_mode:
args += ["--render-mode", render_mode]
if no_optimize:
args.append("--no-optimize")
if no_translate:
args.append("--no-translate")
if no_split:
args.append("--no-split")
if no_synthesize:
args.append("--no-synthesize")
if reflect:
args.append("--reflect")
if prompt:
args += ["--prompt", prompt]
if api_key:
args += ["--api-key", api_key]
if api_base:
args += ["--api-base", api_base]
if model:
args += ["--model", model]
return run_quiet(args)

View File

@@ -0,0 +1,68 @@
"""Subtitle processing — optimize and translate subtitle files."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def process_subtitle(
input_path: str,
output_path: str | None = None,
translator: str | None = None,
target_language: str | None = None,
format: str = "srt",
layout: str | None = None,
no_optimize: bool = False,
no_translate: bool = False,
no_split: bool = False,
reflect: bool = False,
prompt: str | None = None,
api_key: str | None = None,
api_base: str | None = None,
model: str | None = None,
) -> str:
"""Optimize and/or translate a subtitle file.
Args:
input_path: Subtitle file (.srt, .ass, .vtt).
output_path: Output file or directory path.
translator: Translation service (llm, bing, google).
target_language: Target language BCP 47 code.
format: Output format (srt, ass, txt, json).
layout: Bilingual layout (target-above, source-above, target-only, source-only).
no_optimize: Skip LLM optimization.
no_translate: Skip translation.
no_split: Skip re-segmentation.
reflect: Enable reflective translation (LLM only).
prompt: Custom LLM prompt.
api_key: LLM API key.
api_base: LLM API base URL.
model: LLM model name.
Returns:
Output file path.
"""
args = ["subtitle", input_path, "--format", format]
if output_path:
args += ["-o", output_path]
if translator:
args += ["--translator", translator]
if target_language:
args += ["--target-language", target_language]
if layout:
args += ["--layout", layout]
if no_optimize:
args.append("--no-optimize")
if no_translate:
args.append("--no-translate")
if no_split:
args.append("--no-split")
if reflect:
args.append("--reflect")
if prompt:
args += ["--prompt", prompt]
if api_key:
args += ["--api-key", api_key]
if api_base:
args += ["--api-base", api_base]
if model:
args += ["--model", model]
return run_quiet(args)

View File

@@ -0,0 +1,49 @@
"""Video synthesis — burn subtitles into video with customizable styles."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def synthesize(
video_path: str,
subtitle_path: str,
output_path: str | None = None,
subtitle_mode: str = "soft",
quality: str = "medium",
layout: str | None = None,
render_mode: str | None = None,
style: str | None = None,
style_override: str | None = None,
font_file: str | None = None,
) -> str:
"""Burn subtitles into a video file.
Args:
video_path: Input video file.
subtitle_path: Subtitle file (.srt, .ass).
output_path: Output video file path.
subtitle_mode: 'soft' (embedded track) or 'hard' (burned in).
quality: Video quality (ultra, high, medium, low).
layout: Bilingual layout.
render_mode: 'ass' (outline/shadow) or 'rounded' (background boxes).
style: Style preset name (default, anime, vertical, rounded).
style_override: Inline JSON to override style fields.
font_file: Custom font file path (.ttf/.otf).
Returns:
Output video file path.
"""
args = ["synthesize", video_path, "-s", subtitle_path,
"--subtitle-mode", subtitle_mode, "--quality", quality]
if output_path:
args += ["-o", output_path]
if layout:
args += ["--layout", layout]
if render_mode:
args += ["--render-mode", render_mode]
if style:
args += ["--style", style]
if style_override:
args += ["--style-override", style_override]
if font_file:
args += ["--font-file", font_file]
return run_quiet(args)

View File

@@ -0,0 +1,44 @@
"""Transcription — speech to subtitles via ASR engines."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def transcribe(
input_path: str,
output_path: str | None = None,
asr: str = "bijian",
language: str = "auto",
format: str = "srt",
word_timestamps: bool = False,
whisper_api_key: str | None = None,
whisper_api_base: str | None = None,
whisper_model: str | None = None,
) -> str:
"""Transcribe audio/video to subtitles.
Args:
input_path: Audio or video file path.
output_path: Output file or directory path.
asr: ASR engine (bijian, jianying, whisper-api, whisper-cpp).
language: Source language ISO 639-1 code, or 'auto'.
format: Output format (srt, ass, txt, json).
word_timestamps: Include word-level timestamps.
whisper_api_key: Whisper API key (for whisper-api engine).
whisper_api_base: Whisper API base URL.
whisper_model: Whisper model name.
Returns:
Output file path.
"""
args = ["transcribe", input_path, "--asr", asr, "--language", language, "--format", format]
if output_path:
args += ["-o", output_path]
if word_timestamps:
args.append("--word-timestamps")
if whisper_api_key:
args += ["--whisper-api-key", whisper_api_key]
if whisper_api_base:
args += ["--whisper-api-base", whisper_api_base]
if whisper_model:
args += ["--whisper-model", whisper_model]
return run_quiet(args)

View File

@@ -0,0 +1,123 @@
---
name: >-
cli-anything-videocaptioner
description: >-
AI-powered video captioning — transcribe speech, optimize/translate subtitles, burn into video with beautiful customizable styles (ASS outline or rounded background). Free ASR and translation included.
---
# cli-anything-videocaptioner
AI-powered video captioning tool. Transcribe speech → optimize subtitles → translate → burn into video with beautiful styles.
## Installation
```bash
pip install cli-anything-videocaptioner
```
**Prerequisites:**
- Python 3.10+
- `videocaptioner` must be installed (`pip install videocaptioner`)
- FFmpeg required for video synthesis
## Usage
### Basic Commands
```bash
# Show help
cli-anything-videocaptioner --help
# Start interactive REPL mode
cli-anything-videocaptioner
# Transcribe a video (free, no setup)
cli-anything-videocaptioner transcribe video.mp4 --asr bijian
# Translate subtitles (free Bing translator)
cli-anything-videocaptioner subtitle input.srt --translator bing --target-language en
# Full pipeline: transcribe → translate → burn subtitles
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language en --subtitle-mode hard
# JSON output (for agent consumption)
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
```
### REPL Mode
When invoked without a subcommand, the CLI enters an interactive REPL session:
```bash
cli-anything-videocaptioner
# Enter commands interactively with tab-completion and history
```
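A loop of this shape can be sketched with prompt_toolkit (the command list and dispatch below are illustrative, not the harness's actual implementation):

```python
# Illustrative command set; the real harness derives these from its Click groups.
COMMANDS = ["transcribe", "subtitle", "synthesize", "process",
            "styles", "config", "download", "help", "quit"]


def repl() -> None:
    """Minimal interactive loop with tab-completion."""
    # Imported lazily so the module still loads without prompt_toolkit installed.
    from prompt_toolkit import PromptSession
    from prompt_toolkit.completion import WordCompleter

    session = PromptSession(completer=WordCompleter(COMMANDS))
    while True:
        line = session.prompt("videocaptioner> ").strip()
        if line in ("quit", "exit"):
            break
        if line:
            print(f"would dispatch: {line}")  # real REPL invokes the Click subcommand
```

`PromptSession` also gives history and emacs-style editing for free, which is why the install line pulls in `prompt-toolkit`.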
## Command Groups
### transcribe — Speech to subtitles
```
transcribe <input> [--asr bijian|jianying|whisper-api|whisper-cpp] [--language CODE] [--format srt|ass|txt|json] [-o PATH]
```
- `bijian` (default): Free, Chinese & English, no setup
- `whisper-api`: All languages, requires `--whisper-api-key`
### subtitle — Optimize and translate
```
subtitle <input.srt> [--translator llm|bing|google] [--target-language CODE] [--layout target-above|source-above|target-only|source-only] [--no-optimize] [--no-translate] [-o PATH]
```
- Three steps: Split → Optimize → Translate
- Bing/Google translators are free
- 38 target languages supported (BCP 47 codes)
### synthesize — Burn subtitles into video
```
synthesize <video> -s <subtitle> [--subtitle-mode soft|hard] [--quality ultra|high|medium|low] [--style NAME] [--style-override JSON] [--render-mode ass|rounded] [--font-file PATH] [-o PATH]
```
- **ASS mode**: Outline/shadow style with presets (default, anime, vertical)
- **Rounded mode**: Modern rounded background boxes
- Customizable via `--style-override '{"outline_color": "#ff0000"}'`
### process — Full pipeline
```
process <input> [--asr ...] [--translator ...] [--target-language ...] [--subtitle-mode ...] [--style ...] [--no-optimize] [--no-translate] [--no-synthesize] [-o PATH]
```
### styles — List style presets
```
styles
```
### config — Manage settings
```
config show
config set <key> <value>
```
### download — Download online video
```
download <URL> [-o DIR]
```
## JSON Output
All commands support `--json` for machine-readable output:
```bash
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
# {"output_path": "/path/to/output.srt"}
```
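From an agent, this composes naturally with subprocess; a sketch using the single `output_path` key shown above (`parse_output_path` and `transcribe_json` are hypothetical helpers):

```python
import json
import subprocess


def parse_output_path(stdout: str) -> str:
    """Parse the {"output_path": ...} payload printed in --json mode."""
    return json.loads(stdout)["output_path"]


def transcribe_json(video: str) -> str:
    """Invoke the harness in --json mode and return the produced subtitle path."""
    result = subprocess.run(
        ["cli-anything-videocaptioner", "--json", "transcribe", video, "--asr", "bijian"],
        capture_output=True, text=True, check=True,  # raise on non-zero exit
    )
    return parse_output_path(result.stdout)
```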
## Style Presets
| Name | Mode | Description |
|------|------|-------------|
| `default` | ASS | White text, black outline — clean and universal |
| `anime` | ASS | Warm white, orange outline — anime/cartoon style |
| `vertical` | ASS | High bottom margin — for portrait/vertical videos |
| `rounded` | Rounded | Dark text on semi-transparent rounded background |
Customize any field: `--style-override '{"font_size": 48, "outline_color": "#ff0000"}'`
## Target Languages
BCP 47 codes: `zh-Hans` `zh-Hant` `en` `ja` `ko` `fr` `de` `es` `ru` `pt` `it` `ar` `th` `vi` `id` and 23 more.

View File

@@ -0,0 +1,110 @@
"""Unit tests for VideoCaptioner CLI harness core modules."""
import pytest
from unittest.mock import patch, MagicMock
class TestTranscribe:
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.srt")
def test_basic(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
assert transcribe("video.mp4") == "/tmp/o.srt"
assert "transcribe" in mock_run.call_args[0][0]
assert "bijian" in mock_run.call_args[0][0]
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.json")
def test_options(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
transcribe("v.mp4", asr="whisper-api", language="fr", format="json",
output_path="/tmp/o.json", whisper_api_key="sk-xxx")
a = mock_run.call_args[0][0]
assert "whisper-api" in a and "fr" in a and "json" in a and "sk-xxx" in a
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.srt")
def test_word_timestamps(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
transcribe("v.mp4", word_timestamps=True)
assert "--word-timestamps" in mock_run.call_args[0][0]
class TestSubtitle:
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_translate(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", translator="bing", target_language="en")
a = mock_run.call_args[0][0]
assert "bing" in a and "en" in a
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_skip(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", no_optimize=True, no_translate=True)
a = mock_run.call_args[0][0]
assert "--no-optimize" in a and "--no-translate" in a
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_llm(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", translator="llm", target_language="ja",
reflect=True, api_key="sk-xxx", layout="target-above")
a = mock_run.call_args[0][0]
assert "--reflect" in a and "sk-xxx" in a and "target-above" in a
class TestSynthesize:
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_soft(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt")
assert "soft" in mock_run.call_args[0][0]
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_hard_style(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt", subtitle_mode="hard", style="anime", quality="high")
a = mock_run.call_args[0][0]
assert "hard" in a and "anime" in a and "high" in a
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_rounded(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt", subtitle_mode="hard", render_mode="rounded",
style_override='{"bg_color":"#000000cc"}')
a = mock_run.call_args[0][0]
assert "rounded" in a and "#000000cc" in str(a)
class TestPipeline:
@patch("cli_anything.videocaptioner.core.pipeline.run_quiet", return_value="/tmp/o.mp4")
def test_full(self, mock_run):
from cli_anything.videocaptioner.core.pipeline import process
process("v.mp4", translator="bing", target_language="en", style="anime")
a = mock_run.call_args[0][0]
assert "process" in a and "bing" in a and "anime" in a
@patch("cli_anything.videocaptioner.core.pipeline.run_quiet", return_value="/tmp/o.srt")
def test_no_synth(self, mock_run):
from cli_anything.videocaptioner.core.pipeline import process
process("v.mp4", no_synthesize=True)
assert "--no-synthesize" in mock_run.call_args[0][0]
class TestBackend:
@patch("subprocess.run")
def test_success(self, mock_sub):
mock_sub.return_value = MagicMock(returncode=0, stdout="/tmp/o.srt\n", stderr="")
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
assert run_quiet(["transcribe", "v.mp4"]) == "/tmp/o.srt"
@patch("subprocess.run")
def test_failure(self, mock_sub):
mock_sub.return_value = MagicMock(returncode=5, stdout="", stderr="Error: fail")
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
with pytest.raises(RuntimeError, match="fail"):
run_quiet(["transcribe", "x.mp4"])
@patch("shutil.which", return_value=None)
def test_not_installed(self, _):
from cli_anything.videocaptioner.utils.vc_backend import _find_vc
with pytest.raises(RuntimeError, match="not found"):
_find_vc()

View File

@@ -0,0 +1,104 @@
"""End-to-end tests for VideoCaptioner CLI harness.
These tests require videocaptioner to be installed.
Skip with: pytest -m "not e2e"
"""
import pytest
import subprocess
import shutil
# Skip all tests if videocaptioner is not installed
pytestmark = pytest.mark.skipif(
shutil.which("videocaptioner") is None,
reason="videocaptioner not installed"
)
class TestCLIEntryPoint:
def test_help(self):
result = subprocess.run(
["videocaptioner", "--help"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "transcribe" in result.stdout
assert "subtitle" in result.stdout
assert "synthesize" in result.stdout
def test_version(self):
result = subprocess.run(
["videocaptioner", "--version"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "videocaptioner" in result.stdout
def test_style_list(self):
result = subprocess.run(
["videocaptioner", "style"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "default" in result.stdout
assert "anime" in result.stdout or "rounded" in result.stdout
def test_config_show(self):
result = subprocess.run(
["videocaptioner", "config", "show"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
def test_transcribe_missing_file(self):
result = subprocess.run(
["videocaptioner", "transcribe", "nonexistent.mp4", "--asr", "bijian"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 3 # FILE_NOT_FOUND
def test_subtitle_missing_file(self):
result = subprocess.run(
["videocaptioner", "subtitle", "nonexistent.srt"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 3
def test_synthesize_missing_args(self):
result = subprocess.run(
["videocaptioner", "synthesize", "video.mp4"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 2 # USAGE_ERROR (missing -s)
def test_invalid_asr_engine(self):
result = subprocess.run(
["videocaptioner", "transcribe", "video.mp4", "--asr", "invalid"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 2
def test_invalid_target_language(self):
result = subprocess.run(
["videocaptioner", "subtitle", "test.srt", "--translator", "bing",
"--target-language", "invalid-lang"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode != 0
class TestBackendIntegration:
def test_get_version(self):
from cli_anything.videocaptioner.utils.vc_backend import get_version
version = get_version()
assert "videocaptioner" in version.lower()
def test_get_config(self):
from cli_anything.videocaptioner.utils.vc_backend import get_config
config = get_config()
assert "llm" in config or "transcribe" in config
def test_get_styles(self):
from cli_anything.videocaptioner.utils.vc_backend import get_styles
styles = get_styles()
assert "default" in styles

View File

@@ -0,0 +1,500 @@
"""cli-anything REPL Skin — Unified terminal interface for all CLI harnesses.
Copy this file into your CLI package at:
cli_anything/<software>/utils/repl_skin.py
Usage:
from cli_anything.<software>.utils.repl_skin import ReplSkin
skin = ReplSkin("ollama", version="1.0.0")
skin.print_banner()
prompt_text = skin.prompt(project_name="llama3.2", modified=False)
skin.success("Model pulled")
skin.error("Connection failed")
skin.warning("No models loaded")
skin.info("Generating...")
skin.status("Model", "llama3.2:latest")
skin.table(headers, rows)
skin.print_goodbye()
"""
import os
import sys
# ── ANSI color codes (no external deps for core styling) ──────────────
_RESET = "\033[0m"
_BOLD = "\033[1m"
_DIM = "\033[2m"
_ITALIC = "\033[3m"
_UNDERLINE = "\033[4m"
# Brand colors
_CYAN = "\033[38;5;80m" # cli-anything brand cyan
_CYAN_BG = "\033[48;5;80m"
_WHITE = "\033[97m"
_GRAY = "\033[38;5;245m"
_DARK_GRAY = "\033[38;5;240m"
_LIGHT_GRAY = "\033[38;5;250m"
# Software accent colors — each software gets a unique accent
_ACCENT_COLORS = {
"gimp": "\033[38;5;214m", # warm orange
"blender": "\033[38;5;208m", # deep orange
"inkscape": "\033[38;5;39m", # bright blue
"audacity": "\033[38;5;33m", # navy blue
"libreoffice": "\033[38;5;40m", # green
"obs_studio": "\033[38;5;55m", # purple
"kdenlive": "\033[38;5;69m", # slate blue
"shotcut": "\033[38;5;35m", # teal green
"ollama": "\033[38;5;255m", # white (Ollama branding)
}
_DEFAULT_ACCENT = "\033[38;5;75m" # default sky blue
# Status colors
_GREEN = "\033[38;5;78m"
_YELLOW = "\033[38;5;220m"
_RED = "\033[38;5;196m"
_BLUE = "\033[38;5;75m"
_MAGENTA = "\033[38;5;176m"
# ── Brand icon ────────────────────────────────────────────────────────
# The cli-anything icon: a small colored diamond/chevron mark
_ICON = f"{_CYAN}{_BOLD}◆{_RESET}"
_ICON_SMALL = f"{_CYAN}◆{_RESET}"
# ── Box drawing characters ────────────────────────────────────────────
_H_LINE = "─"
_V_LINE = "│"
_TL = "┌"
_TR = "┐"
_BL = "└"
_BR = "┘"
_T_DOWN = "┬"
_T_UP = "┴"
_T_RIGHT = "├"
_T_LEFT = "┤"
_CROSS = "┼"
def _strip_ansi(text: str) -> str:
"""Remove ANSI escape codes for length calculation."""
import re
return re.sub(r"\033\[[^m]*m", "", text)
def _visible_len(text: str) -> int:
"""Get visible length of text (excluding ANSI codes)."""
return len(_strip_ansi(text))
class ReplSkin:
"""Unified REPL skin for cli-anything CLIs.
Provides consistent branding, prompts, and message formatting
across all CLI harnesses built with the cli-anything methodology.
"""
def __init__(self, software: str, version: str = "1.0.0",
history_file: str | None = None):
"""Initialize the REPL skin.
Args:
software: Software name (e.g., "gimp", "shotcut", "ollama").
version: CLI version string.
history_file: Path for persistent command history.
Defaults to ~/.cli-anything-<software>/history
"""
self.software = software.lower().replace("-", "_")
self.display_name = software.replace("_", " ").title()
self.version = version
self.accent = _ACCENT_COLORS.get(self.software, _DEFAULT_ACCENT)
# History file
if history_file is None:
from pathlib import Path
hist_dir = Path.home() / f".cli-anything-{self.software}"
hist_dir.mkdir(parents=True, exist_ok=True)
self.history_file = str(hist_dir / "history")
else:
self.history_file = history_file
# Detect terminal capabilities
self._color = self._detect_color_support()
def _detect_color_support(self) -> bool:
"""Check if terminal supports color."""
if os.environ.get("NO_COLOR"):
return False
if os.environ.get("CLI_ANYTHING_NO_COLOR"):
return False
if not hasattr(sys.stdout, "isatty"):
return False
return sys.stdout.isatty()
def _c(self, code: str, text: str) -> str:
"""Apply color code if colors are supported."""
if not self._color:
return text
return f"{code}{text}{_RESET}"
# ── Banner ────────────────────────────────────────────────────────
def print_banner(self):
"""Print the startup banner with branding."""
inner = 54
def _box_line(content: str) -> str:
"""Wrap content in box drawing, padding to inner width."""
pad = inner - _visible_len(content)
vl = self._c(_DARK_GRAY, _V_LINE)
return f"{vl}{content}{' ' * max(0, pad)}{vl}"
top = self._c(_DARK_GRAY, f"{_TL}{_H_LINE * inner}{_TR}")
bot = self._c(_DARK_GRAY, f"{_BL}{_H_LINE * inner}{_BR}")
# Title: ◆ cli-anything · Ollama
icon = self._c(_CYAN + _BOLD, "◆")
brand = self._c(_CYAN + _BOLD, "cli-anything")
dot = self._c(_DARK_GRAY, "·")
name = self._c(self.accent + _BOLD, self.display_name)
title = f" {icon} {brand} {dot} {name}"
ver = f" {self._c(_DARK_GRAY, f' v{self.version}')}"
tip = f" {self._c(_DARK_GRAY, ' Type help for commands, quit to exit')}"
empty = ""
print(top)
print(_box_line(title))
print(_box_line(ver))
print(_box_line(empty))
print(_box_line(tip))
print(bot)
print()
# ── Prompt ────────────────────────────────────────────────────────
def prompt(self, project_name: str = "", modified: bool = False,
context: str = "") -> str:
"""Build a styled prompt string for prompt_toolkit or input().
Args:
project_name: Current project name (empty if none open).
modified: Whether the project has unsaved changes.
context: Optional extra context to show in prompt.
Returns:
Formatted prompt string.
"""
parts = []
# Icon
if self._color:
parts.append(f"{_CYAN}{_RESET} ")
else:
parts.append("> ")
# Software name
parts.append(self._c(self.accent + _BOLD, self.software))
# Project context
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
parts.append(f" {self._c(_DARK_GRAY, '[')}")
parts.append(self._c(_LIGHT_GRAY, f"{ctx}{mod}"))
parts.append(self._c(_DARK_GRAY, ']'))
parts.append(self._c(_GRAY, "› "))
return "".join(parts)
def prompt_tokens(self, project_name: str = "", modified: bool = False,
context: str = ""):
"""Build prompt_toolkit formatted text tokens for the prompt.
Use with prompt_toolkit's FormattedText for proper ANSI handling.
Returns:
list of (style, text) tuples for prompt_toolkit.
"""
tokens = []
tokens.append(("class:icon", "◆ "))
tokens.append(("class:software", self.software))
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
tokens.append(("class:bracket", " ["))
tokens.append(("class:context", f"{ctx}{mod}"))
tokens.append(("class:bracket", "]"))
tokens.append(("class:arrow", " "))
return tokens
def get_prompt_style(self):
"""Get a prompt_toolkit Style object matching the skin.
Returns:
prompt_toolkit.styles.Style
"""
try:
from prompt_toolkit.styles import Style
except ImportError:
return None
accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
return Style.from_dict({
"icon": "#5fdfdf bold", # cyan brand color
"software": f"{accent_hex} bold",
"bracket": "#585858",
"context": "#bcbcbc",
"arrow": "#808080",
# Completion menu
"completion-menu.completion": "bg:#303030 #bcbcbc",
"completion-menu.completion.current": f"bg:{accent_hex} #000000",
"completion-menu.meta.completion": "bg:#303030 #808080",
"completion-menu.meta.completion.current": f"bg:{accent_hex} #000000",
# Auto-suggest
"auto-suggest": "#585858",
# Bottom toolbar
"bottom-toolbar": "bg:#1c1c1c #808080",
"bottom-toolbar.text": "#808080",
})
# ── Messages ──────────────────────────────────────────────────────
def success(self, message: str):
"""Print a success message with green checkmark."""
        icon = self._c(_GREEN + _BOLD, "✓")
print(f" {icon} {self._c(_GREEN, message)}")
def error(self, message: str):
"""Print an error message with red cross."""
        icon = self._c(_RED + _BOLD, "✗")
print(f" {icon} {self._c(_RED, message)}", file=sys.stderr)
def warning(self, message: str):
"""Print a warning message with yellow triangle."""
        icon = self._c(_YELLOW + _BOLD, "⚠")
print(f" {icon} {self._c(_YELLOW, message)}")
def info(self, message: str):
"""Print an info message with blue dot."""
        icon = self._c(_BLUE, "●")
print(f" {icon} {self._c(_LIGHT_GRAY, message)}")
def hint(self, message: str):
"""Print a subtle hint message."""
print(f" {self._c(_DARK_GRAY, message)}")
def section(self, title: str):
"""Print a section header."""
print()
print(f" {self._c(self.accent + _BOLD, title)}")
print(f" {self._c(_DARK_GRAY, _H_LINE * len(title))}")
# ── Status display ────────────────────────────────────────────────
def status(self, label: str, value: str):
"""Print a key-value status line."""
lbl = self._c(_GRAY, f" {label}:")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def status_block(self, items: dict[str, str], title: str = ""):
"""Print a block of status key-value pairs.
Args:
items: Dict of label -> value pairs.
title: Optional title for the block.
"""
if title:
self.section(title)
max_key = max(len(k) for k in items) if items else 0
for label, value in items.items():
lbl = self._c(_GRAY, f" {label:<{max_key}}")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def progress(self, current: int, total: int, label: str = ""):
"""Print a simple progress indicator.
Args:
current: Current step number.
total: Total number of steps.
label: Optional label for the progress.
"""
pct = int(current / total * 100) if total > 0 else 0
bar_width = 20
filled = int(bar_width * current / total) if total > 0 else 0
        bar = "█" * filled + "░" * (bar_width - filled)
text = f" {self._c(_CYAN, bar)} {self._c(_GRAY, f'{pct:3d}%')}"
if label:
text += f" {self._c(_LIGHT_GRAY, label)}"
print(text)
# ── Table display ─────────────────────────────────────────────────
def table(self, headers: list[str], rows: list[list[str]],
max_col_width: int = 40):
"""Print a formatted table with box-drawing characters.
Args:
headers: Column header strings.
rows: List of rows, each a list of cell strings.
max_col_width: Maximum column width before truncation.
"""
if not headers:
return
# Calculate column widths
col_widths = [min(len(h), max_col_width) for h in headers]
for row in rows:
for i, cell in enumerate(row):
if i < len(col_widths):
col_widths[i] = min(
max(col_widths[i], len(str(cell))), max_col_width
)
def pad(text: str, width: int) -> str:
t = str(text)[:width]
return t + " " * (width - len(t))
# Header
header_cells = [
self._c(_CYAN + _BOLD, pad(h, col_widths[i]))
for i, h in enumerate(headers)
]
sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
header_line = f" {sep.join(header_cells)}"
print(header_line)
# Separator
        sep_line = self._c(
            _DARK_GRAY,
            f" {(_H_LINE * 3).join(_H_LINE * w for w in col_widths)}",
        )
print(sep_line)
# Rows
for row in rows:
cells = []
for i, cell in enumerate(row):
if i < len(col_widths):
cells.append(self._c(_LIGHT_GRAY, pad(str(cell), col_widths[i])))
row_sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
print(f" {row_sep.join(cells)}")
# ── Help display ──────────────────────────────────────────────────
def help(self, commands: dict[str, str]):
"""Print a formatted help listing.
Args:
commands: Dict of command -> description pairs.
"""
self.section("Commands")
max_cmd = max(len(c) for c in commands) if commands else 0
for cmd, desc in commands.items():
cmd_styled = self._c(self.accent, f" {cmd:<{max_cmd}}")
desc_styled = self._c(_GRAY, f" {desc}")
print(f"{cmd_styled}{desc_styled}")
print()
# ── Goodbye ───────────────────────────────────────────────────────
def print_goodbye(self):
"""Print a styled goodbye message."""
print(f"\n {_ICON_SMALL} {self._c(_GRAY, 'Goodbye!')}\n")
# ── Prompt toolkit session factory ────────────────────────────────
def create_prompt_session(self):
"""Create a prompt_toolkit PromptSession with skin styling.
Returns:
A configured PromptSession, or None if prompt_toolkit unavailable.
"""
try:
from prompt_toolkit import PromptSession
from prompt_toolkit.history import FileHistory
from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
style = self.get_prompt_style()
session = PromptSession(
history=FileHistory(self.history_file),
auto_suggest=AutoSuggestFromHistory(),
style=style,
enable_history_search=True,
)
return session
except ImportError:
return None
def get_input(self, pt_session, project_name: str = "",
modified: bool = False, context: str = "") -> str:
"""Get input from user using prompt_toolkit or fallback.
Args:
pt_session: A prompt_toolkit PromptSession (or None).
project_name: Current project name.
modified: Whether project has unsaved changes.
context: Optional context string.
Returns:
User input string (stripped).
"""
if pt_session is not None:
from prompt_toolkit.formatted_text import FormattedText
tokens = self.prompt_tokens(project_name, modified, context)
return pt_session.prompt(FormattedText(tokens)).strip()
else:
raw_prompt = self.prompt(project_name, modified, context)
return input(raw_prompt).strip()
# ── Toolbar builder ───────────────────────────────────────────────
def bottom_toolbar(self, items: dict[str, str]):
"""Create a bottom toolbar callback for prompt_toolkit.
Args:
items: Dict of label -> value pairs to show in toolbar.
Returns:
A callable that returns FormattedText for the toolbar.
"""
def toolbar():
from prompt_toolkit.formatted_text import FormattedText
parts = []
for i, (k, v) in enumerate(items.items()):
if i > 0:
                    parts.append(("class:bottom-toolbar.text", " │ "))
parts.append(("class:bottom-toolbar.text", f" {k}: "))
parts.append(("class:bottom-toolbar", v))
return FormattedText(parts)
return toolbar
# ── ANSI 256-color to hex mapping (for prompt_toolkit styles) ─────────
_ANSI_256_TO_HEX = {
"\033[38;5;33m": "#0087ff", # audacity navy blue
"\033[38;5;35m": "#00af5f", # shotcut teal
"\033[38;5;39m": "#00afff", # inkscape bright blue
"\033[38;5;40m": "#00d700", # libreoffice green
"\033[38;5;55m": "#5f00af", # obs purple
"\033[38;5;69m": "#5f87ff", # kdenlive slate blue
"\033[38;5;75m": "#5fafff", # default sky blue
"\033[38;5;80m": "#5fd7d7", # brand cyan
"\033[38;5;208m": "#ff8700", # blender deep orange
"\033[38;5;214m": "#ffaf00", # gimp warm orange
"\033[38;5;255m": "#eeeeee", # ollama white
}
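The `progress()` helper above reduces to simple fill arithmetic; a standalone sketch of the same math (bar characters assumed to be the conventional `█`/`░` pair):

```python
def render_bar(current: int, total: int, width: int = 20) -> str:
    # Same math as ReplSkin.progress(): integer-truncated fill count
    # plus a right-aligned percentage; guard against division by zero.
    pct = int(current / total * 100) if total > 0 else 0
    filled = int(width * current / total) if total > 0 else 0
    return "█" * filled + "░" * (width - filled) + f" {pct:3d}%"

print(render_bar(5, 10))  # → ██████████░░░░░░░░░░  50%
print(render_bar(0, 0))   # zero total stays at an empty bar
```

The `total > 0` guard matters in the REPL, where a pipeline stage may report progress before its step count is known.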

View File

@@ -0,0 +1,87 @@
"""VideoCaptioner CLI backend — subprocess wrapper for the videocaptioner command.
All core modules call through this single module to invoke the existing
videocaptioner CLI. This keeps the Click harness thin and delegates real
work to the production-tested videocaptioner package.
"""
import json
import subprocess
import shutil
from pathlib import Path
from typing import Any
def _find_vc() -> str:
"""Locate the videocaptioner binary."""
path = shutil.which("videocaptioner")
if not path:
raise RuntimeError(
"videocaptioner not found on PATH. "
"Install with: pip install videocaptioner"
)
return path
def run(args: list[str], timeout: int = 600) -> dict[str, Any]:
"""Run a videocaptioner CLI command and return structured result.
Args:
args: Command arguments (without 'videocaptioner' prefix).
timeout: Max seconds to wait.
Returns:
Dict with 'exit_code', 'stdout', 'stderr', 'output_path' (if found).
"""
cmd = [_find_vc()] + args
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
)
except subprocess.TimeoutExpired:
raise RuntimeError(f"Command timed out after {timeout}s: {' '.join(cmd)}")
# Extract output path from quiet mode stdout
stdout = result.stdout.strip()
output_path = stdout if stdout and Path(stdout).suffix else None
return {
"exit_code": result.returncode,
"stdout": stdout,
"stderr": result.stderr.strip(),
"output_path": output_path,
"command": " ".join(cmd),
}
def run_quiet(args: list[str], timeout: int = 600) -> str:
"""Run in quiet mode and return the output file path.
Raises RuntimeError on failure.
"""
result = run(args + ["-q"], timeout=timeout)
if result["exit_code"] != 0:
error_msg = result["stderr"] or result["stdout"] or "Unknown error"
raise RuntimeError(f"videocaptioner failed (exit {result['exit_code']}): {error_msg}")
return result["stdout"]
def get_version() -> str:
"""Get videocaptioner version string."""
result = run(["--version"])
return result["stdout"]
def get_config() -> str:
"""Get current configuration."""
result = run(["config", "show"])
return result["stdout"]
def get_styles() -> str:
"""Get available subtitle styles."""
result = run(["style"])
return result["stdout"]

View File

@@ -0,0 +1,362 @@
#!/usr/bin/env python3
"""VideoCaptioner CLI — AI-powered video captioning from the command line.
Transcribe speech, optimize and translate subtitles, then burn them into
video with beautiful customizable styles (ASS outline or rounded background).
Usage:
cli-anything-videocaptioner transcribe video.mp4 --asr bijian
cli-anything-videocaptioner subtitle input.srt --translator bing --target-language en
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard --style anime
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language ja
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
"""
import sys
import json
import shlex
import click
from cli_anything.videocaptioner.utils import vc_backend
from cli_anything.videocaptioner.core import transcribe as transcribe_mod
from cli_anything.videocaptioner.core import subtitle as subtitle_mod
from cli_anything.videocaptioner.core import synthesize as synthesize_mod
from cli_anything.videocaptioner.core import pipeline as pipeline_mod
_json_output = False
_repl_mode = False
def output(data, message: str = ""):
if _json_output:
click.echo(json.dumps(data, indent=2, default=str))
else:
if message:
click.echo(message)
if isinstance(data, dict):
for k, v in data.items():
click.echo(f" {k}: {v}")
elif isinstance(data, str):
click.echo(data)
def handle_error(func):
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except RuntimeError as e:
if _json_output:
click.echo(json.dumps({"error": str(e), "type": "runtime_error"}))
else:
click.echo(f"Error: {e}", err=True)
if not _repl_mode:
sys.exit(1)
wrapper.__name__ = func.__name__
wrapper.__doc__ = func.__doc__
return wrapper
# ── Main CLI Group ──────────────────────────────────────────────
@click.group(invoke_without_command=True)
@click.option("--json", "use_json", is_flag=True, help="Output as JSON")
@click.pass_context
def cli(ctx, use_json):
"""VideoCaptioner CLI — AI-powered video captioning.
Transcribe speech, optimize/translate subtitles, burn into video with
beautiful styles. Free ASR (bijian) and translation (Bing/Google) included.
Run without a subcommand to enter interactive REPL mode.
"""
global _json_output
_json_output = use_json
if ctx.invoked_subcommand is None:
ctx.invoke(repl)
# ── Transcribe ──────────────────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--asr", type=click.Choice(["bijian", "jianying", "whisper-api", "whisper-cpp"]),
default="bijian", help="ASR engine (bijian/jianying: free, Chinese & English only)")
@click.option("--language", default="auto", help="Source language ISO 639-1 code, or 'auto'")
@click.option("--format", "fmt", type=click.Choice(["srt", "ass", "txt", "json"]),
default="srt", help="Output format")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--word-timestamps", is_flag=True, help="Include word-level timestamps")
@click.option("--whisper-api-key", default=None, help="Whisper API key")
@click.option("--whisper-api-base", default=None, help="Whisper API base URL")
@click.option("--whisper-model", default=None, help="Whisper model name")
@handle_error
def transcribe(input_path, asr, language, fmt, output_path, word_timestamps,
whisper_api_key, whisper_api_base, whisper_model):
"""Transcribe audio/video to subtitles."""
result_path = transcribe_mod.transcribe(
input_path, output_path=output_path, asr=asr, language=language,
format=fmt, word_timestamps=word_timestamps,
whisper_api_key=whisper_api_key, whisper_api_base=whisper_api_base,
whisper_model=whisper_model,
)
output({"output_path": result_path}, f"✓ Transcription complete → {result_path}")
# ── Subtitle ────────────────────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--translator", type=click.Choice(["llm", "bing", "google"]),
default=None, help="Translation service (bing/google: free)")
@click.option("--target-language", default=None, help="Target language BCP 47 code (e.g. en, ja, ko)")
@click.option("--format", "fmt", type=click.Choice(["srt", "ass", "txt", "json"]),
default="srt", help="Output format")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]),
default=None, help="Bilingual subtitle layout")
@click.option("--no-optimize", is_flag=True, help="Skip LLM optimization")
@click.option("--no-translate", is_flag=True, help="Skip translation")
@click.option("--no-split", is_flag=True, help="Skip re-segmentation")
@click.option("--reflect", is_flag=True, help="Reflective translation (LLM only, higher quality)")
@click.option("--prompt", default=None, help="Custom LLM prompt")
@click.option("--api-key", default=None, help="LLM API key")
@click.option("--api-base", default=None, help="LLM API base URL")
@click.option("--model", default=None, help="LLM model name")
@handle_error
def subtitle(input_path, translator, target_language, fmt, output_path, layout,
no_optimize, no_translate, no_split, reflect, prompt, api_key, api_base, model):
"""Optimize and/or translate subtitle files.
Three processing steps (all enabled by default except translation):
1. Split — re-segment by semantic boundaries (LLM)
2. Optimize — fix ASR errors, punctuation (LLM)
3. Translate — to another language (LLM/Bing/Google)
Use --translator or --target-language to enable translation.
"""
result_path = subtitle_mod.process_subtitle(
input_path, output_path=output_path, translator=translator,
target_language=target_language, format=fmt, layout=layout,
no_optimize=no_optimize, no_translate=no_translate, no_split=no_split,
reflect=reflect, prompt=prompt, api_key=api_key, api_base=api_base, model=model,
)
output({"output_path": result_path}, f"✓ Subtitle processing complete → {result_path}")
# ── Synthesize ──────────────────────────────────────────────────
@cli.command()
@click.argument("video_path")
@click.option("-s", "--subtitle", "subtitle_path", required=True, help="Subtitle file path")
@click.option("--subtitle-mode", type=click.Choice(["soft", "hard"]),
default="soft", help="soft: embedded track, hard: burned into frames")
@click.option("--quality", type=click.Choice(["ultra", "high", "medium", "low"]),
default="medium", help="Video quality (ultra=CRF18, high=CRF23, medium=CRF28, low=CRF32)")
@click.option("-o", "--output", "output_path", default=None, help="Output video file path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]),
default=None, help="Bilingual subtitle layout")
@click.option("--render-mode", type=click.Choice(["ass", "rounded"]),
default=None, help="ass: outline/shadow, rounded: background boxes")
@click.option("--style", default=None, help="Style preset (default, anime, vertical, rounded)")
@click.option("--style-override", default=None, help='Inline JSON, e.g. \'{"outline_color": "#ff0000"}\'')
@click.option("--font-file", default=None, help="Custom font file (.ttf/.otf)")
@handle_error
def synthesize(video_path, subtitle_path, subtitle_mode, quality, output_path,
layout, render_mode, style, style_override, font_file):
"""Burn subtitles into video with customizable styles.
Two rendering modes for beautiful subtitles:
ASS — traditional outline/shadow (presets: default, anime, vertical)
Rounded — modern rounded background boxes
Use 'cli-anything-videocaptioner styles' to see all presets.
"""
result_path = synthesize_mod.synthesize(
video_path, subtitle_path, output_path=output_path,
subtitle_mode=subtitle_mode, quality=quality, layout=layout,
render_mode=render_mode, style=style, style_override=style_override,
font_file=font_file,
)
output({"output_path": result_path}, f"✓ Video synthesis complete → {result_path}")
# ── Process (full pipeline) ─────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--asr", type=click.Choice(["bijian", "jianying", "whisper-api", "whisper-cpp"]),
default="bijian", help="ASR engine")
@click.option("--language", default="auto", help="Source language")
@click.option("--translator", type=click.Choice(["llm", "bing", "google"]),
default=None, help="Translation service (bing/google: free)")
@click.option("--target-language", default=None, help="Target language BCP 47 code")
@click.option("--subtitle-mode", type=click.Choice(["soft", "hard"]), default="soft")
@click.option("--quality", type=click.Choice(["ultra", "high", "medium", "low"]), default="medium")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]), default=None)
@click.option("--style", default=None, help="Style preset name")
@click.option("--style-override", default=None, help="Inline JSON style override")
@click.option("--render-mode", type=click.Choice(["ass", "rounded"]), default=None)
@click.option("--no-optimize", is_flag=True, help="Skip optimization")
@click.option("--no-translate", is_flag=True, help="Skip translation")
@click.option("--no-split", is_flag=True, help="Skip re-segmentation")
@click.option("--no-synthesize", is_flag=True, help="Skip video synthesis")
@click.option("--reflect", is_flag=True, help="Reflective translation (LLM only)")
@click.option("--prompt", default=None, help="Custom LLM prompt")
@click.option("--api-key", default=None, help="LLM API key")
@click.option("--api-base", default=None, help="LLM API base URL")
@click.option("--model", default=None, help="LLM model name")
@handle_error
def process(input_path, asr, language, translator, target_language, subtitle_mode,
quality, output_path, layout, style, style_override, render_mode,
no_optimize, no_translate, no_split, no_synthesize, reflect,
prompt, api_key, api_base, model):
"""Full pipeline: transcribe → optimize → translate → synthesize.
One command to go from video to captioned video with translated subtitles.
Audio files automatically skip video synthesis.
"""
result_path = pipeline_mod.process(
input_path, output_path=output_path, asr=asr, language=language,
translator=translator, target_language=target_language,
subtitle_mode=subtitle_mode, quality=quality, layout=layout,
style=style, style_override=style_override, render_mode=render_mode,
no_optimize=no_optimize, no_translate=no_translate, no_split=no_split,
no_synthesize=no_synthesize, reflect=reflect, prompt=prompt,
api_key=api_key, api_base=api_base, model=model,
)
output({"output_path": result_path}, f"✓ Pipeline complete → {result_path}")
# ── Styles ──────────────────────────────────────────────────────
@cli.command()
@handle_error
def styles():
"""List available subtitle style presets."""
result = vc_backend.get_styles()
if _json_output:
click.echo(json.dumps({"styles": result}))
else:
click.echo(result)
# ── Config ──────────────────────────────────────────────────────
@cli.group()
def config():
"""View and manage configuration."""
pass
@config.command("show")
@handle_error
def config_show():
"""Display current configuration."""
result = vc_backend.get_config()
if _json_output:
click.echo(json.dumps({"config": result}))
else:
click.echo(result)
@config.command("set")
@click.argument("key")
@click.argument("value")
@handle_error
def config_set(key, value):
"""Set a configuration value."""
result = vc_backend.run(["config", "set", key, value])
if result["exit_code"] != 0:
raise RuntimeError(result["stderr"] or result["stdout"])
output({"key": key, "value": value}, f"{key} = {value}")
# ── Download ────────────────────────────────────────────────────
@cli.command()
@click.argument("url")
@click.option("-o", "--output", "output_dir", default=".", help="Output directory")
@handle_error
def download(url, output_dir):
"""Download online video (YouTube, Bilibili, etc.)."""
result_path = vc_backend.run_quiet(["download", url, "-o", output_dir])
output({"output_path": result_path}, f"✓ Downloaded → {result_path}")
# ── Session ─────────────────────────────────────────────────────
@cli.group()
def session():
"""Session state commands."""
pass
@session.command("status")
@handle_error
def session_status():
"""Show VideoCaptioner version and configuration."""
version = vc_backend.get_version()
data = {"version": version, "json_output": _json_output}
output(data, f"VideoCaptioner {version}")
# ── REPL ────────────────────────────────────────────────────────
@cli.command()
@handle_error
def repl():
"""Start interactive REPL session."""
from cli_anything.videocaptioner.utils.repl_skin import ReplSkin
global _repl_mode
_repl_mode = True
skin = ReplSkin("videocaptioner", version="1.0.0")
skin.print_banner()
pt_session = skin.create_prompt_session()
_repl_commands = {
"transcribe": "Transcribe audio/video to subtitles",
"subtitle": "Optimize and/or translate subtitles",
"synthesize": "Burn subtitles into video",
"process": "Full pipeline (transcribe → translate → synthesize)",
"styles": "List subtitle style presets",
"config": "show|set <key> <value>",
"download": "Download online video",
"session": "status",
"help": "Show this help",
"quit": "Exit REPL",
}
while True:
try:
line = skin.get_input(pt_session, project_name="", modified=False)
if not line:
continue
if line.lower() in ("quit", "exit", "q"):
skin.print_goodbye()
break
if line.lower() == "help":
skin.help(_repl_commands)
continue
try:
args = shlex.split(line)
except ValueError:
args = line.split()
try:
cli.main(args, standalone_mode=False)
except SystemExit:
pass
except click.exceptions.UsageError as e:
skin.warning(f"Usage error: {e}")
except Exception as e:
skin.error(f"{e}")
except (EOFError, KeyboardInterrupt):
skin.print_goodbye()
break
_repl_mode = False
# ── Entry Point ─────────────────────────────────────────────────
def main():
cli()
if __name__ == "__main__":
main()
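The REPL loop above tokenizes input with `shlex.split` and falls back to a plain split on unbalanced quotes; the same fallback in isolation (`tokenize` is an illustrative name):

```python
import shlex

def tokenize(line: str) -> list[str]:
    # Same fallback as the REPL loop: shlex handles quoted arguments,
    # and a plain whitespace split covers unbalanced quoting.
    try:
        return shlex.split(line)
    except ValueError:
        return line.split()

print(tokenize('synthesize video.mp4 -s "my subs.srt"'))
print(tokenize('download "http://unterminated'))  # unbalanced quote → plain split
```

The fallback keeps the REPL responsive: a typo in quoting produces a Click usage error rather than crashing the session.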

View File

@@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""setup.py for cli-anything-videocaptioner"""
from setuptools import setup, find_namespace_packages
with open("cli_anything/videocaptioner/README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()
setup(
name="cli-anything-videocaptioner",
version="1.0.0",
author="Weifeng",
author_email="",
description="CLI harness for VideoCaptioner — AI-powered video captioning with beautiful subtitle styles. Requires: videocaptioner (pip install videocaptioner), ffmpeg",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/WEIFENG2333/VideoCaptioner",
packages=find_namespace_packages(include=["cli_anything.*"]),
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Topic :: Multimedia :: Video",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
],
python_requires=">=3.10",
install_requires=[
"click>=8.0.0",
"prompt-toolkit>=3.0.0",
"videocaptioner",
],
extras_require={
"dev": [
"pytest>=7.0.0",
"pytest-cov>=4.0.0",
],
},
entry_points={
"console_scripts": [
"cli-anything-videocaptioner=cli_anything.videocaptioner.videocaptioner_cli:main",
],
},
package_data={
"cli_anything.videocaptioner": ["skills/*.md"],
},
include_package_data=True,
zip_safe=False,
)