feat: Add VideoCaptioner CLI harness — AI-powered video captioning

Add agent-harness for VideoCaptioner, an AI-powered video captioning tool.

Pipeline: Speech transcription → Subtitle optimization → Translation → Video synthesis with styled subtitles.

Key features:
- 4 ASR engines (bijian/jianying free, whisper-api, whisper-cpp)
- 3 translation services (LLM, Bing free, Google free), 38 languages
- Beautiful subtitle styles (ASS outline + rounded background)
- Full pipeline in one command
- 26 tests (14 unit + 12 e2e), all passing

This commit is contained in:
liangweifeng
2026-03-29 21:09:01 +08:00
parent 790a186a4c
commit 333ae5986f
18 changed files with 1743 additions and 0 deletions

View File

@@ -0,0 +1,83 @@
# VideoCaptioner: Project-Specific Analysis & SOP
## Architecture Summary
VideoCaptioner is an AI-powered video captioning tool that provides a complete
pipeline from speech recognition to styled subtitle synthesis. It ships as a
standalone CLI (`pip install videocaptioner`) with a well-defined command interface.
```
+----------------------------------------------------------+
| VideoCaptioner CLI |
| +------------+ +----------+ +-----------+ +-----------+ |
| | Transcribe | | Subtitle | | Synthesize| | Process | |
| | (ASR) | | (NLP) | | (FFmpeg) | | (Pipeline)| |
| +-----+------+ +----+-----+ +-----+-----+ +-----+-----+ |
| | | | | |
| +-----+--------------+-------------+-------------+-----+ |
| | Core Engine | |
| | ASR engines, LLM optimization, Translation, | |
| | Subtitle rendering (ASS + Rounded), FFmpeg | |
| +-----------------------------------------------------+ |
+----------------------------------------------------------+
```
## CLI Strategy: Subprocess Wrapper
Unlike applications whose internal formats must be reverse-engineered,
VideoCaptioner already ships a production CLI. Our harness:
1. **Click wrapper** provides the CLI-Anything standard interface
2. **Subprocess backend** delegates to `videocaptioner` CLI commands
3. **JSON mode** (`--json`) returns structured output for agents
4. **REPL mode** provides interactive session with tab-completion
### Why Subprocess?
VideoCaptioner's CLI is:
- **Production-tested** with 50+ unit tests and 200+ QA test cases
- **Feature-complete** with 7 subcommands covering the full pipeline
- **Well-documented** with clear `--help` text and exit codes
- **Actively maintained** on PyPI with automated releases
Wrapping via subprocess preserves all these qualities without reimplementation.
## Coverage
### Transcription (4 ASR engines)
- `bijian` — Free, Chinese & English, no setup needed
- `jianying` — Free, Chinese & English, no setup needed
- `whisper-api` — All languages, OpenAI-compatible API
- `whisper-cpp` — All languages, local model
### Subtitle Processing
- **Split** — Semantic re-segmentation via LLM
- **Optimize** — Fix ASR errors, punctuation, formatting via LLM
- **Translate** — 38 languages, 3 translators (LLM, Bing free, Google free)
- **Layout** — target-above, source-above, target-only, source-only
### Video Synthesis
- **Soft subtitles** — Embedded subtitle track (switchable)
- **Hard subtitles** — Burned into video frames
- **ASS style** — Traditional outline/shadow with presets (default, anime, vertical)
- **Rounded style** — Modern rounded background boxes
- **Customizable** — Inline JSON override for any style parameter
- **Quality levels** — ultra (CRF 18), high (CRF 23), medium (CRF 28), low (CRF 32)
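As a rough illustration, the documented quality tiers could be wired to FFmpeg like this (the exact flags VideoCaptioner passes are an assumption; `ffmpeg_burn_args` is a hypothetical helper, not part of the harness):

```python
# Assumed mapping from the documented quality tiers to FFmpeg CRF values.
QUALITY_CRF = {"ultra": 18, "high": 23, "medium": 28, "low": 32}


def ffmpeg_burn_args(video: str, subtitle: str, output: str,
                     quality: str = "medium") -> list[str]:
    """Build a hypothetical ffmpeg command that hard-burns subtitles.

    Lower CRF means higher quality and larger files; 18 is near-lossless.
    """
    crf = QUALITY_CRF[quality]
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={subtitle}",  # burn the subtitle file into frames
        "-crf", str(crf),
        output,
    ]
```

For example, `ffmpeg_burn_args("in.mp4", "sub.ass", "out.mp4", "ultra")` yields a command containing `-crf 18`.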
### Utilities
- Configuration management (TOML config + env vars)
- Style preset listing with full parameters
- Online video download (YouTube, Bilibili, etc.)
## Testing Strategy
- **Unit tests**: Mock subprocess calls, verify argument construction
- **End-to-end tests**: Real videocaptioner CLI with test media files
- **Prerequisite**: `videocaptioner` and `ffmpeg` must be installed
## Limitations
- Requires `videocaptioner` package to be installed separately
- Free ASR engines (bijian/jianying) only support Chinese & English
- LLM features require an OpenAI-compatible API key
- Hard subtitle styles require FFmpeg

View File

@@ -0,0 +1,71 @@
# VideoCaptioner CLI
AI-powered video captioning tool with beautiful customizable subtitle styles.
## Architecture
- **Subprocess backend** delegates to the production `videocaptioner` CLI (`pip install videocaptioner`)
- **Click** provides the CLI framework with subcommand groups and REPL
- **JSON output mode** (`--json`) for agent consumption
- **Free features included**: bijian ASR (Chinese/English), Bing/Google translation
## Pipeline
```
Audio/Video → ASR Transcription → Subtitle Splitting → LLM Optimization → Translation → Video Synthesis
              (bijian/whisper)    (semantic)           (fix errors)       (38 languages) (styled subtitles)
```
## Install
```bash
pip install videocaptioner click prompt-toolkit
```
## Run
```bash
# One-shot: transcribe a Chinese video and add English subtitles
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language en --subtitle-mode hard
# Transcribe only
cli-anything-videocaptioner transcribe video.mp4 --asr bijian -o output.srt
# Translate existing subtitles
cli-anything-videocaptioner subtitle input.srt --translator google --target-language ja
# Burn subtitles with anime style
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard --style anime
# Custom style (red outline, large font)
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard \
--style-override '{"outline_color": "#ff0000", "font_size": 48}'
# JSON output mode (for agent consumption)
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
# Interactive REPL
cli-anything-videocaptioner
```
## Subtitle Styles
Two rendering modes for beautiful subtitles:
**ASS mode** — traditional outline/shadow:
- Presets: `default` (white+black), `anime` (warm+orange), `vertical` (portrait videos)
**Rounded mode** — modern rounded background boxes:
- Preset: `rounded` (dark text on semi-transparent background)
Fully customizable via `--style-override` with inline JSON.
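The override semantics can be sketched as a field-by-field merge onto a preset (the preset fields below are illustrative, not the actual style schema):

```python
import json

# Hypothetical preset fields — the real preset schema may differ.
DEFAULT_STYLE = {"font_size": 36, "outline_color": "#000000", "font_name": "Arial"}


def apply_style_override(preset: dict, override_json: str) -> dict:
    """Merge an inline-JSON override onto a style preset, field by field.

    The preset is left untouched; only the overridden keys change.
    """
    merged = dict(preset)
    merged.update(json.loads(override_json))
    return merged
```

So `--style-override '{"outline_color": "#ff0000", "font_size": 48}'` would change only those two fields and keep every other preset value.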
## Coverage
| Feature | Commands |
|---------|----------|
| Transcription | 4 ASR engines, auto language detection, word timestamps |
| Subtitle Processing | Split + optimize + translate, 3 translators, 38 languages |
| Video Synthesis | Soft/hard subtitles, 4 quality levels, 5 style presets |
| Styles | ASS outline + rounded background, inline JSON customization |
| Utilities | Config management, style listing, video download |

View File

@@ -0,0 +1,91 @@
"""Full pipeline — transcribe → optimize → translate → synthesize in one command."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def process(
input_path: str,
output_path: str | None = None,
asr: str = "bijian",
language: str = "auto",
translator: str | None = None,
target_language: str | None = None,
subtitle_mode: str = "soft",
quality: str = "medium",
layout: str | None = None,
style: str | None = None,
style_override: str | None = None,
render_mode: str | None = None,
no_optimize: bool = False,
no_translate: bool = False,
no_split: bool = False,
no_synthesize: bool = False,
reflect: bool = False,
prompt: str | None = None,
api_key: str | None = None,
api_base: str | None = None,
model: str | None = None,
) -> str:
"""Run the complete captioning pipeline.
Args:
input_path: Video or audio file path.
output_path: Output file or directory path.
asr: ASR engine.
language: Source language.
translator: Translation service.
target_language: Target language.
subtitle_mode: soft or hard.
quality: Video quality.
layout: Bilingual layout.
style: Style preset name.
style_override: Inline JSON style override.
render_mode: ass or rounded.
no_optimize: Skip optimization.
no_translate: Skip translation.
no_split: Skip re-segmentation.
no_synthesize: Skip video synthesis.
reflect: Reflective translation.
prompt: Custom LLM prompt.
api_key: LLM API key.
api_base: LLM API base URL.
model: LLM model name.
Returns:
Output file path.
"""
args = ["process", input_path, "--asr", asr, "--language", language,
"--subtitle-mode", subtitle_mode, "--quality", quality]
if output_path:
args += ["-o", output_path]
if translator:
args += ["--translator", translator]
if target_language:
args += ["--target-language", target_language]
if layout:
args += ["--layout", layout]
if style:
args += ["--style", style]
if style_override:
args += ["--style-override", style_override]
if render_mode:
args += ["--render-mode", render_mode]
if no_optimize:
args.append("--no-optimize")
if no_translate:
args.append("--no-translate")
if no_split:
args.append("--no-split")
if no_synthesize:
args.append("--no-synthesize")
if reflect:
args.append("--reflect")
if prompt:
args += ["--prompt", prompt]
if api_key:
args += ["--api-key", api_key]
if api_base:
args += ["--api-base", api_base]
if model:
args += ["--model", model]
return run_quiet(args)

View File

@@ -0,0 +1,68 @@
"""Subtitle processing — optimize and translate subtitle files."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def process_subtitle(
input_path: str,
output_path: str | None = None,
translator: str | None = None,
target_language: str | None = None,
format: str = "srt",
layout: str | None = None,
no_optimize: bool = False,
no_translate: bool = False,
no_split: bool = False,
reflect: bool = False,
prompt: str | None = None,
api_key: str | None = None,
api_base: str | None = None,
model: str | None = None,
) -> str:
"""Optimize and/or translate a subtitle file.
Args:
input_path: Subtitle file (.srt, .ass, .vtt).
output_path: Output file or directory path.
translator: Translation service (llm, bing, google).
target_language: Target language BCP 47 code.
format: Output format (srt, ass, txt, json).
layout: Bilingual layout (target-above, source-above, target-only, source-only).
no_optimize: Skip LLM optimization.
no_translate: Skip translation.
no_split: Skip re-segmentation.
reflect: Enable reflective translation (LLM only).
prompt: Custom LLM prompt.
api_key: LLM API key.
api_base: LLM API base URL.
model: LLM model name.
Returns:
Output file path.
"""
args = ["subtitle", input_path, "--format", format]
if output_path:
args += ["-o", output_path]
if translator:
args += ["--translator", translator]
if target_language:
args += ["--target-language", target_language]
if layout:
args += ["--layout", layout]
if no_optimize:
args.append("--no-optimize")
if no_translate:
args.append("--no-translate")
if no_split:
args.append("--no-split")
if reflect:
args.append("--reflect")
if prompt:
args += ["--prompt", prompt]
if api_key:
args += ["--api-key", api_key]
if api_base:
args += ["--api-base", api_base]
if model:
args += ["--model", model]
return run_quiet(args)

View File

@@ -0,0 +1,49 @@
"""Video synthesis — burn subtitles into video with customizable styles."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def synthesize(
video_path: str,
subtitle_path: str,
output_path: str | None = None,
subtitle_mode: str = "soft",
quality: str = "medium",
layout: str | None = None,
render_mode: str | None = None,
style: str | None = None,
style_override: str | None = None,
font_file: str | None = None,
) -> str:
"""Burn subtitles into a video file.
Args:
video_path: Input video file.
subtitle_path: Subtitle file (.srt, .ass).
output_path: Output video file path.
subtitle_mode: 'soft' (embedded track) or 'hard' (burned in).
quality: Video quality (ultra, high, medium, low).
layout: Bilingual layout.
render_mode: 'ass' (outline/shadow) or 'rounded' (background boxes).
style: Style preset name (default, anime, vertical, rounded).
style_override: Inline JSON to override style fields.
font_file: Custom font file path (.ttf/.otf).
Returns:
Output video file path.
"""
args = ["synthesize", video_path, "-s", subtitle_path,
"--subtitle-mode", subtitle_mode, "--quality", quality]
if output_path:
args += ["-o", output_path]
if layout:
args += ["--layout", layout]
if render_mode:
args += ["--render-mode", render_mode]
if style:
args += ["--style", style]
if style_override:
args += ["--style-override", style_override]
if font_file:
args += ["--font-file", font_file]
return run_quiet(args)

View File

@@ -0,0 +1,44 @@
"""Transcription — speech to subtitles via ASR engines."""
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
def transcribe(
input_path: str,
output_path: str | None = None,
asr: str = "bijian",
language: str = "auto",
format: str = "srt",
word_timestamps: bool = False,
whisper_api_key: str | None = None,
whisper_api_base: str | None = None,
whisper_model: str | None = None,
) -> str:
"""Transcribe audio/video to subtitles.
Args:
input_path: Audio or video file path.
output_path: Output file or directory path.
asr: ASR engine (bijian, jianying, whisper-api, whisper-cpp).
language: Source language ISO 639-1 code, or 'auto'.
format: Output format (srt, ass, txt, json).
word_timestamps: Include word-level timestamps.
whisper_api_key: Whisper API key (for whisper-api engine).
whisper_api_base: Whisper API base URL.
whisper_model: Whisper model name.
Returns:
Output file path.
"""
args = ["transcribe", input_path, "--asr", asr, "--language", language, "--format", format]
if output_path:
args += ["-o", output_path]
if word_timestamps:
args.append("--word-timestamps")
if whisper_api_key:
args += ["--whisper-api-key", whisper_api_key]
if whisper_api_base:
args += ["--whisper-api-base", whisper_api_base]
if whisper_model:
args += ["--whisper-model", whisper_model]
return run_quiet(args)

View File

@@ -0,0 +1,123 @@
---
name: >-
cli-anything-videocaptioner
description: >-
AI-powered video captioning — transcribe speech, optimize/translate subtitles, burn into video with beautiful customizable styles (ASS outline or rounded background). Free ASR and translation included.
---
# cli-anything-videocaptioner
AI-powered video captioning tool. Transcribe speech → optimize subtitles → translate → burn into video with beautiful styles.
## Installation
```bash
pip install cli-anything-videocaptioner
```
**Prerequisites:**
- Python 3.10+
- `videocaptioner` must be installed (`pip install videocaptioner`)
- FFmpeg required for video synthesis
## Usage
### Basic Commands
```bash
# Show help
cli-anything-videocaptioner --help
# Start interactive REPL mode
cli-anything-videocaptioner
# Transcribe a video (free, no setup)
cli-anything-videocaptioner transcribe video.mp4 --asr bijian
# Translate subtitles (free Bing translator)
cli-anything-videocaptioner subtitle input.srt --translator bing --target-language en
# Full pipeline: transcribe → translate → burn subtitles
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language en --subtitle-mode hard
# JSON output (for agent consumption)
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
```
### REPL Mode
When invoked without a subcommand, the CLI enters an interactive REPL session:
```bash
cli-anything-videocaptioner
# Enter commands interactively with tab-completion and history
```
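A loop of this shape can be sketched with prompt_toolkit (the command list and dispatch below are illustrative, not the harness's actual implementation):

```python
# Illustrative command set; the real harness derives these from its Click groups.
COMMANDS = ["transcribe", "subtitle", "synthesize", "process",
            "styles", "config", "download", "help", "quit"]


def repl() -> None:
    """Minimal interactive loop with tab-completion."""
    # Imported lazily so the module still loads without prompt_toolkit installed.
    from prompt_toolkit import PromptSession
    from prompt_toolkit.completion import WordCompleter

    session = PromptSession(completer=WordCompleter(COMMANDS))
    while True:
        line = session.prompt("videocaptioner> ").strip()
        if line in ("quit", "exit"):
            break
        if line:
            print(f"would dispatch: {line}")  # real REPL invokes the Click subcommand
```

`PromptSession` also gives history and emacs-style editing for free, which is why the install line pulls in `prompt-toolkit`.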
## Command Groups
### transcribe — Speech to subtitles
```
transcribe <input> [--asr bijian|jianying|whisper-api|whisper-cpp] [--language CODE] [--format srt|ass|txt|json] [-o PATH]
```
- `bijian` (default): Free, Chinese & English, no setup
- `whisper-api`: All languages, requires `--whisper-api-key`
### subtitle — Optimize and translate
```
subtitle <input.srt> [--translator llm|bing|google] [--target-language CODE] [--layout target-above|source-above|target-only|source-only] [--no-optimize] [--no-translate] [-o PATH]
```
- Three steps: Split → Optimize → Translate
- Bing/Google translators are free
- 38 target languages supported (BCP 47 codes)
### synthesize — Burn subtitles into video
```
synthesize <video> -s <subtitle> [--subtitle-mode soft|hard] [--quality ultra|high|medium|low] [--style NAME] [--style-override JSON] [--render-mode ass|rounded] [--font-file PATH] [-o PATH]
```
- **ASS mode**: Outline/shadow style with presets (default, anime, vertical)
- **Rounded mode**: Modern rounded background boxes
- Customizable via `--style-override '{"outline_color": "#ff0000"}'`
### process — Full pipeline
```
process <input> [--asr ...] [--translator ...] [--target-language ...] [--subtitle-mode ...] [--style ...] [--no-optimize] [--no-translate] [--no-synthesize] [-o PATH]
```
### styles — List style presets
```
styles
```
### config — Manage settings
```
config show
config set <key> <value>
```
### download — Download online video
```
download <URL> [-o DIR]
```
## JSON Output
All commands support `--json` for machine-readable output:
```bash
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
# {"output_path": "/path/to/output.srt"}
```
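From an agent, this composes naturally with subprocess; a sketch using the single `output_path` key shown above (`parse_output_path` and `transcribe_json` are hypothetical helpers):

```python
import json
import subprocess


def parse_output_path(stdout: str) -> str:
    """Parse the {"output_path": ...} payload printed in --json mode."""
    return json.loads(stdout)["output_path"]


def transcribe_json(video: str) -> str:
    """Invoke the harness in --json mode and return the produced subtitle path."""
    result = subprocess.run(
        ["cli-anything-videocaptioner", "--json", "transcribe", video, "--asr", "bijian"],
        capture_output=True, text=True, check=True,  # raise on non-zero exit
    )
    return parse_output_path(result.stdout)
```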
## Style Presets
| Name | Mode | Description |
|------|------|-------------|
| `default` | ASS | White text, black outline — clean and universal |
| `anime` | ASS | Warm white, orange outline — anime/cartoon style |
| `vertical` | ASS | High bottom margin — for portrait/vertical videos |
| `rounded` | Rounded | Dark text on semi-transparent rounded background |
Customize any field: `--style-override '{"font_size": 48, "outline_color": "#ff0000"}'`
## Target Languages
BCP 47 codes: `zh-Hans` `zh-Hant` `en` `ja` `ko` `fr` `de` `es` `ru` `pt` `it` `ar` `th` `vi` `id` and 23 more.

View File

@@ -0,0 +1,110 @@
"""Unit tests for VideoCaptioner CLI harness core modules."""
import pytest
from unittest.mock import patch, MagicMock
class TestTranscribe:
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.srt")
def test_basic(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
assert transcribe("video.mp4") == "/tmp/o.srt"
assert "transcribe" in mock_run.call_args[0][0]
assert "bijian" in mock_run.call_args[0][0]
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.json")
def test_options(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
transcribe("v.mp4", asr="whisper-api", language="fr", format="json",
output_path="/tmp/o.json", whisper_api_key="sk-xxx")
a = mock_run.call_args[0][0]
assert "whisper-api" in a and "fr" in a and "json" in a and "sk-xxx" in a
@patch("cli_anything.videocaptioner.core.transcribe.run_quiet", return_value="/tmp/o.srt")
def test_word_timestamps(self, mock_run):
from cli_anything.videocaptioner.core.transcribe import transcribe
transcribe("v.mp4", word_timestamps=True)
assert "--word-timestamps" in mock_run.call_args[0][0]
class TestSubtitle:
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_translate(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", translator="bing", target_language="en")
a = mock_run.call_args[0][0]
assert "bing" in a and "en" in a
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_skip(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", no_optimize=True, no_translate=True)
a = mock_run.call_args[0][0]
assert "--no-optimize" in a and "--no-translate" in a
@patch("cli_anything.videocaptioner.core.subtitle.run_quiet", return_value="/tmp/o.srt")
def test_llm(self, mock_run):
from cli_anything.videocaptioner.core.subtitle import process_subtitle
process_subtitle("in.srt", translator="llm", target_language="ja",
reflect=True, api_key="sk-xxx", layout="target-above")
a = mock_run.call_args[0][0]
assert "--reflect" in a and "sk-xxx" in a and "target-above" in a
class TestSynthesize:
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_soft(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt")
assert "soft" in mock_run.call_args[0][0]
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_hard_style(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt", subtitle_mode="hard", style="anime", quality="high")
a = mock_run.call_args[0][0]
assert "hard" in a and "anime" in a and "high" in a
@patch("cli_anything.videocaptioner.core.synthesize.run_quiet", return_value="/tmp/o.mp4")
def test_rounded(self, mock_run):
from cli_anything.videocaptioner.core.synthesize import synthesize
synthesize("v.mp4", "s.srt", subtitle_mode="hard", render_mode="rounded",
style_override='{"bg_color":"#000000cc"}')
a = mock_run.call_args[0][0]
assert "rounded" in a and "#000000cc" in str(a)
class TestPipeline:
@patch("cli_anything.videocaptioner.core.pipeline.run_quiet", return_value="/tmp/o.mp4")
def test_full(self, mock_run):
from cli_anything.videocaptioner.core.pipeline import process
process("v.mp4", translator="bing", target_language="en", style="anime")
a = mock_run.call_args[0][0]
assert "process" in a and "bing" in a and "anime" in a
@patch("cli_anything.videocaptioner.core.pipeline.run_quiet", return_value="/tmp/o.srt")
def test_no_synth(self, mock_run):
from cli_anything.videocaptioner.core.pipeline import process
process("v.mp4", no_synthesize=True)
assert "--no-synthesize" in mock_run.call_args[0][0]
class TestBackend:
@patch("subprocess.run")
def test_success(self, mock_sub):
mock_sub.return_value = MagicMock(returncode=0, stdout="/tmp/o.srt\n", stderr="")
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
assert run_quiet(["transcribe", "v.mp4"]) == "/tmp/o.srt"
@patch("subprocess.run")
def test_failure(self, mock_sub):
mock_sub.return_value = MagicMock(returncode=5, stdout="", stderr="Error: fail")
from cli_anything.videocaptioner.utils.vc_backend import run_quiet
with pytest.raises(RuntimeError, match="fail"):
run_quiet(["transcribe", "x.mp4"])
@patch("shutil.which", return_value=None)
def test_not_installed(self, _):
from cli_anything.videocaptioner.utils.vc_backend import _find_vc
with pytest.raises(RuntimeError, match="not found"):
_find_vc()

View File

@@ -0,0 +1,104 @@
"""End-to-end tests for VideoCaptioner CLI harness.
These tests require videocaptioner to be installed.
Skip with: pytest -m "not e2e"
"""
import pytest
import subprocess
import shutil
# Skip all tests if videocaptioner is not installed
pytestmark = pytest.mark.skipif(
shutil.which("videocaptioner") is None,
reason="videocaptioner not installed"
)
class TestCLIEntryPoint:
def test_help(self):
result = subprocess.run(
["videocaptioner", "--help"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "transcribe" in result.stdout
assert "subtitle" in result.stdout
assert "synthesize" in result.stdout
def test_version(self):
result = subprocess.run(
["videocaptioner", "--version"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "videocaptioner" in result.stdout
def test_style_list(self):
result = subprocess.run(
["videocaptioner", "style"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
assert "default" in result.stdout
assert "anime" in result.stdout or "rounded" in result.stdout
def test_config_show(self):
result = subprocess.run(
["videocaptioner", "config", "show"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 0
def test_transcribe_missing_file(self):
result = subprocess.run(
["videocaptioner", "transcribe", "nonexistent.mp4", "--asr", "bijian"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 3 # FILE_NOT_FOUND
def test_subtitle_missing_file(self):
result = subprocess.run(
["videocaptioner", "subtitle", "nonexistent.srt"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 3
def test_synthesize_missing_args(self):
result = subprocess.run(
["videocaptioner", "synthesize", "video.mp4"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 2 # USAGE_ERROR (missing -s)
def test_invalid_asr_engine(self):
result = subprocess.run(
["videocaptioner", "transcribe", "video.mp4", "--asr", "invalid"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode == 2
def test_invalid_target_language(self):
result = subprocess.run(
["videocaptioner", "subtitle", "test.srt", "--translator", "bing",
"--target-language", "invalid-lang"],
capture_output=True, text=True, timeout=10,
)
assert result.returncode != 0
class TestBackendIntegration:
def test_get_version(self):
from cli_anything.videocaptioner.utils.vc_backend import get_version
version = get_version()
assert "videocaptioner" in version.lower()
def test_get_config(self):
from cli_anything.videocaptioner.utils.vc_backend import get_config
config = get_config()
assert "llm" in config or "transcribe" in config
def test_get_styles(self):
from cli_anything.videocaptioner.utils.vc_backend import get_styles
styles = get_styles()
assert "default" in styles

View File

@@ -0,0 +1,500 @@
"""cli-anything REPL Skin — Unified terminal interface for all CLI harnesses.
Copy this file into your CLI package at:
cli_anything/<software>/utils/repl_skin.py
Usage:
from cli_anything.<software>.utils.repl_skin import ReplSkin
skin = ReplSkin("ollama", version="1.0.0")
skin.print_banner()
prompt_text = skin.prompt(project_name="llama3.2", modified=False)
skin.success("Model pulled")
skin.error("Connection failed")
skin.warning("No models loaded")
skin.info("Generating...")
skin.status("Model", "llama3.2:latest")
skin.table(headers, rows)
skin.print_goodbye()
"""
import os
import sys
# ── ANSI color codes (no external deps for core styling) ──────────────
_RESET = "\033[0m"
_BOLD = "\033[1m"
_DIM = "\033[2m"
_ITALIC = "\033[3m"
_UNDERLINE = "\033[4m"
# Brand colors
_CYAN = "\033[38;5;80m" # cli-anything brand cyan
_CYAN_BG = "\033[48;5;80m"
_WHITE = "\033[97m"
_GRAY = "\033[38;5;245m"
_DARK_GRAY = "\033[38;5;240m"
_LIGHT_GRAY = "\033[38;5;250m"
# Software accent colors — each software gets a unique accent
_ACCENT_COLORS = {
"gimp": "\033[38;5;214m", # warm orange
"blender": "\033[38;5;208m", # deep orange
"inkscape": "\033[38;5;39m", # bright blue
"audacity": "\033[38;5;33m", # navy blue
"libreoffice": "\033[38;5;40m", # green
"obs_studio": "\033[38;5;55m", # purple
"kdenlive": "\033[38;5;69m", # slate blue
"shotcut": "\033[38;5;35m", # teal green
"ollama": "\033[38;5;255m", # white (Ollama branding)
}
_DEFAULT_ACCENT = "\033[38;5;75m" # default sky blue
# Status colors
_GREEN = "\033[38;5;78m"
_YELLOW = "\033[38;5;220m"
_RED = "\033[38;5;196m"
_BLUE = "\033[38;5;75m"
_MAGENTA = "\033[38;5;176m"
# ── Brand icon ────────────────────────────────────────────────────────
# The cli-anything icon: a small colored diamond/chevron mark
_ICON = f"{_CYAN}{_BOLD}◆{_RESET}"
_ICON_SMALL = f"{_CYAN}◆{_RESET}"
# ── Box drawing characters ────────────────────────────────────────────
_H_LINE = "─"
_V_LINE = "│"
_TL = "┌"
_TR = "┐"
_BL = "└"
_BR = "┘"
_T_DOWN = "┬"
_T_UP = "┴"
_T_RIGHT = "├"
_T_LEFT = "┤"
_CROSS = "┼"
def _strip_ansi(text: str) -> str:
"""Remove ANSI escape codes for length calculation."""
import re
return re.sub(r"\033\[[^m]*m", "", text)
def _visible_len(text: str) -> int:
"""Get visible length of text (excluding ANSI codes)."""
return len(_strip_ansi(text))
class ReplSkin:
"""Unified REPL skin for cli-anything CLIs.
Provides consistent branding, prompts, and message formatting
across all CLI harnesses built with the cli-anything methodology.
"""
def __init__(self, software: str, version: str = "1.0.0",
history_file: str | None = None):
"""Initialize the REPL skin.
Args:
software: Software name (e.g., "gimp", "shotcut", "ollama").
version: CLI version string.
history_file: Path for persistent command history.
Defaults to ~/.cli-anything-<software>/history
"""
self.software = software.lower().replace("-", "_")
self.display_name = software.replace("_", " ").title()
self.version = version
self.accent = _ACCENT_COLORS.get(self.software, _DEFAULT_ACCENT)
# History file
if history_file is None:
from pathlib import Path
hist_dir = Path.home() / f".cli-anything-{self.software}"
hist_dir.mkdir(parents=True, exist_ok=True)
self.history_file = str(hist_dir / "history")
else:
self.history_file = history_file
# Detect terminal capabilities
self._color = self._detect_color_support()
def _detect_color_support(self) -> bool:
"""Check if terminal supports color."""
if os.environ.get("NO_COLOR"):
return False
if os.environ.get("CLI_ANYTHING_NO_COLOR"):
return False
if not hasattr(sys.stdout, "isatty"):
return False
return sys.stdout.isatty()
def _c(self, code: str, text: str) -> str:
"""Apply color code if colors are supported."""
if not self._color:
return text
return f"{code}{text}{_RESET}"
# ── Banner ────────────────────────────────────────────────────────
def print_banner(self):
"""Print the startup banner with branding."""
inner = 54
def _box_line(content: str) -> str:
"""Wrap content in box drawing, padding to inner width."""
pad = inner - _visible_len(content)
vl = self._c(_DARK_GRAY, _V_LINE)
return f"{vl}{content}{' ' * max(0, pad)}{vl}"
top = self._c(_DARK_GRAY, f"{_TL}{_H_LINE * inner}{_TR}")
bot = self._c(_DARK_GRAY, f"{_BL}{_H_LINE * inner}{_BR}")
# Title: ◆ cli-anything · Ollama
icon = self._c(_CYAN + _BOLD, "◆")
brand = self._c(_CYAN + _BOLD, "cli-anything")
dot = self._c(_DARK_GRAY, "·")
name = self._c(self.accent + _BOLD, self.display_name)
title = f" {icon} {brand} {dot} {name}"
ver = f" {self._c(_DARK_GRAY, f' v{self.version}')}"
tip = f" {self._c(_DARK_GRAY, ' Type help for commands, quit to exit')}"
empty = ""
print(top)
print(_box_line(title))
print(_box_line(ver))
print(_box_line(empty))
print(_box_line(tip))
print(bot)
print()
# ── Prompt ────────────────────────────────────────────────────────
def prompt(self, project_name: str = "", modified: bool = False,
context: str = "") -> str:
"""Build a styled prompt string for prompt_toolkit or input().
Args:
project_name: Current project name (empty if none open).
modified: Whether the project has unsaved changes.
context: Optional extra context to show in prompt.
Returns:
Formatted prompt string.
"""
parts = []
# Icon
if self._color:
parts.append(f"{_CYAN}{_RESET} ")
else:
parts.append("> ")
# Software name
parts.append(self._c(self.accent + _BOLD, self.software))
# Project context
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
parts.append(f" {self._c(_DARK_GRAY, '[')}")
parts.append(self._c(_LIGHT_GRAY, f"{ctx}{mod}"))
parts.append(self._c(_DARK_GRAY, ']'))
parts.append(self._c(_GRAY, "› "))
return "".join(parts)
def prompt_tokens(self, project_name: str = "", modified: bool = False,
context: str = ""):
"""Build prompt_toolkit formatted text tokens for the prompt.
Use with prompt_toolkit's FormattedText for proper ANSI handling.
Returns:
list of (style, text) tuples for prompt_toolkit.
"""
tokens = []
tokens.append(("class:icon", "◆ "))
tokens.append(("class:software", self.software))
if project_name or context:
ctx = context or project_name
mod = "*" if modified else ""
tokens.append(("class:bracket", " ["))
tokens.append(("class:context", f"{ctx}{mod}"))
tokens.append(("class:bracket", "]"))
tokens.append(("class:arrow", " "))
return tokens
def get_prompt_style(self):
"""Get a prompt_toolkit Style object matching the skin.
Returns:
prompt_toolkit.styles.Style
"""
try:
from prompt_toolkit.styles import Style
except ImportError:
return None
accent_hex = _ANSI_256_TO_HEX.get(self.accent, "#5fafff")
return Style.from_dict({
"icon": "#5fdfdf bold", # cyan brand color
"software": f"{accent_hex} bold",
"bracket": "#585858",
"context": "#bcbcbc",
"arrow": "#808080",
# Completion menu
"completion-menu.completion": "bg:#303030 #bcbcbc",
"completion-menu.completion.current": f"bg:{accent_hex} #000000",
"completion-menu.meta.completion": "bg:#303030 #808080",
"completion-menu.meta.completion.current": f"bg:{accent_hex} #000000",
# Auto-suggest
"auto-suggest": "#585858",
# Bottom toolbar
"bottom-toolbar": "bg:#1c1c1c #808080",
"bottom-toolbar.text": "#808080",
})
# ── Messages ──────────────────────────────────────────────────────
def success(self, message: str):
"""Print a success message with green checkmark."""
        icon = self._c(_GREEN + _BOLD, "✓")
print(f" {icon} {self._c(_GREEN, message)}")
def error(self, message: str):
"""Print an error message with red cross."""
        icon = self._c(_RED + _BOLD, "✗")
print(f" {icon} {self._c(_RED, message)}", file=sys.stderr)
def warning(self, message: str):
"""Print a warning message with yellow triangle."""
        icon = self._c(_YELLOW + _BOLD, "⚠")
print(f" {icon} {self._c(_YELLOW, message)}")
def info(self, message: str):
"""Print an info message with blue dot."""
        icon = self._c(_BLUE, "●")
print(f" {icon} {self._c(_LIGHT_GRAY, message)}")
def hint(self, message: str):
"""Print a subtle hint message."""
print(f" {self._c(_DARK_GRAY, message)}")
def section(self, title: str):
"""Print a section header."""
print()
print(f" {self._c(self.accent + _BOLD, title)}")
print(f" {self._c(_DARK_GRAY, _H_LINE * len(title))}")
# ── Status display ────────────────────────────────────────────────
def status(self, label: str, value: str):
"""Print a key-value status line."""
lbl = self._c(_GRAY, f" {label}:")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def status_block(self, items: dict[str, str], title: str = ""):
"""Print a block of status key-value pairs.
Args:
items: Dict of label -> value pairs.
title: Optional title for the block.
"""
if title:
self.section(title)
max_key = max(len(k) for k in items) if items else 0
for label, value in items.items():
lbl = self._c(_GRAY, f" {label:<{max_key}}")
val = self._c(_WHITE, f" {value}")
print(f"{lbl}{val}")
def progress(self, current: int, total: int, label: str = ""):
"""Print a simple progress indicator.
Args:
current: Current step number.
total: Total number of steps.
label: Optional label for the progress.
"""
pct = int(current / total * 100) if total > 0 else 0
bar_width = 20
filled = int(bar_width * current / total) if total > 0 else 0
        bar = "█" * filled + "░" * (bar_width - filled)
text = f" {self._c(_CYAN, bar)} {self._c(_GRAY, f'{pct:3d}%')}"
if label:
text += f" {self._c(_LIGHT_GRAY, label)}"
print(text)
# ── Table display ─────────────────────────────────────────────────
def table(self, headers: list[str], rows: list[list[str]],
max_col_width: int = 40):
"""Print a formatted table with box-drawing characters.
Args:
headers: Column header strings.
rows: List of rows, each a list of cell strings.
max_col_width: Maximum column width before truncation.
"""
if not headers:
return
# Calculate column widths
col_widths = [min(len(h), max_col_width) for h in headers]
for row in rows:
for i, cell in enumerate(row):
if i < len(col_widths):
col_widths[i] = min(
max(col_widths[i], len(str(cell))), max_col_width
)
def pad(text: str, width: int) -> str:
t = str(text)[:width]
return t + " " * (width - len(t))
# Header
header_cells = [
self._c(_CYAN + _BOLD, pad(h, col_widths[i]))
for i, h in enumerate(headers)
]
sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
header_line = f" {sep.join(header_cells)}"
print(header_line)
# Separator
        sep_line = self._c(
            _DARK_GRAY,
            f" {(_H_LINE * 3).join(_H_LINE * w for w in col_widths)}",
        )
print(sep_line)
# Rows
for row in rows:
cells = []
for i, cell in enumerate(row):
if i < len(col_widths):
cells.append(self._c(_LIGHT_GRAY, pad(str(cell), col_widths[i])))
row_sep = self._c(_DARK_GRAY, f" {_V_LINE} ")
print(f" {row_sep.join(cells)}")
# ── Help display ──────────────────────────────────────────────────
def help(self, commands: dict[str, str]):
"""Print a formatted help listing.
Args:
commands: Dict of command -> description pairs.
"""
self.section("Commands")
max_cmd = max(len(c) for c in commands) if commands else 0
for cmd, desc in commands.items():
cmd_styled = self._c(self.accent, f" {cmd:<{max_cmd}}")
desc_styled = self._c(_GRAY, f" {desc}")
print(f"{cmd_styled}{desc_styled}")
print()
# ── Goodbye ───────────────────────────────────────────────────────
def print_goodbye(self):
"""Print a styled goodbye message."""
print(f"\n {_ICON_SMALL} {self._c(_GRAY, 'Goodbye!')}\n")
# ── Prompt toolkit session factory ────────────────────────────────
def create_prompt_session(self):
"""Create a prompt_toolkit PromptSession with skin styling.
Returns:
A configured PromptSession, or None if prompt_toolkit unavailable.
"""
try:
from prompt_toolkit import PromptSession
from prompt_toolkit.history import FileHistory
from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
style = self.get_prompt_style()
session = PromptSession(
history=FileHistory(self.history_file),
auto_suggest=AutoSuggestFromHistory(),
style=style,
enable_history_search=True,
)
return session
except ImportError:
return None
def get_input(self, pt_session, project_name: str = "",
modified: bool = False, context: str = "") -> str:
"""Get input from user using prompt_toolkit or fallback.
Args:
pt_session: A prompt_toolkit PromptSession (or None).
project_name: Current project name.
modified: Whether project has unsaved changes.
context: Optional context string.
Returns:
User input string (stripped).
"""
if pt_session is not None:
from prompt_toolkit.formatted_text import FormattedText
tokens = self.prompt_tokens(project_name, modified, context)
return pt_session.prompt(FormattedText(tokens)).strip()
else:
raw_prompt = self.prompt(project_name, modified, context)
return input(raw_prompt).strip()
# ── Toolbar builder ───────────────────────────────────────────────
def bottom_toolbar(self, items: dict[str, str]):
"""Create a bottom toolbar callback for prompt_toolkit.
Args:
items: Dict of label -> value pairs to show in toolbar.
Returns:
A callable that returns FormattedText for the toolbar.
"""
def toolbar():
from prompt_toolkit.formatted_text import FormattedText
parts = []
for i, (k, v) in enumerate(items.items()):
if i > 0:
                    parts.append(("class:bottom-toolbar.text", " │ "))
parts.append(("class:bottom-toolbar.text", f" {k}: "))
parts.append(("class:bottom-toolbar", v))
return FormattedText(parts)
return toolbar
# ── ANSI 256-color to hex mapping (for prompt_toolkit styles) ─────────
_ANSI_256_TO_HEX = {
"\033[38;5;33m": "#0087ff", # audacity navy blue
"\033[38;5;35m": "#00af5f", # shotcut teal
"\033[38;5;39m": "#00afff", # inkscape bright blue
"\033[38;5;40m": "#00d700", # libreoffice green
"\033[38;5;55m": "#5f00af", # obs purple
"\033[38;5;69m": "#5f87ff", # kdenlive slate blue
"\033[38;5;75m": "#5fafff", # default sky blue
"\033[38;5;80m": "#5fd7d7", # brand cyan
"\033[38;5;208m": "#ff8700", # blender deep orange
"\033[38;5;214m": "#ffaf00", # gimp warm orange
"\033[38;5;255m": "#eeeeee", # ollama white
}
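The `progress()` helper above reduces to simple fill arithmetic; a standalone sketch of the same math (bar characters assumed to be the conventional `█`/`░` pair):

```python
def render_bar(current: int, total: int, width: int = 20) -> str:
    # Same math as ReplSkin.progress(): integer-truncated fill count
    # plus a right-aligned percentage; guard against division by zero.
    pct = int(current / total * 100) if total > 0 else 0
    filled = int(width * current / total) if total > 0 else 0
    return "█" * filled + "░" * (width - filled) + f" {pct:3d}%"

print(render_bar(5, 10))  # → ██████████░░░░░░░░░░  50%
print(render_bar(0, 0))   # zero total stays at an empty bar
```

The `total > 0` guard matters in the REPL, where a pipeline stage may report progress before its step count is known.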

View File

@@ -0,0 +1,87 @@
"""VideoCaptioner CLI backend — subprocess wrapper for the videocaptioner command.
All core modules call through this single module to invoke the existing
videocaptioner CLI. This keeps the Click harness thin and delegates real
work to the production-tested videocaptioner package.
"""
import json
import subprocess
import shutil
from pathlib import Path
from typing import Any
def _find_vc() -> str:
"""Locate the videocaptioner binary."""
path = shutil.which("videocaptioner")
if not path:
raise RuntimeError(
"videocaptioner not found on PATH. "
"Install with: pip install videocaptioner"
)
return path
def run(args: list[str], timeout: int = 600) -> dict[str, Any]:
"""Run a videocaptioner CLI command and return structured result.
Args:
args: Command arguments (without 'videocaptioner' prefix).
timeout: Max seconds to wait.
Returns:
Dict with 'exit_code', 'stdout', 'stderr', 'output_path' (if found).
"""
cmd = [_find_vc()] + args
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
)
except subprocess.TimeoutExpired:
raise RuntimeError(f"Command timed out after {timeout}s: {' '.join(cmd)}")
# Extract output path from quiet mode stdout
stdout = result.stdout.strip()
output_path = stdout if stdout and Path(stdout).suffix else None
return {
"exit_code": result.returncode,
"stdout": stdout,
"stderr": result.stderr.strip(),
"output_path": output_path,
"command": " ".join(cmd),
}
def run_quiet(args: list[str], timeout: int = 600) -> str:
"""Run in quiet mode and return the output file path.
Raises RuntimeError on failure.
"""
result = run(args + ["-q"], timeout=timeout)
if result["exit_code"] != 0:
error_msg = result["stderr"] or result["stdout"] or "Unknown error"
raise RuntimeError(f"videocaptioner failed (exit {result['exit_code']}): {error_msg}")
return result["stdout"]
def get_version() -> str:
"""Get videocaptioner version string."""
result = run(["--version"])
return result["stdout"]
def get_config() -> str:
"""Get current configuration."""
result = run(["config", "show"])
return result["stdout"]
def get_styles() -> str:
"""Get available subtitle styles."""
result = run(["style"])
return result["stdout"]

View File

@@ -0,0 +1,362 @@
#!/usr/bin/env python3
"""VideoCaptioner CLI — AI-powered video captioning from the command line.
Transcribe speech, optimize and translate subtitles, then burn them into
video with beautiful customizable styles (ASS outline or rounded background).
Usage:
cli-anything-videocaptioner transcribe video.mp4 --asr bijian
cli-anything-videocaptioner subtitle input.srt --translator bing --target-language en
cli-anything-videocaptioner synthesize video.mp4 -s sub.srt --subtitle-mode hard --style anime
cli-anything-videocaptioner process video.mp4 --asr bijian --translator bing --target-language ja
cli-anything-videocaptioner --json transcribe video.mp4 --asr bijian
"""
import sys
import json
import shlex
import click
from cli_anything.videocaptioner.utils import vc_backend
from cli_anything.videocaptioner.core import transcribe as transcribe_mod
from cli_anything.videocaptioner.core import subtitle as subtitle_mod
from cli_anything.videocaptioner.core import synthesize as synthesize_mod
from cli_anything.videocaptioner.core import pipeline as pipeline_mod
_json_output = False
_repl_mode = False
def output(data, message: str = ""):
if _json_output:
click.echo(json.dumps(data, indent=2, default=str))
else:
if message:
click.echo(message)
if isinstance(data, dict):
for k, v in data.items():
click.echo(f" {k}: {v}")
elif isinstance(data, str):
click.echo(data)
def handle_error(func):
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except RuntimeError as e:
if _json_output:
click.echo(json.dumps({"error": str(e), "type": "runtime_error"}))
else:
click.echo(f"Error: {e}", err=True)
if not _repl_mode:
sys.exit(1)
wrapper.__name__ = func.__name__
wrapper.__doc__ = func.__doc__
return wrapper
# ── Main CLI Group ──────────────────────────────────────────────
@click.group(invoke_without_command=True)
@click.option("--json", "use_json", is_flag=True, help="Output as JSON")
@click.pass_context
def cli(ctx, use_json):
"""VideoCaptioner CLI — AI-powered video captioning.
Transcribe speech, optimize/translate subtitles, burn into video with
beautiful styles. Free ASR (bijian) and translation (Bing/Google) included.
Run without a subcommand to enter interactive REPL mode.
"""
global _json_output
_json_output = use_json
if ctx.invoked_subcommand is None:
ctx.invoke(repl)
# ── Transcribe ──────────────────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--asr", type=click.Choice(["bijian", "jianying", "whisper-api", "whisper-cpp"]),
default="bijian", help="ASR engine (bijian/jianying: free, Chinese & English only)")
@click.option("--language", default="auto", help="Source language ISO 639-1 code, or 'auto'")
@click.option("--format", "fmt", type=click.Choice(["srt", "ass", "txt", "json"]),
default="srt", help="Output format")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--word-timestamps", is_flag=True, help="Include word-level timestamps")
@click.option("--whisper-api-key", default=None, help="Whisper API key")
@click.option("--whisper-api-base", default=None, help="Whisper API base URL")
@click.option("--whisper-model", default=None, help="Whisper model name")
@handle_error
def transcribe(input_path, asr, language, fmt, output_path, word_timestamps,
whisper_api_key, whisper_api_base, whisper_model):
"""Transcribe audio/video to subtitles."""
result_path = transcribe_mod.transcribe(
input_path, output_path=output_path, asr=asr, language=language,
format=fmt, word_timestamps=word_timestamps,
whisper_api_key=whisper_api_key, whisper_api_base=whisper_api_base,
whisper_model=whisper_model,
)
output({"output_path": result_path}, f"✓ Transcription complete → {result_path}")
# ── Subtitle ────────────────────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--translator", type=click.Choice(["llm", "bing", "google"]),
default=None, help="Translation service (bing/google: free)")
@click.option("--target-language", default=None, help="Target language BCP 47 code (e.g. en, ja, ko)")
@click.option("--format", "fmt", type=click.Choice(["srt", "ass", "txt", "json"]),
default="srt", help="Output format")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]),
default=None, help="Bilingual subtitle layout")
@click.option("--no-optimize", is_flag=True, help="Skip LLM optimization")
@click.option("--no-translate", is_flag=True, help="Skip translation")
@click.option("--no-split", is_flag=True, help="Skip re-segmentation")
@click.option("--reflect", is_flag=True, help="Reflective translation (LLM only, higher quality)")
@click.option("--prompt", default=None, help="Custom LLM prompt")
@click.option("--api-key", default=None, help="LLM API key")
@click.option("--api-base", default=None, help="LLM API base URL")
@click.option("--model", default=None, help="LLM model name")
@handle_error
def subtitle(input_path, translator, target_language, fmt, output_path, layout,
no_optimize, no_translate, no_split, reflect, prompt, api_key, api_base, model):
"""Optimize and/or translate subtitle files.
Three processing steps (all enabled by default except translation):
1. Split — re-segment by semantic boundaries (LLM)
2. Optimize — fix ASR errors, punctuation (LLM)
3. Translate — to another language (LLM/Bing/Google)
Use --translator or --target-language to enable translation.
"""
result_path = subtitle_mod.process_subtitle(
input_path, output_path=output_path, translator=translator,
target_language=target_language, format=fmt, layout=layout,
no_optimize=no_optimize, no_translate=no_translate, no_split=no_split,
reflect=reflect, prompt=prompt, api_key=api_key, api_base=api_base, model=model,
)
output({"output_path": result_path}, f"✓ Subtitle processing complete → {result_path}")
# ── Synthesize ──────────────────────────────────────────────────
@cli.command()
@click.argument("video_path")
@click.option("-s", "--subtitle", "subtitle_path", required=True, help="Subtitle file path")
@click.option("--subtitle-mode", type=click.Choice(["soft", "hard"]),
default="soft", help="soft: embedded track, hard: burned into frames")
@click.option("--quality", type=click.Choice(["ultra", "high", "medium", "low"]),
default="medium", help="Video quality (ultra=CRF18, high=CRF23, medium=CRF28, low=CRF32)")
@click.option("-o", "--output", "output_path", default=None, help="Output video file path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]),
default=None, help="Bilingual subtitle layout")
@click.option("--render-mode", type=click.Choice(["ass", "rounded"]),
default=None, help="ass: outline/shadow, rounded: background boxes")
@click.option("--style", default=None, help="Style preset (default, anime, vertical, rounded)")
@click.option("--style-override", default=None, help='Inline JSON, e.g. \'{"outline_color": "#ff0000"}\'')
@click.option("--font-file", default=None, help="Custom font file (.ttf/.otf)")
@handle_error
def synthesize(video_path, subtitle_path, subtitle_mode, quality, output_path,
layout, render_mode, style, style_override, font_file):
"""Burn subtitles into video with customizable styles.
Two rendering modes for beautiful subtitles:
ASS — traditional outline/shadow (presets: default, anime, vertical)
Rounded — modern rounded background boxes
Use 'cli-anything-videocaptioner styles' to see all presets.
"""
result_path = synthesize_mod.synthesize(
video_path, subtitle_path, output_path=output_path,
subtitle_mode=subtitle_mode, quality=quality, layout=layout,
render_mode=render_mode, style=style, style_override=style_override,
font_file=font_file,
)
output({"output_path": result_path}, f"✓ Video synthesis complete → {result_path}")
# ── Process (full pipeline) ─────────────────────────────────────
@cli.command()
@click.argument("input_path")
@click.option("--asr", type=click.Choice(["bijian", "jianying", "whisper-api", "whisper-cpp"]),
default="bijian", help="ASR engine")
@click.option("--language", default="auto", help="Source language")
@click.option("--translator", type=click.Choice(["llm", "bing", "google"]),
default=None, help="Translation service (bing/google: free)")
@click.option("--target-language", default=None, help="Target language BCP 47 code")
@click.option("--subtitle-mode", type=click.Choice(["soft", "hard"]), default="soft")
@click.option("--quality", type=click.Choice(["ultra", "high", "medium", "low"]), default="medium")
@click.option("-o", "--output", "output_path", default=None, help="Output file or directory path")
@click.option("--layout", type=click.Choice(["target-above", "source-above", "target-only", "source-only"]), default=None)
@click.option("--style", default=None, help="Style preset name")
@click.option("--style-override", default=None, help="Inline JSON style override")
@click.option("--render-mode", type=click.Choice(["ass", "rounded"]), default=None)
@click.option("--no-optimize", is_flag=True, help="Skip optimization")
@click.option("--no-translate", is_flag=True, help="Skip translation")
@click.option("--no-split", is_flag=True, help="Skip re-segmentation")
@click.option("--no-synthesize", is_flag=True, help="Skip video synthesis")
@click.option("--reflect", is_flag=True, help="Reflective translation (LLM only)")
@click.option("--prompt", default=None, help="Custom LLM prompt")
@click.option("--api-key", default=None, help="LLM API key")
@click.option("--api-base", default=None, help="LLM API base URL")
@click.option("--model", default=None, help="LLM model name")
@handle_error
def process(input_path, asr, language, translator, target_language, subtitle_mode,
quality, output_path, layout, style, style_override, render_mode,
no_optimize, no_translate, no_split, no_synthesize, reflect,
prompt, api_key, api_base, model):
"""Full pipeline: transcribe → optimize → translate → synthesize.
One command to go from video to captioned video with translated subtitles.
Audio files automatically skip video synthesis.
"""
result_path = pipeline_mod.process(
input_path, output_path=output_path, asr=asr, language=language,
translator=translator, target_language=target_language,
subtitle_mode=subtitle_mode, quality=quality, layout=layout,
style=style, style_override=style_override, render_mode=render_mode,
no_optimize=no_optimize, no_translate=no_translate, no_split=no_split,
no_synthesize=no_synthesize, reflect=reflect, prompt=prompt,
api_key=api_key, api_base=api_base, model=model,
)
output({"output_path": result_path}, f"✓ Pipeline complete → {result_path}")
# ── Styles ──────────────────────────────────────────────────────
@cli.command()
@handle_error
def styles():
"""List available subtitle style presets."""
result = vc_backend.get_styles()
if _json_output:
click.echo(json.dumps({"styles": result}))
else:
click.echo(result)
# ── Config ──────────────────────────────────────────────────────
@cli.group()
def config():
"""View and manage configuration."""
pass
@config.command("show")
@handle_error
def config_show():
"""Display current configuration."""
result = vc_backend.get_config()
if _json_output:
click.echo(json.dumps({"config": result}))
else:
click.echo(result)
@config.command("set")
@click.argument("key")
@click.argument("value")
@handle_error
def config_set(key, value):
"""Set a configuration value."""
result = vc_backend.run(["config", "set", key, value])
if result["exit_code"] != 0:
raise RuntimeError(result["stderr"] or result["stdout"])
output({"key": key, "value": value}, f"{key} = {value}")
# ── Download ────────────────────────────────────────────────────
@cli.command()
@click.argument("url")
@click.option("-o", "--output", "output_dir", default=".", help="Output directory")
@handle_error
def download(url, output_dir):
"""Download online video (YouTube, Bilibili, etc.)."""
result_path = vc_backend.run_quiet(["download", url, "-o", output_dir])
output({"output_path": result_path}, f"✓ Downloaded → {result_path}")
# ── Session ─────────────────────────────────────────────────────
@cli.group()
def session():
"""Session state commands."""
pass
@session.command("status")
@handle_error
def session_status():
"""Show VideoCaptioner version and configuration."""
version = vc_backend.get_version()
data = {"version": version, "json_output": _json_output}
output(data, f"VideoCaptioner {version}")
# ── REPL ────────────────────────────────────────────────────────
@cli.command()
@handle_error
def repl():
"""Start interactive REPL session."""
from cli_anything.videocaptioner.utils.repl_skin import ReplSkin
global _repl_mode
_repl_mode = True
skin = ReplSkin("videocaptioner", version="1.0.0")
skin.print_banner()
pt_session = skin.create_prompt_session()
_repl_commands = {
"transcribe": "Transcribe audio/video to subtitles",
"subtitle": "Optimize and/or translate subtitles",
"synthesize": "Burn subtitles into video",
"process": "Full pipeline (transcribe → translate → synthesize)",
"styles": "List subtitle style presets",
"config": "show|set <key> <value>",
"download": "Download online video",
"session": "status",
"help": "Show this help",
"quit": "Exit REPL",
}
while True:
try:
line = skin.get_input(pt_session, project_name="", modified=False)
if not line:
continue
if line.lower() in ("quit", "exit", "q"):
skin.print_goodbye()
break
if line.lower() == "help":
skin.help(_repl_commands)
continue
try:
args = shlex.split(line)
except ValueError:
args = line.split()
try:
cli.main(args, standalone_mode=False)
except SystemExit:
pass
except click.exceptions.UsageError as e:
skin.warning(f"Usage error: {e}")
except Exception as e:
skin.error(f"{e}")
except (EOFError, KeyboardInterrupt):
skin.print_goodbye()
break
_repl_mode = False
# ── Entry Point ─────────────────────────────────────────────────
def main():
cli()
if __name__ == "__main__":
main()
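The REPL loop above tokenizes input with `shlex.split` and falls back to a plain split on unbalanced quotes; the same fallback in isolation (`tokenize` is an illustrative name):

```python
import shlex

def tokenize(line: str) -> list[str]:
    # Same fallback as the REPL loop: shlex handles quoted arguments,
    # and a plain whitespace split covers unbalanced quoting.
    try:
        return shlex.split(line)
    except ValueError:
        return line.split()

print(tokenize('synthesize video.mp4 -s "my subs.srt"'))
print(tokenize('download "http://unterminated'))  # unbalanced quote → plain split
```

The fallback keeps the REPL responsive: a typo in quoting produces a Click usage error rather than crashing the session.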

View File

@@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""setup.py for cli-anything-videocaptioner"""
from setuptools import setup, find_namespace_packages
with open("cli_anything/videocaptioner/README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()
setup(
name="cli-anything-videocaptioner",
version="1.0.0",
author="Weifeng",
author_email="",
description="CLI harness for VideoCaptioner — AI-powered video captioning with beautiful subtitle styles. Requires: videocaptioner (pip install videocaptioner), ffmpeg",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/WEIFENG2333/VideoCaptioner",
packages=find_namespace_packages(include=["cli_anything.*"]),
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Topic :: Multimedia :: Video",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
],
python_requires=">=3.10",
install_requires=[
"click>=8.0.0",
"prompt-toolkit>=3.0.0",
"videocaptioner",
],
extras_require={
"dev": [
"pytest>=7.0.0",
"pytest-cov>=4.0.0",
],
},
entry_points={
"console_scripts": [
"cli-anything-videocaptioner=cli_anything.videocaptioner.videocaptioner_cli:main",
],
},
package_data={
"cli_anything.videocaptioner": ["skills/*.md"],
},
include_package_data=True,
zip_safe=False,
)