oh-my-claudecode

mirror of https://fastgit.cc/github.com/Yeachan-Heo/oh-my-claudecode synced 2026-04-20 21:00:50 +08:00

Files

Bellman 3a833c3395 feat(benchmarks): add per-agent prompt benchmark suite for all 4 consolidated agents (#1437)

Extend the existing harsh-critic benchmark framework with reusable
benchmarks for the code-reviewer, debugger, and executor agents. This
enables measurable prompt tuning: old (pre-consolidation) and new
(merged) prompts are run on the same fixtures and scored against ground truth.
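
The ground-truth scoring described above could look roughly like the following. This is a minimal sketch, not the code in benchmarks/shared/: the `Score` shape, the `scoreFindings` and `comparePrompts` names, and the choice of precision/recall/F1 as the metric are all assumptions made for illustration.

```typescript
// Hypothetical scoring types; the real ones in benchmarks/shared/ may differ.
interface Score {
  precision: number;
  recall: number;
  f1: number;
}

// Score the labels an agent reported against a fixture's ground-truth labels.
function scoreFindings(expected: string[], reported: string[]): Score {
  const expectedSet = new Set(expected);
  const reportedUnique = Array.from(new Set(reported));
  const truePositives = reportedUnique.filter((label) => expectedSet.has(label)).length;

  // A "clean code" fixture has no expected findings: reporting nothing is a perfect run.
  const precision =
    reportedUnique.length === 0
      ? (expectedSet.size === 0 ? 1 : 0)
      : truePositives / reportedUnique.length;
  const recall = expectedSet.size === 0 ? 1 : truePositives / expectedSet.size;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Compare an old (pre-consolidation) and a new (merged) prompt on one fixture.
// A positive f1Delta means the new prompt scored higher.
function comparePrompts(
  expected: string[],
  oldReported: string[],
  newReported: string[],
): { old: Score; new: Score; f1Delta: number } {
  const oldScore = scoreFindings(expected, oldReported);
  const newScore = scoreFindings(expected, newReported);
  return { old: oldScore, new: newScore, f1Delta: newScore.f1 - oldScore.f1 };
}
```

Treating an empty expected set as a perfect score when nothing is reported lets a clean-code fixture penalize false positives instead of dividing by zero.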

New infrastructure:
- benchmarks/shared/ — generalized scoring types, parser, reporter, runner
- benchmarks/code-reviewer/ — 3 fixtures (SQL injection, clean code, payment edge cases)
- benchmarks/debugger/ — 3 fixtures (React undefined, Redis intermittent, TS build errors)
- benchmarks/executor/ — 3 fixtures (trivial, scoped, complex tasks)
- benchmarks/run-all.ts — top-level orchestrator with --save-baseline and --compare modes
- npm scripts: bench:prompts, bench:prompts:save, bench:prompts:compare
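
As an illustration of what the --save-baseline and --compare modes might do with their data, here is a hedged sketch of a baseline comparison step. The `Baseline` record shape, the `compareToBaseline` function, and the `tolerance` parameter are hypothetical; the actual logic in benchmarks/run-all.ts is not shown in this commit message and may differ.

```typescript
// Hypothetical baseline format: agent name -> aggregate score in [0, 1].
type Baseline = Record<string, number>;

interface Delta {
  agent: string;
  baseline: number;
  current: number;
  delta: number;      // current minus baseline
  regressed: boolean; // true when the drop exceeds the tolerance
}

// Compare a fresh run against a saved baseline.
// Agents missing from the baseline are skipped, since there is nothing to compare against.
function compareToBaseline(baseline: Baseline, current: Baseline, tolerance = 0.01): Delta[] {
  return Object.keys(current)
    .sort()
    .filter((agent) => agent in baseline)
    .map((agent) => {
      const delta = current[agent] - baseline[agent];
      return {
        agent,
        baseline: baseline[agent],
        current: current[agent],
        delta,
        regressed: delta < -tolerance,
      };
    });
}

// Example: a saved baseline versus a run after a prompt edit.
const saved: Baseline = { "code-reviewer": 0.9, debugger: 0.8, executor: 0.5 };
const latest: Baseline = { "code-reviewer": 0.9, debugger: 0.7, executor: 0.6 };
const regressions = compareToBaseline(saved, latest).filter((d) => d.regressed);
console.log(regressions.map((d) => d.agent));
```

A small tolerance keeps run-to-run noise from being flagged as a regression; the right value depends on how variable the agent outputs are.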

Each benchmark includes archived pre-consolidation prompts for reproducible
comparison even after old agent files are deleted.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 15:13:08 +09:00

baselines

feat(benchmarks): add per-agent prompt benchmark suite for all 4 consolidated agents (#1437)

2026-03-08 15:13:08 +09:00

code-reviewer

feat(benchmarks): add per-agent prompt benchmark suite for all 4 consolidated agents (#1437)

2026-03-08 15:13:08 +09:00

debugger

feat(benchmarks): add per-agent prompt benchmark suite for all 4 consolidated agents (#1437)

2026-03-08 15:13:08 +09:00

executor

feat(benchmarks): add per-agent prompt benchmark suite for all 4 consolidated agents (#1437)