mirror of
https://fastgit.cc/github.com/Yeachan-Heo/oh-my-claudecode
synced 2026-04-30 13:51:15 +08:00
Extend the existing harsh-critic benchmark framework with reusable benchmarks for code-reviewer, debugger, and executor agents. Enables measurable prompt tuning by comparing old (pre-consolidation) vs new (merged) prompts with ground-truth scoring.

New infrastructure:
- benchmarks/shared/: generalized scoring types, parser, reporter, runner
- benchmarks/code-reviewer/: 3 fixtures (SQL injection, clean code, payment edge cases)
- benchmarks/debugger/: 3 fixtures (React undefined, Redis intermittent, TS build errors)
- benchmarks/executor/: 3 fixtures (trivial, scoped, complex tasks)
- benchmarks/run-all.ts: top-level orchestrator with --save-baseline and --compare modes
- npm scripts: bench:prompts, bench:prompts:save, bench:prompts:compare

Each benchmark includes archived pre-consolidation prompts for reproducible comparison even after old agent files are deleted.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
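The commit message mentions a --save-baseline mode and a --compare mode but does not show how a comparison is computed. The following is only a rough sketch of what comparing a saved baseline against a fresh run could look like. The `ScoreByAgent` shape, the `compareToBaseline` function, and the idea of a single numeric score per agent are all assumptions made for illustration, not the repository's actual code.

```typescript
// Hypothetical shape: one aggregate score per agent name.
// The real benchmark output format is not shown in the source.
type ScoreByAgent = Record<string, number>;

interface ScoreDelta {
  agent: string;
  before: number | null; // null when the baseline has no entry for this agent
  after: number;
  delta: number | null;  // after - before, or null when there is no baseline
}

// Compare a saved baseline (old prompts) against a current run (new prompts).
function compareToBaseline(baseline: ScoreByAgent, current: ScoreByAgent): ScoreDelta[] {
  return Object.keys(current)
    .sort()
    .map((agent) => {
      const after = current[agent];
      const hasBaseline = Object.prototype.hasOwnProperty.call(baseline, agent);
      const before = hasBaseline ? baseline[agent] : null;
      return {
        agent,
        before,
        after,
        delta: before === null ? null : after - before,
      };
    });
}

// Illustrative numbers only; they are not measured results.
const exampleDeltas = compareToBaseline(
  { "code-reviewer": 0.5, debugger: 0.75 },
  { "code-reviewer": 0.75, debugger: 0.75, executor: 0.5 },
);
console.log(exampleDeltas);
```

Returning `null` for a missing baseline entry, rather than treating it as zero, keeps a newly added benchmark from showing up as a large artificial improvement.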
34 lines
1.4 KiB
JSON
{
  "fixtureId": "task-add-timestamp",
  "fixturePath": "fixtures/tasks/task-add-timestamp.md",
  "domain": "task",
  "expectedVerdict": "trivial",
  "isCleanBaseline": false,
  "findings": [
    {
      "id": "IMPL-TS-1",
      "severity": "CRITICAL",
      "category": "finding",
      "summary": "Must add createdAt: Date field to the User interface in src/types/user.ts",
      "keywords": ["createdAt", "User", "interface", "Date", "field"],
      "explanation": "The User interface needs a new createdAt: Date property. This is the type-level change required."
    },
    {
      "id": "IMPL-TS-2",
      "severity": "CRITICAL",
      "category": "finding",
      "summary": "Must set createdAt: new Date() in the createUser function",
      "keywords": ["createdAt", "new Date", "createUser", "set"],
      "explanation": "The createUser function must set createdAt to new Date() when constructing the user object."
    },
    {
      "id": "IMPL-TS-3",
      "severity": "MAJOR",
      "category": "finding",
      "summary": "Scope should be minimal — only User interface and createUser function need changes",
      "keywords": ["scope", "minimal", "only", "two files", "interface", "service"],
      "explanation": "This is a trivial task. Only two locations need modification: the type definition and the service function. Routes, other services, and tests should not need changes for this addition."
    }
  ]
}
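To illustrate how a ground-truth fixture like the one above might be consumed, here is a minimal sketch. The field names come from the fixture itself, but the interface names (`ExpectedFinding`, `GroundTruth`), the functions `matchFinding` and `matchedFindingIds`, and the matching rule (case-insensitive substring search with at least two keyword hits) are assumptions for illustration; the actual scorer in benchmarks/shared/ may work differently.

```typescript
// Types mirroring the fields visible in the fixture. Names are hypothetical.
interface ExpectedFinding {
  id: string;
  severity: string; // "CRITICAL" and "MAJOR" appear in the fixture; other values are not shown
  category: string;
  summary: string;
  keywords: string[];
  explanation: string;
}

interface GroundTruth {
  fixtureId: string;
  fixturePath: string;
  domain: string;
  expectedVerdict: string;
  isCleanBaseline: boolean;
  findings: ExpectedFinding[];
}

// Assumed rule: a finding is matched when at least `minHits` of its keywords
// occur (case-insensitively, as substrings) in the agent output.
function matchFinding(output: string, finding: ExpectedFinding, minHits = 2): boolean {
  const haystack = output.toLowerCase();
  const hits = finding.keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
  return hits >= Math.min(minHits, finding.keywords.length);
}

// Return the ids of all expected findings that the agent output covers.
function matchedFindingIds(output: string, truth: GroundTruth): string[] {
  return truth.findings.filter((f) => matchFinding(output, f)).map((f) => f.id);
}

// The first two findings from the fixture, used as sample data.
const exampleTruth: GroundTruth = {
  fixtureId: "task-add-timestamp",
  fixturePath: "fixtures/tasks/task-add-timestamp.md",
  domain: "task",
  expectedVerdict: "trivial",
  isCleanBaseline: false,
  findings: [
    {
      id: "IMPL-TS-1",
      severity: "CRITICAL",
      category: "finding",
      summary: "Must add createdAt: Date field to the User interface in src/types/user.ts",
      keywords: ["createdAt", "User", "interface", "Date", "field"],
      explanation: "The User interface needs a new createdAt: Date property.",
    },
    {
      id: "IMPL-TS-2",
      severity: "CRITICAL",
      category: "finding",
      summary: "Must set createdAt: new Date() in the createUser function",
      keywords: ["createdAt", "new Date", "createUser", "set"],
      explanation: "The createUser function must set createdAt to new Date().",
    },
  ],
};

const exampleIds = matchedFindingIds(
  "Add a createdAt: Date field to the User interface.",
  exampleTruth,
);
console.log(exampleIds);
```

With this sample output only IMPL-TS-1 is matched: the text mentions all five of its keywords, while it hits just one keyword (createdAt) of IMPL-TS-2. Substring matching is deliberately crude here; short keywords such as "set" can match inside unrelated words, so a real scorer would likely need word-boundary handling or a stricter threshold.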