oh-my-claudecode/benchmark/run_vanilla.sh at df0f27071f99da146cab707c9cca90d502e72cfc

mirror of https://fastgit.cc/github.com/Yeachan-Heo/oh-my-claudecode synced 2026-04-20 21:00:50 +08:00


Yeachan-Heo df0f27071f Keep agent guidance and benchmark defaults aligned with OMC

The root AGENTS contract had drifted toward OMX/Codex wording and state
paths, which made the project-level guidance inconsistent with the actual
OMC runtime. The benchmark suite also carried split default model strings
across shell wrappers, the Python runner, and results docs, so this cleanup
re-aligned the suite on one current Sonnet 4.6 default and added a narrow
contract test to catch future regressions.

Constraint: Limit the cleanup to stale OMC-vs-OMX references and benchmark model strings
Rejected: Regenerate broader docs/templates wholesale | unnecessary scope for a targeted cleanup issue
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep root AGENTS branding/state paths in sync with OMC runtime contracts and update benchmark defaults in one place when the benchmark model changes
Tested: ./node_modules/.bin/vitest run src/__tests__/tier0-docs-consistency.test.ts src/__tests__/hooks.test.ts src/config/__tests__/loader.test.ts
Tested: python3 -m py_compile benchmark/run_benchmark.py
Tested: bash -n benchmark/quick_test.sh benchmark/run_vanilla.sh benchmark/run_omc.sh benchmark/run_full_comparison.sh
Not-tested: Full benchmark execution against live Anthropic/SWE-bench infrastructure
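The commit message above describes default model strings drifting apart across the shell wrappers, the Python runner, and the results docs, and a narrow contract test to catch that. The actual test is not shown on this page, so the following is only a minimal sketch of the idea, assuming model identifiers look like `claude-<family>-<version>`. The `MODEL_PATTERN` regex, the `distinct_models` helper, the placeholder identifier `claude-example-2-0`, and the sample file contents are all hypothetical and not taken from the repository.

```python
import re

# Hypothetical pattern for model identifiers such as "claude-example-2-0".
# The identifier format actually used by the benchmark suite is an assumption.
MODEL_PATTERN = re.compile(r"claude-[a-z]+-\d+(?:-\d+)*")


def distinct_models(files: dict[str, str]) -> set[str]:
    """Return every distinct model identifier found across the given file texts.

    `files` maps a file name to its text content. A suite whose defaults are
    aligned should yield exactly one identifier.
    """
    found: set[str] = set()
    for text in files.values():
        found.update(MODEL_PATTERN.findall(text))
    return found


if __name__ == "__main__":
    # Illustrative contents only; the real files are not reproduced here.
    sample = {
        "benchmark/run_vanilla.sh": 'MODEL="${MODEL:-claude-example-2-0}"',
        "benchmark/run_benchmark.py": 'DEFAULT_MODEL = "claude-example-2-0"',
    }
    models = distinct_models(sample)
    # Fail loudly when more than one default is present.
    assert len(models) == 1, f"model defaults have drifted: {sorted(models)}"
```

A check of this shape stays narrow in the sense the commit describes: it only compares model strings and does not try to validate anything else about the benchmark scripts.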

2026-03-21 01:10:33 +00:00

3.5 KiB

Executable File
