Keep agent guidance and benchmark defaults aligned with OMC

The root AGENTS contract had drifted toward OMX/Codex wording and state paths, which made the project-level guidance inconsistent with the actual OMC runtime. The benchmark suite also carried split default model strings across shell wrappers, the Python runner, and results docs, so this cleanup re-aligned the suite on one current Sonnet 4.6 default and added a narrow contract test to catch future regressions.

Constraint: Limit the cleanup to stale OMC-vs-OMX references and benchmark model strings
Rejected: Regenerate broader docs/templates wholesale | unnecessary scope for a targeted cleanup issue
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep root AGENTS branding/state paths in sync with OMC runtime contracts and update benchmark defaults in one place when the benchmark model changes
Tested: ./node_modules/.bin/vitest run src/__tests__/tier0-docs-consistency.test.ts src/__tests__/hooks.test.ts src/config/__tests__/loader.test.ts
Tested: python3 -m py_compile benchmark/run_benchmark.py
Tested: bash -n benchmark/quick_test.sh benchmark/run_vanilla.sh benchmark/run_omc.sh benchmark/run_full_comparison.sh
Not-tested: Full benchmark execution against live Anthropic/SWE-bench infrastructure
2026-04-20 12:51:30 +08:00 · 2026-03-21 01:10:33 +00:00
parent 2fe1d7c59d
commit df0f27071f
11 changed files with 68 additions and 34 deletions
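
Reviewer sketch (not part of the commit): the Claude worker model precedence that the AGENTS.md hunk below documents could be expressed roughly as follows. This is a minimal illustration under stated assumptions, not the OMC implementation. The function names `extractModelFlag` and `resolveWorkerModel` are hypothetical, the choice to let the last `--model` occurrence win is an assumption, and the ordering of variables within one precedence level simply follows the order in which the diff lists them.

```typescript
// Hypothetical helpers for illustration only; not taken from the OMC source.
type Env = Record<string, string | undefined>;

/**
 * Find a model flag in launch args, accepting both `--model <value>`
 * and `--model=<value>`. Assumption: the last occurrence wins.
 */
function extractModelFlag(args: readonly string[]): string | undefined {
  let model: string | undefined;
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === undefined) continue;
    if (arg === "--model" && i + 1 < args.length) {
      model = args[i + 1];
      i++; // skip the value that was just consumed
    } else if (arg.startsWith("--model=")) {
      model = arg.slice("--model=".length);
    }
  }
  return model;
}

/**
 * Walk the documented precedence from highest to lowest and return the
 * first non-empty candidate. `undefined` means "let Claude Code use its
 * default model" (level 5).
 */
function resolveWorkerModel(
  launchArgs: readonly string[],
  env: Env,
): string | undefined {
  const candidates: Array<string | undefined> = [
    extractModelFlag(launchArgs), // 1. explicit --model in worker launch args
    env.ANTHROPIC_MODEL, // 2. direct provider model env
    env.CLAUDE_MODEL,
    env.CLAUDE_CODE_BEDROCK_SONNET_MODEL, // 3. provider tier envs
    env.ANTHROPIC_DEFAULT_SONNET_MODEL,
    env.OMC_MODEL_MEDIUM, // 4. OMC tier env
  ];
  return candidates.find((value) => value !== undefined && value.trim() !== "");
}

// Example: an explicit flag beats every environment variable.
const chosen = resolveWorkerModel(
  ["--verbose", "--model=claude-sonnet-4-6-20260217"],
  { ANTHROPIC_MODEL: "some-other-model" },
);
```

In a real caller the second argument would likely be `process.env`. Returning `undefined` rather than a hard-coded fallback keeps level 5 ("let Claude Code use its default model") out of this helper.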
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,6 +1,6 @@
-# oh-my-codex - Intelligent Multi-Agent Orchestration
+# oh-my-claudecode - Intelligent Multi-Agent Orchestration

-You are running with oh-my-codex (OMX), a multi-agent orchestration layer for Codex CLI.
+You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code.
 Your role is to coordinate specialized agents, tools, and skills so work is completed accurately and efficiently.

 <guidance_schema_contract>
@@ -60,7 +60,7 @@ For non-trivial SDK/API/framework usage, delegate to `dependency-expert` to chec
 </delegation_rules>

 <child_agent_protocol>
-Codex CLI spawns child agents via the `spawn_agent` tool (requires `multi_agent = true`).
+Claude Code spawns child agents via the `spawn_agent` tool (requires `multi_agent = true`).
 To inject role-specific behavior, the parent MUST read the role prompt and pass it in the spawned agent message.

 Delegation steps:
@@ -90,7 +90,7 @@ Key constraints:
 </child_agent_protocol>

 <invocation_conventions>
-Codex CLI uses these prefixes for custom commands:
+Claude Code uses these prefixes for custom commands:
 - `/prompts:name` — invoke a custom prompt (e.g., `/prompts:architect "review auth module"`)
 - `$name` — invoke a skill (e.g., `$ralph "fix all tests"`, `$autopilot "build REST API"`)
 - `/skills` — browse available skills interactively
@@ -113,7 +113,7 @@ For workflow skills: `$name` (e.g., `$ralph "fix all tests"`)
 ---

 <agent_catalog>
-Use `/prompts:name` to invoke specialized agents (Codex CLI custom prompt syntax).
+Use `/prompts:name` to invoke specialized agents (Claude Code custom prompt syntax).

 Build/Analysis Lane:
 - `/prompts:explore`: Fast codebase search, file/symbol mapping
@@ -181,7 +181,7 @@ Detection rules:

 Ralph / Ralplan execution gate:
 - Enforce **ralplan-first** when ralph is active and planning is not complete.
-- Planning is complete only after both `.omx/plans/prd-*.md` and `.omx/plans/test-spec-*.md` exist.
+- Planning is complete only after both `.omc/plans/prd-*.md` and `.omc/plans/test-spec-*.md` exist.
 - Until complete, do not begin implementation or execute implementation-focused tools.
 </keyword_detection>

@@ -270,10 +270,12 @@ Resume: detect existing team state and resume from the last incomplete stage.
 <team_model_resolution>
 Team/Swarm worker startup currently uses one shared `agentType` and one shared launch-arg set for all workers in a team run.

-For worker model selection, apply this precedence (highest to lowest):
-1. Explicit model already present in `OMX_TEAM_WORKER_LAUNCH_ARGS`
-2. Inherited leader `--model` (when inheritance is enabled)
-3. Injected low-complexity default model: `gpt-5.3-codex-spark` (only when 1+2 are absent and team `agentType` is low-complexity)
+For Claude worker model selection, apply this precedence (highest to lowest):
+1. Explicit `--model` already present in worker launch args
+2. Direct provider model env (`ANTHROPIC_MODEL` / `CLAUDE_MODEL`)
+3. Provider tier envs (`CLAUDE_CODE_BEDROCK_SONNET_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`)
+4. OMC tier env (`OMC_MODEL_MEDIUM`)
+5. Otherwise let Claude Code use its default model

 Model flag normalization contract:
 - Accept both `--model <value>` and `--model=<value>`
@@ -315,7 +317,7 @@ Anti-slop workflow:

 Visual iteration gate:
 - For visual tasks (reference image(s) + generated screenshot), run `$visual-verdict` every iteration before the next edit.
-- Persist visual verdict JSON in `.omx/state/{scope}/ralph-progress.json` with both numeric (`score`, threshold pass/fail) and qualitative (`reasoning`, `differences`, `suggestions`, `next_actions`) feedback.
+- Persist visual verdict JSON in `.omc/state/{scope}/ralph-progress.json` with both numeric (`score`, threshold pass/fail) and qualitative (`reasoning`, `differences`, `suggestions`, `next_actions`) feedback.

 Continuation:
  Before concluding, confirm: zero pending tasks, all features working, tests passing, zero errors, verification evidence collected. If any item is unchecked, continue working.
@@ -340,14 +342,14 @@ When not to cancel:
 ---

 <state_management>
-oh-my-codex uses the `.omx/` directory for persistent state:
-- `.omx/state/` -- Mode state files (JSON)
-- `.omx/notepad.md` -- Session-persistent notes
-- `.omx/project-memory.json` -- Cross-session project knowledge
-- `.omx/plans/` -- Planning documents
-- `.omx/logs/` -- Audit logs
+oh-my-claudecode uses the `.omc/` directory for persistent state:
+- `.omc/state/` -- Mode state files (JSON)
+- `.omc/notepad.md` -- Session-persistent notes
+- `.omc/project-memory.json` -- Cross-session project knowledge
+- `.omc/plans/` -- Planning documents
+- `.omc/logs/` -- Audit logs

-Tools are available via MCP when configured (`omx setup` registers all servers):
+Tools are available via MCP when configured (`omc setup` registers all servers):

 State & Memory:
 - `state_read`, `state_write`, `state_clear`, `state_list_active`, `state_get_status`
@@ -388,7 +390,7 @@ Recommended mode fields:

 ## Setup

-Run `omx setup` to install all components. Run `omx doctor` to verify installation.
+Run `omc setup` to install all components. Run `omc doctor` to verify installation.

 ---

--- a/benchmark/README.md
+++ b/benchmark/README.md
@@ -63,7 +63,7 @@ Run vanilla Claude Code benchmark:
 **Options:**
 - `--limit N` - Limit to N instances (default: all)
 - `--skip N` - Skip first N instances (default: 0)
-- `--model MODEL` - Claude model to use (default: claude-sonnet-4.5-20250929)
+- `--model MODEL` - Claude model to use (default: claude-sonnet-4-6-20260217)
 - `--timeout SECS` - Timeout per instance (default: 300)

 **Examples:**
--- a/benchmark/quick_test.sh
+++ b/benchmark/quick_test.sh
@@ -36,7 +36,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

 # Quick test configuration
 TEST_LIMIT=5
-MODEL="claude-sonnet-4.5-20250929"
+MODEL="claude-sonnet-4-6-20260217"
 TIMEOUT="180"  # 3 minutes per instance for quick test

 # Parse arguments
@@ -61,7 +61,7 @@ while [[ $# -gt 0 ]]; do
            echo ""
            echo "Options:"
            echo "  --limit N       Number of instances to test (default: 5)"
-            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4.5-20250929)"
+            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4-6-20260217)"
            echo "  --timeout SECS  Timeout per instance (default: 180)"
            echo "  -h, --help      Show this help message"
            exit 0
--- a/benchmark/results/README.md
+++ b/benchmark/results/README.md
@@ -19,7 +19,7 @@

 ### Evaluation Setup

-- **Model:** Claude Sonnet 4 (claude-sonnet-4-20250514)
+- **Model:** Claude Sonnet 4.6 (claude-sonnet-4-6-20260217)
 - **Max Tokens:** 16,384 output tokens per instance
 - **Timeout:** 30 minutes per instance
 - **Workers:** 4 parallel evaluations
--- a/benchmark/run_benchmark.py
+++ b/benchmark/run_benchmark.py
@@ -58,7 +58,7 @@ class BenchmarkConfig:
    limit: Optional[int] = None
    retries: int = 3
    retry_delay: int = 30
-    model: str = "claude-sonnet-4-20250514"
+    model: str = "claude-sonnet-4-6-20260217"
    skip: int = 0


@@ -581,8 +581,8 @@ Examples:
    )
    parser.add_argument(
        "--model",
-        default="claude-sonnet-4-20250514",
-        help="Claude model to use (default: claude-sonnet-4-20250514)",
+        default="claude-sonnet-4-6-20260217",
+        help="Claude model to use (default: claude-sonnet-4-6-20260217)",
    )
    parser.add_argument(
        "--skip",
--- a/benchmark/run_full_comparison.sh
+++ b/benchmark/run_full_comparison.sh
@@ -37,7 +37,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # Parse arguments
 LIMIT=""
 SKIP=""
-MODEL="claude-sonnet-4.5-20250929"
+MODEL="claude-sonnet-4-6-20260217"
 TIMEOUT="300"
 SKIP_VANILLA=false
 SKIP_OMC=false
@@ -81,7 +81,7 @@ while [[ $# -gt 0 ]]; do
            echo "Options:"
            echo "  --limit N         Limit to N instances (default: all)"
            echo "  --skip N          Skip first N instances (default: 0)"
-            echo "  --model MODEL     Claude model to use (default: claude-sonnet-4.5-20250929)"
+            echo "  --model MODEL     Claude model to use (default: claude-sonnet-4-6-20260217)"
            echo "  --timeout SECS    Timeout per instance (default: 300)"
            echo "  --skip-vanilla    Skip vanilla benchmark run"
            echo "  --skip-omc        Skip OMC benchmark run"
--- a/benchmark/run_omc.sh
+++ b/benchmark/run_omc.sh
@@ -38,7 +38,7 @@ LOG_FILE="$LOGS_DIR/${RUN_MODE}_${TIMESTAMP}.log"
 # Parse arguments
 LIMIT=""
 SKIP=""
-MODEL="claude-sonnet-4.5-20250929"
+MODEL="claude-sonnet-4-6-20260217"
 TIMEOUT="300"

 while [[ $# -gt 0 ]]; do
@@ -65,7 +65,7 @@ while [[ $# -gt 0 ]]; do
            echo "Options:"
            echo "  --limit N       Limit to N instances (default: all)"
            echo "  --skip N        Skip first N instances (default: 0)"
-            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4.5-20250929)"
+            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4-6-20260217)"
            echo "  --timeout SECS  Timeout per instance (default: 300)"
            echo "  -h, --help      Show this help message"
            exit 0
--- a/benchmark/run_vanilla.sh
+++ b/benchmark/run_vanilla.sh
@@ -38,7 +38,7 @@ LOG_FILE="$LOGS_DIR/${RUN_MODE}_${TIMESTAMP}.log"
 # Parse arguments
 LIMIT=""
 SKIP=""
-MODEL="claude-sonnet-4.5-20250929"
+MODEL="claude-sonnet-4-6-20260217"
 TIMEOUT="300"

 while [[ $# -gt 0 ]]; do
@@ -65,7 +65,7 @@ while [[ $# -gt 0 ]]; do
            echo "Options:"
            echo "  --limit N       Limit to N instances (default: all)"
            echo "  --skip N        Skip first N instances (default: 0)"
-            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4.5-20250929)"
+            echo "  --model MODEL   Claude model to use (default: claude-sonnet-4-6-20260217)"
            echo "  --timeout SECS  Timeout per instance (default: 300)"
            echo "  -h, --help      Show this help message"
            exit 0
--- a/src/__tests__/hooks.test.ts
+++ b/src/__tests__/hooks.test.ts
@@ -577,7 +577,7 @@ describe('Team staged workflow integration', () => {
  });

  it('compacts OMC-style root AGENTS guidance on session-start without dropping key sections', async () => {
-    const agentsContent = `# oh-my-codex - Intelligent Multi-Agent Orchestration
+    const agentsContent = `# oh-my-claudecode - Intelligent Multi-Agent Orchestration

 <guidance_schema_contract>
 schema
--- a/src/__tests__/tier0-docs-consistency.test.ts
+++ b/src/__tests__/tier0-docs-consistency.test.ts
@@ -62,4 +62,36 @@ describe('Tier-0 contract docs consistency', () => {
    expect(claudeDoc).toContain('Team orchestration is explicit via `/team`.');
    expect(referenceDoc).not.toContain('| `team`, `coordinated team`');
  });
+
+
+  it('keeps root AGENTS.md aligned with OMC branding and state paths', () => {
+    const agentsDoc = readProjectFile('AGENTS.md');
+
+    expect(agentsDoc).toContain('# oh-my-claudecode - Intelligent Multi-Agent Orchestration');
+    expect(agentsDoc).toContain('You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code.');
+    expect(agentsDoc).toContain('`.omc/state/`');
+    expect(agentsDoc).toContain('Run `omc setup` to install all components. Run `omc doctor` to verify installation.');
+    expect(agentsDoc).not.toContain('oh-my-codex');
+    expect(agentsDoc).not.toContain('OMX_TEAM_WORKER_LAUNCH_ARGS');
+    expect(agentsDoc).not.toContain('gpt-5.3-codex-spark');
+  });
+
+  it('keeps benchmark default model references aligned across docs and scripts', () => {
+    const benchmarkReadme = readProjectFile('benchmark', 'README.md');
+    const benchmarkRunner = readProjectFile('benchmark', 'run_benchmark.py');
+    const quickTest = readProjectFile('benchmark', 'quick_test.sh');
+    const vanilla = readProjectFile('benchmark', 'run_vanilla.sh');
+    const omc = readProjectFile('benchmark', 'run_omc.sh');
+    const fullComparison = readProjectFile('benchmark', 'run_full_comparison.sh');
+    const resultsReadme = readProjectFile('benchmark', 'results', 'README.md');
+    const expectedModel = 'claude-sonnet-4-6-20260217';
+
+    for (const content of [benchmarkReadme, benchmarkRunner, quickTest, vanilla, omc, fullComparison, resultsReadme]) {
+      expect(content).toContain(expectedModel);
+    }
+
+    expect(benchmarkReadme).not.toContain('claude-sonnet-4.5-20250929');
+    expect(benchmarkRunner).not.toContain('claude-sonnet-4-20250514');
+    expect(resultsReadme).toContain('Claude Sonnet 4.6');
+  });
 });
--- a/src/config/__tests__/loader.test.ts
+++ b/src/config/__tests__/loader.test.ts
@@ -141,7 +141,7 @@ describe("startup context compaction", () => {

    try {
      const omcAgentsPath = join(tempDir, "AGENTS.md");
-      const omcGuidance = `# oh-my-codex - Intelligent Multi-Agent Orchestration
+      const omcGuidance = `# oh-my-claudecode - Intelligent Multi-Agent Orchestration

 <guidance_schema_contract>
 schema