mirror of
https://fastgit.cc/github.com/Yeachan-Heo/oh-my-claudecode
synced 2026-04-20 12:51:30 +08:00
Keep agent guidance and benchmark defaults aligned with OMC
The root AGENTS contract had drifted toward OMX/Codex wording and state paths, which made the project-level guidance inconsistent with the actual OMC runtime. The benchmark suite also carried split default model strings across shell wrappers, the Python runner, and results docs, so this cleanup re-aligned the suite on one current Sonnet 4.6 default and added a narrow contract test to catch future regressions. Constraint: Limit the cleanup to stale OMC-vs-OMX references and benchmark model strings Rejected: Regenerate broader docs/templates wholesale | unnecessary scope for a targeted cleanup issue Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep root AGENTS branding/state paths in sync with OMC runtime contracts and update benchmark defaults in one place when the benchmark model changes Tested: ./node_modules/.bin/vitest run src/__tests__/tier0-docs-consistency.test.ts src/__tests__/hooks.test.ts src/config/__tests__/loader.test.ts Tested: python3 -m py_compile benchmark/run_benchmark.py Tested: bash -n benchmark/quick_test.sh benchmark/run_vanilla.sh benchmark/run_omc.sh benchmark/run_full_comparison.sh Not-tested: Full benchmark execution against live Anthropic/SWE-bench infrastructure
This commit is contained in:
40
AGENTS.md
40
AGENTS.md
@@ -1,6 +1,6 @@
|
||||
# oh-my-codex - Intelligent Multi-Agent Orchestration
|
||||
# oh-my-claudecode - Intelligent Multi-Agent Orchestration
|
||||
|
||||
You are running with oh-my-codex (OMX), a multi-agent orchestration layer for Codex CLI.
|
||||
You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code.
|
||||
Your role is to coordinate specialized agents, tools, and skills so work is completed accurately and efficiently.
|
||||
|
||||
<guidance_schema_contract>
|
||||
@@ -60,7 +60,7 @@ For non-trivial SDK/API/framework usage, delegate to `dependency-expert` to chec
|
||||
</delegation_rules>
|
||||
|
||||
<child_agent_protocol>
|
||||
Codex CLI spawns child agents via the `spawn_agent` tool (requires `multi_agent = true`).
|
||||
Claude Code spawns child agents via the `spawn_agent` tool (requires `multi_agent = true`).
|
||||
To inject role-specific behavior, the parent MUST read the role prompt and pass it in the spawned agent message.
|
||||
|
||||
Delegation steps:
|
||||
@@ -90,7 +90,7 @@ Key constraints:
|
||||
</child_agent_protocol>
|
||||
|
||||
<invocation_conventions>
|
||||
Codex CLI uses these prefixes for custom commands:
|
||||
Claude Code uses these prefixes for custom commands:
|
||||
- `/prompts:name` — invoke a custom prompt (e.g., `/prompts:architect "review auth module"`)
|
||||
- `$name` — invoke a skill (e.g., `$ralph "fix all tests"`, `$autopilot "build REST API"`)
|
||||
- `/skills` — browse available skills interactively
|
||||
@@ -113,7 +113,7 @@ For workflow skills: `$name` (e.g., `$ralph "fix all tests"`)
|
||||
---
|
||||
|
||||
<agent_catalog>
|
||||
Use `/prompts:name` to invoke specialized agents (Codex CLI custom prompt syntax).
|
||||
Use `/prompts:name` to invoke specialized agents (Claude Code custom prompt syntax).
|
||||
|
||||
Build/Analysis Lane:
|
||||
- `/prompts:explore`: Fast codebase search, file/symbol mapping
|
||||
@@ -181,7 +181,7 @@ Detection rules:
|
||||
|
||||
Ralph / Ralplan execution gate:
|
||||
- Enforce **ralplan-first** when ralph is active and planning is not complete.
|
||||
- Planning is complete only after both `.omx/plans/prd-*.md` and `.omx/plans/test-spec-*.md` exist.
|
||||
- Planning is complete only after both `.omc/plans/prd-*.md` and `.omc/plans/test-spec-*.md` exist.
|
||||
- Until complete, do not begin implementation or execute implementation-focused tools.
|
||||
</keyword_detection>
|
||||
|
||||
@@ -270,10 +270,12 @@ Resume: detect existing team state and resume from the last incomplete stage.
|
||||
<team_model_resolution>
|
||||
Team/Swarm worker startup currently uses one shared `agentType` and one shared launch-arg set for all workers in a team run.
|
||||
|
||||
For worker model selection, apply this precedence (highest to lowest):
|
||||
1. Explicit model already present in `OMX_TEAM_WORKER_LAUNCH_ARGS`
|
||||
2. Inherited leader `--model` (when inheritance is enabled)
|
||||
3. Injected low-complexity default model: `gpt-5.3-codex-spark` (only when 1+2 are absent and team `agentType` is low-complexity)
|
||||
For Claude worker model selection, apply this precedence (highest to lowest):
|
||||
1. Explicit `--model` already present in worker launch args
|
||||
2. Direct provider model env (`ANTHROPIC_MODEL` / `CLAUDE_MODEL`)
|
||||
3. Provider tier envs (`CLAUDE_CODE_BEDROCK_SONNET_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`)
|
||||
4. OMC tier env (`OMC_MODEL_MEDIUM`)
|
||||
5. Otherwise let Claude Code use its default model
|
||||
|
||||
Model flag normalization contract:
|
||||
- Accept both `--model <value>` and `--model=<value>`
|
||||
@@ -315,7 +317,7 @@ Anti-slop workflow:
|
||||
|
||||
Visual iteration gate:
|
||||
- For visual tasks (reference image(s) + generated screenshot), run `$visual-verdict` every iteration before the next edit.
|
||||
- Persist visual verdict JSON in `.omx/state/{scope}/ralph-progress.json` with both numeric (`score`, threshold pass/fail) and qualitative (`reasoning`, `differences`, `suggestions`, `next_actions`) feedback.
|
||||
- Persist visual verdict JSON in `.omc/state/{scope}/ralph-progress.json` with both numeric (`score`, threshold pass/fail) and qualitative (`reasoning`, `differences`, `suggestions`, `next_actions`) feedback.
|
||||
|
||||
Continuation:
|
||||
Before concluding, confirm: zero pending tasks, all features working, tests passing, zero errors, verification evidence collected. If any item is unchecked, continue working.
|
||||
@@ -340,14 +342,14 @@ When not to cancel:
|
||||
---
|
||||
|
||||
<state_management>
|
||||
oh-my-codex uses the `.omx/` directory for persistent state:
|
||||
- `.omx/state/` -- Mode state files (JSON)
|
||||
- `.omx/notepad.md` -- Session-persistent notes
|
||||
- `.omx/project-memory.json` -- Cross-session project knowledge
|
||||
- `.omx/plans/` -- Planning documents
|
||||
- `.omx/logs/` -- Audit logs
|
||||
oh-my-claudecode uses the `.omc/` directory for persistent state:
|
||||
- `.omc/state/` -- Mode state files (JSON)
|
||||
- `.omc/notepad.md` -- Session-persistent notes
|
||||
- `.omc/project-memory.json` -- Cross-session project knowledge
|
||||
- `.omc/plans/` -- Planning documents
|
||||
- `.omc/logs/` -- Audit logs
|
||||
|
||||
Tools are available via MCP when configured (`omx setup` registers all servers):
|
||||
Tools are available via MCP when configured (`omc setup` registers all servers):
|
||||
|
||||
State & Memory:
|
||||
- `state_read`, `state_write`, `state_clear`, `state_list_active`, `state_get_status`
|
||||
@@ -388,7 +390,7 @@ Recommended mode fields:
|
||||
|
||||
## Setup
|
||||
|
||||
Run `omx setup` to install all components. Run `omx doctor` to verify installation.
|
||||
Run `omc setup` to install all components. Run `omc doctor` to verify installation.
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -63,7 +63,7 @@ Run vanilla Claude Code benchmark:
|
||||
**Options:**
|
||||
- `--limit N` - Limit to N instances (default: all)
|
||||
- `--skip N` - Skip first N instances (default: 0)
|
||||
- `--model MODEL` - Claude model to use (default: claude-sonnet-4.5-20250929)
|
||||
- `--model MODEL` - Claude model to use (default: claude-sonnet-4-6-20260217)
|
||||
- `--timeout SECS` - Timeout per instance (default: 300)
|
||||
|
||||
**Examples:**
|
||||
|
||||
@@ -36,7 +36,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
|
||||
# Quick test configuration
|
||||
TEST_LIMIT=5
|
||||
MODEL="claude-sonnet-4.5-20250929"
|
||||
MODEL="claude-sonnet-4-6-20260217"
|
||||
TIMEOUT="180" # 3 minutes per instance for quick test
|
||||
|
||||
# Parse arguments
|
||||
@@ -61,7 +61,7 @@ while [[ $# -gt 0 ]]; do
|
||||
echo ""
|
||||
echo "Options:"
|
||||
echo " --limit N Number of instances to test (default: 5)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4.5-20250929)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4-6-20260217)"
|
||||
echo " --timeout SECS Timeout per instance (default: 180)"
|
||||
echo " -h, --help Show this help message"
|
||||
exit 0
|
||||
|
||||
@@ -19,7 +19,7 @@
|
||||
|
||||
### Evaluation Setup
|
||||
|
||||
- **Model:** Claude Sonnet 4 (claude-sonnet-4-20250514)
|
||||
- **Model:** Claude Sonnet 4.6 (claude-sonnet-4-6-20260217)
|
||||
- **Max Tokens:** 16,384 output tokens per instance
|
||||
- **Timeout:** 30 minutes per instance
|
||||
- **Workers:** 4 parallel evaluations
|
||||
|
||||
@@ -58,7 +58,7 @@ class BenchmarkConfig:
|
||||
limit: Optional[int] = None
|
||||
retries: int = 3
|
||||
retry_delay: int = 30
|
||||
model: str = "claude-sonnet-4-20250514"
|
||||
model: str = "claude-sonnet-4-6-20260217"
|
||||
skip: int = 0
|
||||
|
||||
|
||||
@@ -581,8 +581,8 @@ Examples:
|
||||
)
|
||||
parser.add_argument(
|
||||
"--model",
|
||||
default="claude-sonnet-4-20250514",
|
||||
help="Claude model to use (default: claude-sonnet-4-20250514)",
|
||||
default="claude-sonnet-4-6-20260217",
|
||||
help="Claude model to use (default: claude-sonnet-4-6-20260217)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--skip",
|
||||
|
||||
@@ -37,7 +37,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
# Parse arguments
|
||||
LIMIT=""
|
||||
SKIP=""
|
||||
MODEL="claude-sonnet-4.5-20250929"
|
||||
MODEL="claude-sonnet-4-6-20260217"
|
||||
TIMEOUT="300"
|
||||
SKIP_VANILLA=false
|
||||
SKIP_OMC=false
|
||||
@@ -81,7 +81,7 @@ while [[ $# -gt 0 ]]; do
|
||||
echo "Options:"
|
||||
echo " --limit N Limit to N instances (default: all)"
|
||||
echo " --skip N Skip first N instances (default: 0)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4.5-20250929)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4-6-20260217)"
|
||||
echo " --timeout SECS Timeout per instance (default: 300)"
|
||||
echo " --skip-vanilla Skip vanilla benchmark run"
|
||||
echo " --skip-omc Skip OMC benchmark run"
|
||||
|
||||
@@ -38,7 +38,7 @@ LOG_FILE="$LOGS_DIR/${RUN_MODE}_${TIMESTAMP}.log"
|
||||
# Parse arguments
|
||||
LIMIT=""
|
||||
SKIP=""
|
||||
MODEL="claude-sonnet-4.5-20250929"
|
||||
MODEL="claude-sonnet-4-6-20260217"
|
||||
TIMEOUT="300"
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
@@ -65,7 +65,7 @@ while [[ $# -gt 0 ]]; do
|
||||
echo "Options:"
|
||||
echo " --limit N Limit to N instances (default: all)"
|
||||
echo " --skip N Skip first N instances (default: 0)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4.5-20250929)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4-6-20260217)"
|
||||
echo " --timeout SECS Timeout per instance (default: 300)"
|
||||
echo " -h, --help Show this help message"
|
||||
exit 0
|
||||
|
||||
@@ -38,7 +38,7 @@ LOG_FILE="$LOGS_DIR/${RUN_MODE}_${TIMESTAMP}.log"
|
||||
# Parse arguments
|
||||
LIMIT=""
|
||||
SKIP=""
|
||||
MODEL="claude-sonnet-4.5-20250929"
|
||||
MODEL="claude-sonnet-4-6-20260217"
|
||||
TIMEOUT="300"
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
@@ -65,7 +65,7 @@ while [[ $# -gt 0 ]]; do
|
||||
echo "Options:"
|
||||
echo " --limit N Limit to N instances (default: all)"
|
||||
echo " --skip N Skip first N instances (default: 0)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4.5-20250929)"
|
||||
echo " --model MODEL Claude model to use (default: claude-sonnet-4-6-20260217)"
|
||||
echo " --timeout SECS Timeout per instance (default: 300)"
|
||||
echo " -h, --help Show this help message"
|
||||
exit 0
|
||||
|
||||
@@ -577,7 +577,7 @@ describe('Team staged workflow integration', () => {
|
||||
});
|
||||
|
||||
it('compacts OMC-style root AGENTS guidance on session-start without dropping key sections', async () => {
|
||||
const agentsContent = `# oh-my-codex - Intelligent Multi-Agent Orchestration
|
||||
const agentsContent = `# oh-my-claudecode - Intelligent Multi-Agent Orchestration
|
||||
|
||||
<guidance_schema_contract>
|
||||
schema
|
||||
|
||||
@@ -62,4 +62,36 @@ describe('Tier-0 contract docs consistency', () => {
|
||||
expect(claudeDoc).toContain('Team orchestration is explicit via `/team`.');
|
||||
expect(referenceDoc).not.toContain('| `team`, `coordinated team`');
|
||||
});
|
||||
|
||||
|
||||
it('keeps root AGENTS.md aligned with OMC branding and state paths', () => {
|
||||
const agentsDoc = readProjectFile('AGENTS.md');
|
||||
|
||||
expect(agentsDoc).toContain('# oh-my-claudecode - Intelligent Multi-Agent Orchestration');
|
||||
expect(agentsDoc).toContain('You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code.');
|
||||
expect(agentsDoc).toContain('`.omc/state/`');
|
||||
expect(agentsDoc).toContain('Run `omc setup` to install all components. Run `omc doctor` to verify installation.');
|
||||
expect(agentsDoc).not.toContain('oh-my-codex');
|
||||
expect(agentsDoc).not.toContain('OMX_TEAM_WORKER_LAUNCH_ARGS');
|
||||
expect(agentsDoc).not.toContain('gpt-5.3-codex-spark');
|
||||
});
|
||||
|
||||
it('keeps benchmark default model references aligned across docs and scripts', () => {
|
||||
const benchmarkReadme = readProjectFile('benchmark', 'README.md');
|
||||
const benchmarkRunner = readProjectFile('benchmark', 'run_benchmark.py');
|
||||
const quickTest = readProjectFile('benchmark', 'quick_test.sh');
|
||||
const vanilla = readProjectFile('benchmark', 'run_vanilla.sh');
|
||||
const omc = readProjectFile('benchmark', 'run_omc.sh');
|
||||
const fullComparison = readProjectFile('benchmark', 'run_full_comparison.sh');
|
||||
const resultsReadme = readProjectFile('benchmark', 'results', 'README.md');
|
||||
const expectedModel = 'claude-sonnet-4-6-20260217';
|
||||
|
||||
for (const content of [benchmarkReadme, benchmarkRunner, quickTest, vanilla, omc, fullComparison, resultsReadme]) {
|
||||
expect(content).toContain(expectedModel);
|
||||
}
|
||||
|
||||
expect(benchmarkReadme).not.toContain('claude-sonnet-4.5-20250929');
|
||||
expect(benchmarkRunner).not.toContain('claude-sonnet-4-20250514');
|
||||
expect(resultsReadme).toContain('Claude Sonnet 4.6');
|
||||
});
|
||||
});
|
||||
|
||||
@@ -141,7 +141,7 @@ describe("startup context compaction", () => {
|
||||
|
||||
try {
|
||||
const omcAgentsPath = join(tempDir, "AGENTS.md");
|
||||
const omcGuidance = `# oh-my-codex - Intelligent Multi-Agent Orchestration
|
||||
const omcGuidance = `# oh-my-claudecode - Intelligent Multi-Agent Orchestration
|
||||
|
||||
<guidance_schema_contract>
|
||||
schema
|
||||
|
||||
Reference in New Issue
Block a user