Hephaestus vs Deep-Executor: Comparative Analysis
Analysis Summary
- Research Question: How do the Hephaestus (oh-my-opencode) and Deep-Executor (oh-my-claudecode) agent architectures differ, and what can each learn from the other?
- Methodology: Structured feature comparison across 14 capability dimensions, scored 0-3
1. Architectural Overview
| Dimension | Hephaestus | Deep-Executor |
|---|---|---|
| Core Philosophy | Conductor/Delegator | Self-Contained Forge |
| Execution Model | Multi-agent parallel | Single-agent sequential |
| Agent Spawning | 2-5 parallel background agents | BLOCKED (by design) |
| Tool Strategy | Agents as tools | Direct MCP/LSP tools |
| Model | GPT 5.2 with reasoning levels | Claude (Opus/Sonnet) |
Key Insight
These are fundamentally different architectural paradigms. Hephaestus is a distributed system -- it treats agents as microservices. Deep-Executor is a monolith -- it concentrates all capability in one process. Neither is inherently superior; they optimize for different constraints.
2. Feature Gap Analysis: What Hephaestus Has That Deep-Executor Lacks
Feature Comparison Matrix
| Category | Hephaestus | Deep-Exec | Delta |
|---|---|---|---|
| Parallel Exploration | 3 | 0 | +3 |
| Delegation to Specialists | 3 | 0 | +3 |
| External Research (Docs/OSS) | 3 | 0 | +3 |
| Failure Recovery / Escalation | 3 | 1 | +2 |
| Dynamic Prompt Adaptation | 3 | 0 | +3 |
| Reasoning Level Configuration | 3 | 0 | +3 |
| TODO / Task Tracking Discipline | 1 | 3 | -2 |
| Verification Protocol Rigor | 1 | 3 | -2 |
| Structured Output Contract | 2 | 3 | -1 |
| MCP/LSP Tool Strategy | 1 | 3 | -2 |
| Ambiguity Resolution | 3 | 2 | +1 |
| Session Continuity | 3 | 2 | +1 |
| Token Efficiency | 1 | 3 | -2 |
| Self-Sufficiency | 1 | 3 | -2 |
| TOTAL | 31 | 23 | +8 |
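The totals and per-row deltas in the matrix can be re-derived mechanically. The sketch below copies the 14 score pairs from the matrix; the names `SCORES`, `totals`, and `deltas` are illustrative and not part of either agent.

```python
# Scores copied from the matrix: (Hephaestus, Deep-Executor), each on a 0-3 scale.
SCORES = {
    "Parallel Exploration": (3, 0),
    "Delegation to Specialists": (3, 0),
    "External Research (Docs/OSS)": (3, 0),
    "Failure Recovery / Escalation": (3, 1),
    "Dynamic Prompt Adaptation": (3, 0),
    "Reasoning Level Configuration": (3, 0),
    "TODO / Task Tracking Discipline": (1, 3),
    "Verification Protocol Rigor": (1, 3),
    "Structured Output Contract": (2, 3),
    "MCP/LSP Tool Strategy": (1, 3),
    "Ambiguity Resolution": (3, 2),
    "Session Continuity": (3, 2),
    "Token Efficiency": (1, 3),
    "Self-Sufficiency": (1, 3),
}


def totals(scores):
    """Return (Hephaestus total, Deep-Executor total, difference)."""
    heph = sum(h for h, _ in scores.values())
    deep = sum(d for _, d in scores.values())
    return heph, deep, heph - deep


def deltas(scores):
    """Return the per-category delta (Hephaestus minus Deep-Executor)."""
    return {name: h - d for name, (h, d) in scores.items()}


print(totals(SCORES))
```

Because the score is an unweighted sum, the +8 total says only that Hephaestus leads on more dimensions, not that those dimensions matter more for a given workload.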
2.1 Parallel Exploration (Gap: 3/3)
Hephaestus: Fires 2-5 explore/document-specialist agents simultaneously as background tasks. Continues working while results stream in. Uses background_output(task_id) to collect.
Deep-Executor: Sequential exploration only. Must complete each Glob/Grep/Read call before starting the next.
Impact: For large codebases, Hephaestus can gather context an estimated 3-5x faster in wall-clock terms. Deep-Executor compensates with more targeted, cheaper queries but loses time on broad searches.
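The difference between the two exploration styles can be sketched with `asyncio`. This is a simulation under stated assumptions: `explore` is a hypothetical stand-in for an explore agent or search call, and the sleep stands in for I/O latency. It does not reproduce the `background_output(task_id)` mechanism, whose signature the source does not show.

```python
import asyncio


async def explore(query, delay=0.01):
    """Hypothetical stand-in for one explore agent or search call."""
    await asyncio.sleep(delay)  # simulated I/O latency
    return f"results for {query!r}"


async def explore_sequential(queries):
    # Deep-Executor style: each call finishes before the next starts.
    return [await explore(q) for q in queries]


async def explore_parallel(queries):
    # Hephaestus style: start every task first, then collect the results.
    tasks = [asyncio.create_task(explore(q)) for q in queries]
    # Other work could proceed here while the tasks run in the background.
    return await asyncio.gather(*tasks)


queries = ["auth middleware", "session store", "token refresh"]
sequential = asyncio.run(explore_sequential(queries))
parallel = asyncio.run(explore_parallel(queries))
```

Both variants return the same results in the same order; only the elapsed time differs, roughly the sum of the delays for the sequential version versus the longest single delay for the parallel one.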
2.2 Delegation to Specialists (Gap: 3/3)
Hephaestus: Three specialized agent types:
- Explore agents: Parallel codebase search
- Document-Specialist: External docs, GitHub, OSS research
- Architect: High-IQ consulting for stuck situations
Deep-Executor: No delegation. All work is self-performed. This is a deliberate design choice ("You are the forge") but means no access to specialist capabilities.
Impact: Hephaestus can handle broader task scopes. Deep-Executor is limited to what a single agent context window can reason about.
2.3 External Research Capability (Gap: 3/3)
Hephaestus: Document-Specialist agent fetches external documentation, GitHub repos, and OSS references. This provides real-time knowledge augmentation.
Deep-Executor: No external research capability. Relies entirely on pre-loaded context and available tools.
Impact: When working with unfamiliar APIs or libraries, Hephaestus has a significant advantage.
2.4 Failure Recovery / Escalation (Gap: 2/3)
Hephaestus: Structured 3-failure protocol: STOP -> REVERT -> DOCUMENT -> CONSULT Architect. Clear escalation path prevents infinite retry loops.
Deep-Executor: No explicit failure threshold or escalation. Has verification loops but no "give up and escalate" mechanism.
Impact: Hephaestus avoids wasting tokens on unrecoverable situations. Deep-Executor can get stuck in retry loops.
2.5 Dynamic Prompt Adaptation (Gap: 3/3)
Hephaestus: Uses helper functions (buildExploreSection(), etc.) to dynamically construct prompts based on available capabilities. Prompt adapts to runtime environment.
Deep-Executor: Static prompt. Same instructions regardless of available tools or context.
Impact: Hephaestus is more portable across environments with varying tool availability.
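A minimal sketch of capability-driven prompt assembly follows. The function names mirror the `buildExploreSection()` helper mentioned above, but their signatures, the capability strings, and the section wording are all assumptions made for illustration.

```python
def build_explore_section(capabilities):
    """Hypothetical analogue of buildExploreSection(); the real signature is not shown in the source."""
    if "explore_agent" in capabilities:
        return "Exploration: fire explore agents in the background."
    return "Exploration: use Glob/Grep/Read directly."


def build_verify_section(capabilities):
    """Hypothetical helper: choose verification guidance from the available tools."""
    if "lsp_diagnostics" in capabilities:
        return "Verification: run lsp_diagnostics after every change."
    return "Verification: rely on build and test output."


def build_prompt(capabilities):
    """Assemble the prompt from sections that adapt to the runtime environment."""
    sections = [
        build_explore_section(capabilities),
        build_verify_section(capabilities),
    ]
    return "\n".join(sections)


full_prompt = build_prompt({"explore_agent", "lsp_diagnostics"})
minimal_prompt = build_prompt(set())
```

A static prompt corresponds to always emitting the first branch of each helper, which is why it breaks when a referenced tool is missing.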
2.6 Reasoning Level Configuration (Gap: 3/3)
Hephaestus: Explicit reasoning budget per task type (MEDIUM for code changes, HIGH for complex refactoring). "ROUTER NUDGE" directs model thinking depth.
Deep-Executor: No reasoning level control. Same approach for all task complexities.
Impact: Hephaestus can optimize cost/quality tradeoff per subtask.
3. Inverse Gaps: What Deep-Executor Has That Hephaestus Could Benefit From
3.1 TODO Discipline (Gap: 2/3)
Deep-Executor: NON-NEGOTIABLE rules: TodoWrite for 2+ steps, ONE in_progress at a time, mark completed IMMEDIATELY. This creates a reliable audit trail and prevents task drift.
Hephaestus: Minimal task tracking. Relies on delegation structure rather than explicit progress tracking.
Recommendation for Hephaestus: Adopt mandatory task tracking for complex multi-step operations.
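The "ONE in_progress at a time" rule can be expressed as an invariant on a small task list. The `TodoList` class below is a hypothetical sketch of that discipline, not the TodoWrite tool itself.

```python
class TodoList:
    """Hypothetical task list that allows at most one in_progress item."""

    def __init__(self):
        self.items = {}  # task name -> "pending" | "in_progress" | "completed"

    def add(self, name):
        self.items[name] = "pending"

    def start(self, name):
        if name not in self.items:
            raise KeyError(name)
        active = [n for n, s in self.items.items() if s == "in_progress"]
        if active:
            raise RuntimeError(f"{active[0]!r} is still in progress")
        self.items[name] = "in_progress"

    def complete(self, name):
        if self.items.get(name) != "in_progress":
            raise RuntimeError(f"{name!r} is not in progress")
        self.items[name] = "completed"

    def open_count(self):
        """Number of items not yet completed."""
        return sum(1 for s in self.items.values() if s != "completed")
```

Starting a second task while one is active raises an error, which forces the "mark completed IMMEDIATELY" step instead of letting unfinished items accumulate.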
3.2 Verification Protocol Rigor (Gap: 2/3)
Deep-Executor: After EVERY change: lsp_diagnostics. Before completion: ALL of (todos, tests, build, diagnostics). Specified evidence format.
Hephaestus: No structured verification protocol. Delegates verification implicitly through agent results.
Recommendation for Hephaestus: Add post-change diagnostic checks and a completion checklist.
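A completion checklist of this kind reduces to a gate over four pieces of evidence. The sketch below assumes a hypothetical `Evidence` record; the field names are illustrative and the source does not specify the actual evidence format.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical evidence record collected before declaring a task done."""
    todos_open: int
    tests_passed: bool
    build_ok: bool
    diagnostics_errors: int


def completion_blockers(evidence):
    """Return the list of unmet completion criteria; an empty list means done."""
    blockers = []
    if evidence.todos_open > 0:
        blockers.append(f"{evidence.todos_open} todo(s) still open")
    if not evidence.tests_passed:
        blockers.append("tests failing")
    if not evidence.build_ok:
        blockers.append("build failing")
    if evidence.diagnostics_errors > 0:
        blockers.append(f"{evidence.diagnostics_errors} diagnostic error(s)")
    return blockers
```

Returning the full list rather than stopping at the first failure keeps the report useful as an audit trail.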
3.3 MCP/LSP Tool Strategy (Gap: 2/3)
Deep-Executor: Explicit strategy for lsp_diagnostics (single file), lsp_diagnostics_directory (project-wide), ast_grep_search/replace with dryRun protocol. Clear escalation from file to project scope.
Hephaestus: No explicit LSP/AST tool strategy documented.
Recommendation for Hephaestus: Document and enforce a tool selection hierarchy.
3.4 Token Efficiency (Gap: 2/3)
Deep-Executor: Single agent = single context window. No inter-agent communication overhead. No prompt duplication across spawned agents.
Hephaestus: Each spawned agent carries its own system prompt + context. 2-5 parallel agents means 2-5x prompt overhead. Background task management adds coordination tokens.
Estimated overhead: Hephaestus uses roughly 2-4x more tokens per task because of agent spawning costs; the exploration phase alone can cost more (see the per-operation estimates in Section 4).
3.5 Self-Sufficiency (Gap: 2/3)
Deep-Executor: Works in any environment. No dependency on agent infrastructure, background task systems, or multi-agent coordination. Degrades gracefully.
Hephaestus: Depends on delegation infrastructure. If agent spawning fails, core workflow breaks.
4. Token Efficiency Analysis
| Operation | Hephaestus (est. tokens) | Deep-Executor (est. tokens) | Ratio |
|---|---|---|---|
| System prompt per agent | ~3,000 | ~3,000 (once) | 1:1 |
| 3 parallel explore agents | ~9,000 prompt + ~6,000 output | ~2,000 (sequential Grep/Glob) | 7.5:1 |
| Document-Specialist research call | ~4,000 prompt + ~2,000 output | N/A (not available) | - |
| Architect consultation | ~5,000 prompt + ~3,000 output | N/A (not available) | - |
| Coordination overhead | ~1,000 per delegation | 0 | - |
| Typical task total | ~30,000-50,000 | ~10,000-20,000 | ~2.5:1 |
Conclusion: Deep-Executor is approximately 2-3x more token-efficient for equivalent tasks. Hephaestus trades tokens for wall-clock speed and broader capability.
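The ratios in the table follow from simple division over the estimated token counts. The sketch below reproduces that arithmetic; the inputs are the table's own estimates, not measurements.

```python
# Exploration row: 3 parallel explore agents vs sequential Grep/Glob.
explore_hephaestus = 9_000 + 6_000  # prompt tokens + output tokens
explore_deep_executor = 2_000
explore_ratio = explore_hephaestus / explore_deep_executor

# Typical task total row, given as (low, high) estimate ranges.
typical_hephaestus = (30_000, 50_000)
typical_deep_executor = (10_000, 20_000)

low_end_ratio = typical_hephaestus[0] / typical_deep_executor[0]
high_end_ratio = typical_hephaestus[1] / typical_deep_executor[1]
midpoint_ratio = (sum(typical_hephaestus) / 2) / (sum(typical_deep_executor) / 2)

print(explore_ratio, low_end_ratio, high_end_ratio, round(midpoint_ratio, 2))
```

Comparing like ends of the two ranges gives between 2.5:1 and 3:1, which is where the "approximately 2-3x" figure comes from; the exploration row on its own is considerably higher.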
5. Architectural Tradeoffs
Delegation Model (Hephaestus)
Strengths:
- Parallel execution reduces wall-clock time
- Specialist agents can be individually optimized
- External research augments knowledge
- Failure escalation prevents waste
Weaknesses:
- Higher token cost (2-3x)
- Coordination complexity
- Context fragmentation across agents
- Infrastructure dependency
Self-Contained Model (Deep-Executor)
Strengths:
- Token efficient
- No coordination overhead
- Unified context (no information loss between agents)
- Portable and infrastructure-independent
- Strong verification discipline
Weaknesses:
- Sequential exploration (slower wall-clock)
- No escalation path when stuck
- No external research
- Cannot parallelize independent subtasks
- Single point of failure (one agent context limit)
6. Prioritized Improvement Recommendations for Deep-Executor
Priority 1: Failure Recovery Protocol (HIGH IMPACT, LOW EFFORT)
Add a structured failure threshold:
After 3 consecutive failures on same task:
1. STOP current approach
2. DOCUMENT what was tried and why it failed
3. Try fundamentally different approach
4. If still failing: report to orchestrator with evidence
This requires NO delegation infrastructure -- just self-discipline rules.
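The rules above amount to a small state machine. The sketch below is one possible reading: it assumes that a single further failure after switching approach triggers escalation, which the source leaves unspecified. `FailureTracker` and its action names are hypothetical.

```python
class FailureTracker:
    """Hypothetical counter implementing the 3-failure protocol."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive = 0
        self.switched = False
        self.attempts = []  # (approach, error) pairs: the DOCUMENT step

    def record_failure(self, approach, error):
        """Log the failure and return the next action to take."""
        self.attempts.append((approach, error))
        if self.switched:
            # Assumption: one failure of the alternative approach escalates.
            return "escalate"
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.switched = True
            return "switch_approach"  # STOP and try something different
        return "retry"

    def record_success(self):
        """A success resets the protocol."""
        self.consecutive = 0
        self.switched = False
```

The `attempts` log doubles as the evidence handed to the orchestrator on escalation.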
Priority 2: Exploration Batching (HIGH IMPACT, MEDIUM EFFORT)
While true parallel agents are blocked, Deep-Executor can batch exploration:
- Issue multiple Glob/Grep calls in a single turn (already possible)
- Structure 5 exploration questions upfront (already present)
- Add explicit "exploration budget" (max N tool calls before proceeding)
Ensure the agent always issues independent Glob/Grep/Read calls in parallel within a single response.
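An exploration budget combined with batching can be sketched as follows. Threads stand in for tool calls issued together in a single turn, `run_batch` is a hypothetical helper, and the lambdas simulate Glob/Grep results rather than calling real tools.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(calls, budget):
    """Run at most `budget` independent calls concurrently.

    `calls` is a list of (name, zero-argument callable) pairs.
    Returns (results by name, names skipped because of the budget).
    """
    selected, skipped = calls[:budget], calls[budget:]
    with ThreadPoolExecutor(max_workers=max(1, len(selected))) as pool:
        futures = {name: pool.submit(fn) for name, fn in selected}
        results = {name: future.result() for name, future in futures.items()}
    return results, [name for name, _ in skipped]


# Simulated exploration calls; real Glob/Grep tools are not invoked here.
calls = [
    ("glob *.ts", lambda: ["a.ts", "b.ts"]),
    ("grep TODO", lambda: ["a.ts:12"]),
    ("grep FIXME", lambda: []),
]
results, skipped = run_batch(calls, budget=2)
```

Calls beyond the budget are reported instead of silently dropped, so the agent can decide whether a second batch is worth the cost.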
Priority 3: Reasoning Depth Hints (MEDIUM IMPACT, LOW EFFORT)
Add task-complexity classification to control thoroughness:
SIMPLE (1 file, < 20 lines): Quick fix, minimal exploration
MEDIUM (up to 3 files, < 100 lines): Standard exploration + verification
COMPLEX (4+ files, or architectural): Full exploration + multiple verification passes
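The sizing heuristic can be written as a short classifier. The sketch reads the boundaries as non-overlapping (one file and under 20 lines is SIMPLE, up to three files and under 100 lines is MEDIUM, anything larger or architectural is COMPLEX); that reading and the name `classify_task` are assumptions.

```python
def classify_task(files_touched, lines_changed, architectural=False):
    """Hypothetical sizing heuristic mapping a change to a thoroughness tier."""
    if architectural or files_touched > 3 or lines_changed >= 100:
        return "COMPLEX"
    if files_touched <= 1 and lines_changed < 20:
        return "SIMPLE"
    return "MEDIUM"
```

Checking the COMPLEX conditions first means an architectural change is never downgraded just because its diff is small.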
Priority 4: Dynamic Tool Adaptation (MEDIUM IMPACT, MEDIUM EFFORT)
Add capability detection:
IF lsp_diagnostics available: use for verification
ELSE IF build command known: use build output
ELSE: rely on ast_grep_search for structural validation
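The fallback chain above translates directly into a selection function. In this sketch the tool names come from the text, while `pick_verifier` and its return format are illustrative.

```python
def pick_verifier(available_tools, build_command=None):
    """Pick a verification strategy following the fallback order in the text."""
    if "lsp_diagnostics" in available_tools:
        return "lsp_diagnostics"
    if build_command:
        return f"build: {build_command}"
    return "ast_grep_search"
```

For example, `pick_verifier({"ast_grep_search"}, build_command="make")` falls through to the build output because no LSP tool is available.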
Priority 5: Structured Escalation Reporting (LOW IMPACT, LOW EFFORT)
When stuck, produce a structured failure report:
## Escalation Report
- **Task**: What was attempted
- **Attempts**: What approaches were tried (with outcomes)
- **Blocker**: Why it cannot be resolved
- **Suggested Next Steps**: What a human or orchestrator should try
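Filling the template programmatically keeps escalation reports uniform. The sketch below renders the four fields from the template; `render_escalation_report` and its parameters are hypothetical.

```python
def render_escalation_report(task, attempts, blocker, next_steps):
    """Render the escalation template as markdown.

    `attempts` is a list of (approach, outcome) pairs;
    `next_steps` is a list of suggestion strings.
    """
    lines = ["## Escalation Report", f"- **Task**: {task}", "- **Attempts**:"]
    lines += [f"  - {approach}: {outcome}" for approach, outcome in attempts]
    lines.append(f"- **Blocker**: {blocker}")
    lines.append("- **Suggested Next Steps**:")
    lines += [f"  - {step}" for step in next_steps]
    return "\n".join(lines)


report = render_escalation_report(
    task="Fix flaky login test",
    attempts=[("increase timeout", "still flaky"), ("mock clock", "other tests broke")],
    blocker="Test depends on shared global state",
    next_steps=["Isolate the session store per test"],
)
```

Nesting attempts and next steps as sub-bullets is a design choice that keeps each attempt paired with its outcome.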
7. Implementation Suggestions
For Deep-Executor Enhancements
| Enhancement | Implementation | Effort |
|---|---|---|
| Failure threshold | Add counter + rules to prompt | 1 hour |
| Exploration batching | Add parallel tool call guidance | 30 min |
| Complexity classification | Add task sizing heuristic | 1 hour |
| Escalation report format | Add output template | 30 min |
| Tool capability detection | Add conditional tool sections | 2 hours |
For Hephaestus Enhancements (Inverse)
| Enhancement | Implementation | Effort |
|---|---|---|
| TODO discipline | Port Deep-Executor's TodoWrite rules | 1 hour |
| Verification protocol | Add post-change lsp_diagnostics mandate | 1 hour |
| LSP tool strategy | Document tool selection hierarchy | 2 hours |
| Completion checklist | Port Definition of Done format | 30 min |
8. Conclusion
Hephaestus and Deep-Executor represent two valid points on the agent architecture spectrum:
- Hephaestus optimizes for capability breadth and speed at the cost of token efficiency
- Deep-Executor optimizes for reliability and efficiency at the cost of parallelism
The most impactful improvements for Deep-Executor are those that require NO architectural changes: failure recovery protocols, exploration batching, and complexity-aware reasoning. These can be implemented purely through prompt engineering within the existing self-contained model.
The most impactful improvements for Hephaestus are Deep-Executor's discipline mechanisms: TODO tracking, verification protocols, and structured completion contracts. These add reliability without sacrificing Hephaestus's delegation strengths.
Analysis completed: 2026-02-01. Session: hephaestus-deep-executor-comparison