Files
oh-my-claudecode/research/hephaestus-vs-deep-executor-comparison.md

12 KiB

Hephaestus vs Deep-Executor: Comparative Analysis

Analysis Summary

  • Research Question: How do the Hephaestus (oh-my-opencode) and Deep-Executor (oh-my-claudecode) agent architectures differ, and what can each learn from the other?
  • Methodology: Structured feature comparison across 14 capability dimensions, scored 0-3

1. Architectural Overview

Dimension Hephaestus Deep-Executor
Core Philosophy Conductor/Delegator Self-Contained Forge
Execution Model Multi-agent parallel Single-agent sequential
Agent Spawning 2-5 parallel background agents BLOCKED (by design)
Tool Strategy Agents as tools Direct MCP/LSP tools
Model GPT 5.2 with reasoning levels Claude (Opus/Sonnet)

Key Insight

These are fundamentally different architectural paradigms. Hephaestus is a distributed system -- it treats agents as microservices. Deep-Executor is a monolith -- it concentrates all capability in one process. Neither is inherently superior; they optimize for different constraints.


2. Feature Gap Analysis: What Hephaestus Has That Deep-Executor Lacks

Feature Comparison Matrix

Category                                 Hephaestus    Deep-Exec    Delta
--------------------------------------------------------------------------------
Parallel Exploration                              3            0       +3
Delegation to Specialists                         3            0       +3
External Research (Docs/OSS)                      3            0       +3
Failure Recovery / Escalation                     3            1       +2
Dynamic Prompt Adaptation                         3            0       +3
Reasoning Level Configuration                     3            0       +3
TODO / Task Tracking Discipline                   1            3       -2
Verification Protocol Rigor                       1            3       -2
Structured Output Contract                        2            3       -1
MCP/LSP Tool Strategy                             1            3       -2
Ambiguity Resolution                              3            2       +1
Session Continuity                                3            2       +1
Token Efficiency                                  1            3       -2
Self-Sufficiency                                  1            3       -2
--------------------------------------------------------------------------------
TOTAL                                            31           23       +8

2.1 Parallel Exploration (Gap: 3/3)

Hephaestus: Fires 2-5 explore/document-specialist agents simultaneously as background tasks. Continues working while results stream in. Uses background_output(task_id) to collect.

Deep-Executor: Sequential exploration only. Must complete each Glob/Grep/Read call before starting the next.

Impact: For large codebases, Hephaestus can gather context 3-5x faster. Deep-Executor compensates with more targeted, cheaper queries but loses wall-clock time on broad searches.

2.2 Delegation to Specialists (Gap: 3/3)

Hephaestus: Three specialized agent types:

  • Explore agents: Parallel codebase search
  • Document-Specialist: External docs, GitHub, OSS research
  • Architect: High-IQ consulting for stuck situations

Deep-Executor: No delegation. All work is self-performed. This is a deliberate design choice ("You are the forge") but means no access to specialist capabilities.

Impact: Hephaestus can handle broader task scopes. Deep-Executor is limited to what a single agent context window can reason about.

2.3 External Research Capability (Gap: 3/3)

Hephaestus: Document-Specialist agent fetches external documentation, GitHub repos, and OSS references. This provides real-time knowledge augmentation.

Deep-Executor: No external research capability. Relies entirely on pre-loaded context and available tools.

Impact: When working with unfamiliar APIs or libraries, Hephaestus has a significant advantage.

2.4 Failure Recovery / Escalation (Gap: 2/3)

Hephaestus: Structured 3-failure protocol: STOP -> REVERT -> DOCUMENT -> CONSULT Architect. Clear escalation path prevents infinite retry loops.

Deep-Executor: No explicit failure threshold or escalation. Has verification loops but no "give up and escalate" mechanism.

Impact: Hephaestus avoids wasting tokens on unrecoverable situations. Deep-Executor can get stuck in retry loops.

2.5 Dynamic Prompt Adaptation (Gap: 3/3)

Hephaestus: Uses helper functions (buildExploreSection(), etc.) to dynamically construct prompts based on available capabilities. Prompt adapts to runtime environment.

Deep-Executor: Static prompt. Same instructions regardless of available tools or context.

Impact: Hephaestus is more portable across environments with varying tool availability.

2.6 Reasoning Level Configuration (Gap: 3/3)

Hephaestus: Explicit reasoning budget per task type (MEDIUM for code changes, HIGH for complex refactoring). "ROUTER NUDGE" directs model thinking depth.

Deep-Executor: No reasoning level control. Same approach for all task complexities.

Impact: Hephaestus can optimize cost/quality tradeoff per subtask.


3. Inverse Gaps: What Deep-Executor Has That Hephaestus Could Benefit From

3.1 TODO Discipline (Gap: 2/3)

Deep-Executor: NON-NEGOTIABLE rules: TodoWrite for 2+ steps, ONE in_progress at a time, mark completed IMMEDIATELY. This creates a reliable audit trail and prevents task drift.

Hephaestus: Minimal task tracking. Relies on delegation structure rather than explicit progress tracking.

Recommendation for Hephaestus: Adopt mandatory task tracking for complex multi-step operations.

3.2 Verification Protocol Rigor (Gap: 2/3)

Deep-Executor: After EVERY change: lsp_diagnostics. Before completion: ALL of (todos, tests, build, diagnostics). Specified evidence format.

Hephaestus: No structured verification protocol. Delegates verification implicitly through agent results.

Recommendation for Hephaestus: Add post-change diagnostic checks and a completion checklist.

3.3 MCP/LSP Tool Strategy (Gap: 2/3)

Deep-Executor: Explicit strategy for lsp_diagnostics (single file), lsp_diagnostics_directory (project-wide), ast_grep_search/replace with dryRun protocol. Clear escalation from file to project scope.

Hephaestus: No explicit LSP/AST tool strategy documented.

Recommendation for Hephaestus: Document and enforce a tool selection hierarchy.

3.4 Token Efficiency (Gap: 2/3)

Deep-Executor: Single agent = single context window. No inter-agent communication overhead. No prompt duplication across spawned agents.

Hephaestus: Each spawned agent carries its own system prompt + context. 2-5 parallel agents means 2-5x prompt overhead. Background task management adds coordination tokens.

Estimated overhead: Hephaestus uses ~2-4x more tokens per exploration phase due to agent spawning costs.

3.5 Self-Sufficiency (Gap: 2/3)

Deep-Executor: Works in any environment. No dependency on agent infrastructure, background task systems, or multi-agent coordination. Degrades gracefully.

Hephaestus: Depends on delegation infrastructure. If agent spawning fails, core workflow breaks.


4. Token Efficiency Analysis

Operation Hephaestus (est. tokens) Deep-Executor (est. tokens) Ratio
System prompt per agent ~3,000 ~3,000 (once) 1:1
3 parallel explore agents ~9,000 prompt + ~6,000 output ~2,000 (sequential Grep/Glob) 7.5:1
Document-Specialist research call ~4,000 prompt + ~2,000 output N/A (not available) -
Architect consultation ~5,000 prompt + ~3,000 output N/A (not available) -
Coordination overhead ~1,000 per delegation 0 -
Typical task total ~30,000-50,000 ~10,000-20,000 ~2.5:1

Conclusion: Deep-Executor is approximately 2-3x more token-efficient for equivalent tasks. Hephaestus trades tokens for wall-clock speed and broader capability.


5. Architectural Tradeoffs

Delegation Model (Hephaestus)

Strengths:

  • Parallel execution reduces wall-clock time
  • Specialist agents can be individually optimized
  • External research augments knowledge
  • Failure escalation prevents waste

Weaknesses:

  • Higher token cost (2-3x)
  • Coordination complexity
  • Context fragmentation across agents
  • Infrastructure dependency

Self-Contained Model (Deep-Executor)

Strengths:

  • Token efficient
  • No coordination overhead
  • Unified context (no information loss between agents)
  • Portable and infrastructure-independent
  • Strong verification discipline

Weaknesses:

  • Sequential exploration (slower wall-clock)
  • No escalation path when stuck
  • No external research
  • Cannot parallelize independent subtasks
  • Single point of failure (one agent context limit)

6. Prioritized Improvement Recommendations for Deep-Executor

Priority 1: Failure Recovery Protocol (HIGH IMPACT, LOW EFFORT)

Add a structured failure threshold:

After 3 consecutive failures on same task:
1. STOP current approach
2. DOCUMENT what was tried and why it failed
3. Try fundamentally different approach
4. If still failing: report to orchestrator with evidence

This requires NO delegation infrastructure -- just self-discipline rules.

Priority 2: Exploration Batching (HIGH IMPACT, MEDIUM EFFORT)

While true parallel agents are blocked, Deep-Executor can batch exploration:

- Issue multiple Glob/Grep calls in a single turn (already possible)
- Structure 5 exploration questions upfront (already present)
- Add explicit "exploration budget" (max N tool calls before proceeding)

Ensure the agent always issues independent Glob/Grep/Read calls in parallel within a single response.

Priority 3: Reasoning Depth Hints (MEDIUM IMPACT, LOW EFFORT)

Add task-complexity classification to control thoroughness:

SIMPLE (< 1 file, < 20 lines): Quick fix, minimal exploration
MEDIUM (1-3 files, < 100 lines): Standard exploration + verification
COMPLEX (3+ files, architectural): Full exploration + multiple verification passes

Priority 4: Dynamic Tool Adaptation (MEDIUM IMPACT, MEDIUM EFFORT)

Add capability detection:

IF lsp_diagnostics available: use for verification
ELSE IF build command known: use build output
ELSE: rely on ast_grep_search for structural validation

Priority 5: Structured Escalation Reporting (LOW IMPACT, LOW EFFORT)

When stuck, produce a structured failure report:

## Escalation Report
- **Task**: What was attempted
- **Attempts**: What approaches were tried (with outcomes)
- **Blocker**: Why it cannot be resolved
- **Suggested Next Steps**: What a human or orchestrator should try

7. Implementation Suggestions

For Deep-Executor Enhancements

Enhancement Implementation Effort
Failure threshold Add counter + rules to prompt 1 hour
Exploration batching Add parallel tool call guidance 30 min
Complexity classification Add task sizing heuristic 1 hour
Escalation report format Add output template 30 min
Tool capability detection Add conditional tool sections 2 hours

For Hephaestus Enhancements (Inverse)

Enhancement Implementation Effort
TODO discipline Port Deep-Executor's TodoWrite rules 1 hour
Verification protocol Add post-change lsp_diagnostics mandate 1 hour
LSP tool strategy Document tool selection hierarchy 2 hours
Completion checklist Port Definition of Done format 30 min

8. Conclusion

Hephaestus and Deep-Executor represent two valid points on the agent architecture spectrum:

  • Hephaestus optimizes for capability breadth and speed at the cost of token efficiency
  • Deep-Executor optimizes for reliability and efficiency at the cost of parallelism

The most impactful improvements for Deep-Executor are those that require NO architectural changes: failure recovery protocols, exploration batching, and complexity-aware reasoning. These can be implemented purely through prompt engineering within the existing self-contained model.

The most impactful improvements for Hephaestus are Deep-Executor's discipline mechanisms: TODO tracking, verification protocols, and structured completion contracts. These add reliability without sacrificing Hephaestus's delegation strengths.


Analysis completed: 2026-02-01 Session: hephaestus-deep-executor-comparison