Merge pull request #2724 from Yeachan-Heo/fix/issue-2723-deep-interview-threshold

Fix deep-interview threshold on native skill path
2026-04-20 21:00:50 +08:00 · 2026-04-19 01:55:35 +09:00
parent 46da35e719 b3b4dd964a
commit 68d304a9d7
3 changed files with 54 additions and 25 deletions
--- a/skills/deep-interview/SKILL.md
+++ b/skills/deep-interview/SKILL.md
@@ -10,7 +10,7 @@ level: 3
 ---

 <Purpose>
-Deep Interview implements Ouroboros-inspired Socratic questioning with mathematical ambiguity scoring. It replaces vague ideas with crystal-clear specifications by asking targeted questions that expose hidden assumptions, measuring clarity across weighted dimensions, and refusing to proceed until ambiguity drops below a configurable threshold (default: 20%). The output feeds into a 3-stage pipeline: **deep-interview → ralplan (consensus refinement) → autopilot (execution)**, ensuring maximum clarity at every stage.
+Deep Interview implements Ouroboros-inspired Socratic questioning with mathematical ambiguity scoring. It replaces vague ideas with crystal-clear specifications by asking targeted questions that expose hidden assumptions, measuring clarity across weighted dimensions, and refusing to proceed until ambiguity drops below the resolved threshold for this run. The output feeds into a 3-stage pipeline: **deep-interview → ralplan (consensus refinement) → autopilot (execution)**, ensuring maximum clarity at every stage.
 </Purpose>

 <Use_When>
@@ -43,7 +43,7 @@ Inspired by the [Ouroboros project](https://github.com/Q00/ouroboros) which demo
 - Gather codebase facts via `explore` agent BEFORE asking the user about them
 - For brownfield confirmation questions, cite the repo evidence that triggered the question (file path, symbol, or pattern) instead of asking the user to rediscover it
 - Score ambiguity after every answer -- display the score transparently
- Do not proceed to execution until ambiguity ≤ threshold (default 0.2)
+- Do not proceed to execution until ambiguity ≤ the resolved threshold for this run
 - Allow early exit with a clear warning if ambiguity is still high
 - Persist interview state for resume across session interruptions
 - Challenge agents activate at specific round thresholds to shift perspective
@@ -70,6 +70,10 @@ When arguments include `--autoresearch`, Deep Interview becomes the zero-learnin
   - If source files exist AND the user's idea references modifying/extending something: **brownfield**
   - Otherwise: **greenfield**
 3. **For brownfield**: Run `explore` agent to map relevant codebase areas, store as `codebase_context`
+3.5. **Load runtime settings**:
+   - Read `[$CLAUDE_CONFIG_DIR|~/.claude]/settings.json` and `./.claude/settings.json` (project overrides user)
+   - Resolve `omc.deepInterview.ambiguityThreshold` into `<resolvedThreshold>`; if it is undefined, use `0.2`
+   - Derive `<resolvedThresholdPercent>` from `<resolvedThreshold>` and substitute both placeholders throughout the remaining instructions before continuing
 4. **Initialize state** via `state_write(mode="deep-interview")`:

 ```json
@@ -82,7 +86,7 @@ When arguments include `--autoresearch`, Deep Interview becomes the zero-learnin
    "initial_idea": "<user input>",
    "rounds": [],
    "current_ambiguity": 1.0,
-    "threshold": 0.2,
+    "threshold": <resolvedThreshold>,
    "codebase_context": null,
    "challenge_modes_used": [],
    "ontology_snapshots": []
@@ -92,7 +96,7 @@ When arguments include `--autoresearch`, Deep Interview becomes the zero-learnin

 5. **Announce the interview** to the user:

-> Starting deep interview. I'll ask targeted questions to understand your idea thoroughly before building anything. After each answer, I'll show your clarity score. We'll proceed to execution once ambiguity drops below 20%.
+> Starting deep interview. I'll ask targeted questions to understand your idea thoroughly before building anything. After each answer, I'll show your clarity score. We'll proceed to execution once ambiguity drops below <resolvedThresholdPercent>.
 >
 > **Your idea:** "{initial_idea}"
 > **Project type:** {greenfield|brownfield}
@@ -384,7 +388,7 @@ Stage 1: Deep Interview          Stage 2: Ralplan                Stage 3: Autopi
 │ Ambiguity scoring   │───>│ Architect reviews         │───>│ Phase 3: QA cycling  │
 │ Challenge agents    │    │ Critic validates          │    │ Phase 4: Validation  │
 │ Spec crystallization│    │ Loop until consensus      │    │ Phase 5: Cleanup     │
-│ Gate: ≤20% ambiguity│    │ ADR + RALPLAN-DR summary  │    │                      │
+│ Gate: ≤<resolvedThresholdPercent> ambiguity│    │ ADR + RALPLAN-DR summary  │    │                      │
 └─────────────────────┘    └───────────────────────────┘    └──────────────────────┘
 Output: spec.md            Output: consensus-plan.md        Output: working code
 ```
@@ -453,7 +457,7 @@ Why good: Challenges a specific assumption (scale requirement) that could dramat
 Early exit with warning:
 ```
 User: "That's enough, just build it"
-System: "Current ambiguity is 35% (threshold: 20%). Areas still unclear:
+System: "Current ambiguity is 35% (threshold: <resolvedThresholdPercent>). Areas still unclear:
  - Success Criteria: 0.5 (How do we verify the search ranking algorithm works correctly?)
  - Constraints: 0.6 (No performance targets defined yet)

@@ -548,7 +552,7 @@ Optional settings in `.claude/settings.json`:
 {
  "omc": {
    "deepInterview": {
-      "ambiguityThreshold": 0.2,
+      "ambiguityThreshold": <resolvedThreshold>,
      "maxRounds": 20,
      "softWarningRounds": 10,
      "minRoundsBeforeExit": 3,
@@ -583,7 +587,7 @@ The recommended execution path chains three quality gates:

 ```
 /deep-interview "vague idea"
-  → Socratic Q&A until ambiguity ≤ 20%
+  → Socratic Q&A until ambiguity ≤ <resolvedThresholdPercent>
  → Spec written to .omc/specs/deep-interview-{slug}.md
  → User selects "Ralplan → Autopilot"
  → /omc-plan --consensus --direct (spec as input, skip interview)
@@ -641,11 +645,11 @@ Each mode is used exactly once, then normal Socratic questioning resumes. Modes
 | Score Range | Meaning | Action |
 |-------------|---------|--------|
 | 0.0 - 0.1 | Crystal clear | Proceed immediately |
-| 0.1 - 0.2 | Clear enough | Proceed (default threshold) |
-| 0.2 - 0.4 | Some gaps | Continue interviewing |
-| 0.4 - 0.6 | Significant gaps | Focus on weakest dimensions |
-| 0.6 - 0.8 | Very unclear | May need reframing (Ontologist) |
-| 0.8 - 1.0 | Almost nothing known | Early stages, keep going |
+| At or below the resolved threshold | Clear enough | Proceed |
+| Above the resolved threshold with minor gaps | Some gaps | Continue interviewing |
+| Moderate ambiguity | Significant gaps | Focus on weakest dimensions |
+| High ambiguity | Very unclear | May need reframing (Ontologist) |
+| Extreme ambiguity | Almost nothing known | Early stages, keep going |
 </Advanced>

 Task: {{ARGUMENTS}}
--- a/src/tests/skills.test.ts
+++ b/src/tests/skills.test.ts
@@ -1,5 +1,5 @@
 import { describe, it, expect, beforeEach, afterEach } from 'vitest';
-import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'fs';
+import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'fs';
 import { join } from 'path';
 import { tmpdir } from 'os';
 import { createBuiltinSkills, getBuiltinSkill, listBuiltinSkillNames, clearSkillsCache } from '../features/builtin-skills/skills.js';
@@ -335,7 +335,7 @@ describe('Builtin Skills', () => {
      const skill = getBuiltinSkill('deep-interview');
      expect(skill).toBeDefined();
      expect(skill?.template).toContain('Load runtime settings');
-      expect(skill?.template).toContain('ambiguityThreshold = 0.12');
+      expect(skill?.template).toContain('Resolve `omc.deepInterview.ambiguityThreshold` into `0.12`');
      expect(skill?.template).toContain('"threshold": 0.12,');
      expect(skill?.template).toContain('drops below 12%.');
      expect(skill?.template?.indexOf('Load runtime settings')).toBeLessThan(
@@ -356,7 +356,7 @@ describe('Builtin Skills', () => {
      );

      const first = getBuiltinSkill('deep-interview');
-      expect(first?.template).toContain('ambiguityThreshold = 0.12');
+      expect(first?.template).toContain('Resolve `omc.deepInterview.ambiguityThreshold` into `0.12`');
      expect(first?.template).toContain('"threshold": 0.12,');

      writeFileSync(
@@ -365,9 +365,9 @@ describe('Builtin Skills', () => {
      );

      const second = getBuiltinSkill('deep-interview');
-      expect(second?.template).toContain('ambiguityThreshold = 0.33');
+      expect(second?.template).toContain('Resolve `omc.deepInterview.ambiguityThreshold` into `0.33`');
      expect(second?.template).toContain('"threshold": 0.33,');
-      expect(second?.template).not.toContain('ambiguityThreshold = 0.12');
+      expect(second?.template).not.toContain('Resolve `omc.deepInterview.ambiguityThreshold` into `0.12`');
      expect(second?.template).not.toContain('"threshold": 0.12,');
    });

@@ -391,11 +391,9 @@ describe('Builtin Skills', () => {
      expect(t).toContain('"threshold": 0.15,');
      expect(t).toContain('drops below 15%.');

-      // Issue #2545: five previously-missed hardcoded references
-      expect(t).toContain('(default: 15%)');         // Purpose section
-      expect(t).toContain('(default 0.15)');          // Execution_Policy
+      expect(t).toContain('resolved threshold for this run'); // Purpose/Execution_Policy
      expect(t).toContain('Gate: ≤15% ambiguity');    // ASCII pipeline diagram
-      expect(t).toContain('(threshold: 15%).');       // Early-exit example message
+      expect(t).toContain('(threshold: 15%)');        // Early-exit example message
      expect(t).toContain('ambiguity ≤ 15%');         // Advanced pipeline description
      expect(t).toContain('"ambiguityThreshold": 0.15,'); // Advanced config snippet

@@ -408,6 +406,26 @@ describe('Builtin Skills', () => {
      expect(t).not.toContain('"ambiguityThreshold": 0.2,');
    });

+    it('ships a config-aware deep-interview SKILL.md for native skill-loader paths (issue #2723)', () => {
+      const raw = readFileSync(join(originalCwd, 'skills', 'deep-interview', 'SKILL.md'), 'utf-8');
+      expect(raw).toContain('Load runtime settings');
+      expect(raw).toContain('Read `[$CLAUDE_CONFIG_DIR|~/.claude]/settings.json` and `./.claude/settings.json`');
+      expect(raw).toContain('"threshold": <resolvedThreshold>,');
+      expect(raw).toContain('ambiguity drops below <resolvedThresholdPercent>');
+      expect(raw).toContain('Gate: ≤<resolvedThresholdPercent> ambiguity');
+      expect(raw).toContain('"ambiguityThreshold": <resolvedThreshold>,');
+      expect(raw).toContain('At or below the resolved threshold');
+
+      expect(raw).not.toContain('(default: 20%)');
+      expect(raw).not.toContain('(default 0.2)');
+      expect(raw).not.toContain('"threshold": 0.2,');
+      expect(raw).not.toContain('ambiguity drops below 20%');
+      expect(raw).not.toContain('Gate: ≤20% ambiguity');
+      expect(raw).not.toContain('(threshold: 20%).');
+      expect(raw).not.toContain('"ambiguityThreshold": 0.2,');
+      expect(raw).not.toContain('ambiguity ≤ 20%');
+    });
+
    it('rewrites built-in skill command examples to plugin-safe bridge invocations when omc is unavailable', () => {
      process.env.CLAUDE_PLUGIN_ROOT = '/plugin-root';
      process.env.PATH = '';
--- a/src/features/builtin-skills/skills.ts
+++ b/src/features/builtin-skills/skills.ts
@@ -132,14 +132,21 @@ function applyDeepInterviewRuntimeSettings(template: string): string {
  const threshold = getDeepInterviewAmbiguityThreshold();
  const percent = formatThresholdPercent(threshold);

-  return template
-    .replace(
+  const withResolvedPlaceholders = template
+    .replaceAll('<resolvedThreshold>', `${threshold}`)
+    .replaceAll('<resolvedThresholdPercent>', percent);
+
+  const withRuntimeSettings = withResolvedPlaceholders.includes('3.5. **Load runtime settings**:')
+    ? withResolvedPlaceholders
+    : withResolvedPlaceholders.replace(
      '4. **Initialize state** via `state_write(mode="deep-interview")`:',
      [
        `3.5. **Load runtime settings** from \`~/.claude/settings.json\` and \`./.claude/settings.json\` before state init (project overrides profile). For this run, use \`ambiguityThreshold = ${threshold}\`.`,
        '4. **Initialize state** via `state_write(mode="deep-interview")`:',
      ].join('\n'),
-    )
+    );
+
+  return withRuntimeSettings
    .replace('"threshold": 0.2,', `"threshold": ${threshold},`)
    .replace(
      'We\'ll proceed to execution once ambiguity drops below 20%.',