WIKI/DEBUG SWARM TEMPLATE

DEBUG SWARM TEMPLATE

Updated 3 weeks ago
▛▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▜
▌  🐛  THE FORGE — DEBUG SWARM TEMPLATE  v1.2                                  ▐
▙▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▟

Version: 1.2 — 2026-02-23 (post swarm audit — 6 targeted upgrades applied)
Built on: AUDIT_SWARM_TEMPLATE v1.0 (Session 117) + AI Swarm Research (2025-2026)
Research sources incorporated:
  - arXiv:2602.01465 (Agyn, SWE-bench 500) — 72.2% task resolution, validates our 4-role base
  - arXiv:2405.16334 (Devil's Advocate, EMNLP 2024) — 45% plan revision reduction via
    anticipatory reflection. THREE-STAGE LOOP: pre-action → post-action → final review.
  - arXiv:2510.07517 — Identity Bias in Multi-Agent Debate (anonymization protocol)
  - Snorkel AI Self-Critique Paradox — forcing self-critique on >90% confidence DROPS
    accuracy 15-40%. Only run adversarial phase for confidence < 90%.
  - Anthropic multi-agent research (90.2% gain via orchestrator+specialists)
  - IEEE Spectrum "AI Coding Degrades" — validation of DDI, hard iteration cap required

> "The agents are the deliverable. Design them like art." — SoulForge v2.0
> "Launch ALL drones BEFORE planning." — AUDIT_SWARM_TEMPLATE v1.0
> "Sycophancy is structural — anonymize reports or your drones will agree to be wrong."
>   — arXiv:2510.07517 (Identity Bias in Multi-Agent Debate)

---

## WHEN TO USE THIS

Deploy this swarm when:
  ✓ DDI cap hit (2 failed BUILD attempts on the same bug)
  ✓ Bug confidence < 90% (something uncertain in the diagnosis)
  ✓ Silent failure (the system reports OK but isn't)
  ✓ Complex multi-system interaction (>2 Kingdom subsystems involved)
  ✓ You want verification before shipping a significant fix

Do NOT use for:
  ✗ Single-file typo/obvious syntax errors (use PROTOCOL_BUG.md directly)
  ✗ Bugs with 100% confidence and clear fix already documented
  ✗ Tasks that need BUILDING, not debugging

---

## STEP 0 — FILL THE INTAKE (DO THIS FIRST, BEFORE ANYTHING)

Copy this and fill it out. This becomes the BLACKBOARD for all drones.

```
INTAKE:
  SYSTEM:           [e.g. SCRYER, Goldfish, Ping Hub, Overmind, Mailbox]
  BUG_ID:           [e.g. BUG_scryer_sinner_king_entry]
  SYMPTOM:          [Observable failure — what appears, not what you think]
  WHAT_WORKS:       [Parts definitely functioning]
  WHAT_DOESNT:      [Specific failing behavior]
  LAST_GOOD_STATE:  [When it last worked, if ever]
  RECENT_CHANGES:   [Commits/changes in past N sessions]
  PRIOR_ATTEMPTS:   [What was already tried, exact changes made]
  EVIDENCE:         [Logs, signals, probe results already collected]
  KEY_FILES:        [Files you already know are involved]
  INSERTION_SOURCE: [What code/script CREATES the signal/entry? e.g. kingdom-ship.sh, not the watcher]
  CONFIDENCE_GAP:   [What specific uncertainty is keeping this below 100%?]
```

---

## THE DRONE ROSTER (6 drones + 1 observer)

### 🔍 DRONE 1 — THE CODE READER (Haiku)
```
ROLE: Source code archaeologist.
MISSION: Read ALL files for the named system. Build a complete picture of
         what the code DOES, not what it SHOULD do.
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/CODE_READER.md):
  - Full data flow diagram (ASCII is fine)
  - Every external dependency (launchd, DB, API, filesystem path)
  - Every conditional branch that could silently skip
  - Every place where an error is caught and swallowed
  - [HYPOTHESIS: ...] markers for anything suspicious
MODEL: Haiku
ALSO READ: Lifecycle partner files for the system (e.g. scryer-prune.sh alongside
           scryer-dispatch.sh — prune defines data lifecycle, cannot evaluate a
           status-field fix without it). If a file governs cleanup/expiry for the
           system under investigation, it is mandatory.
RULE: Report WHAT. Not WHY. No fixing. No hypotheses about fixes.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 📋 DRONE 2 — THE LOG ANALYST (Haiku)
```
ROLE: Timeline reconstructor.
MISSION: Read all logs, state files, and runtime artifacts.
         Build a timeline of what actually happened.
KEY LOGS TO CHECK (always):
  - ~/Desktop/THE_FORGE/logs/mailbox-trigger.log
  - ~/Desktop/THE_FORGE/FORGE_CLAUDE/logs/ (all .log files)
  - ~/.forge-scryer/logs/ or equivalent
  - ~/Library/Logs/ (launchd exit codes)
  - overmind.db: SELECT * FROM [relevant table] ORDER BY created_at DESC LIMIT 50
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/LOG_ANALYST.md):
  - Chronological timeline of events
  - Every anomaly flagged with timestamp
  - Gaps in expected log entries (silence = signal)
  - Exit codes, failure codes, HTTP status codes observed
  - [HYPOTHESIS: ...] markers for anything suspicious
MODEL: Haiku
RULE: Report WHAT happened. Not WHY. Causation is for synthesis.
ANONYMIZATION: Do NOT mention what other drones might find or expect.
               Write as if you are the only investigator.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 🔴 DRONE 3 — THE LIVE STATE INSPECTOR (general-purpose, needs Bash)
```
ROLE: System pulse-taker.
MISSION: Check what is ACTUALLY running right now.
MORTAL CHECK PROTOCOL (run these first):
  launchctl list | grep [service]          # Is it loaded?
  curl -s localhost:[PORT]/health           # Is it responding?
  ps aux | grep [process]                  # What's actually running?
  lsof -i :[PORT]                          # What owns the port?
  sqlite3 ~/.../overmind.db "SELECT ..."   # What does the DB say?
  grep -r "INSERT INTO [table]" ~/.forge-scryer/ [relevant dirs]  # CONFIRM INSERTION SOURCE
                                           # — find every script that writes to the table/queue.
                                           # Do NOT assume the watcher is the only producer.
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/LIVE_STATE.md):
  - Full process list for all involved services
  - Port occupancy table
  - DB state snapshot (relevant rows)
  - Registry/state file contents (AERIS_SHARED_STATE.json, etc.)
  - Queue depths (buffer counts, pending rows)
  - Any EXIT CODES from launchctl list (non-zero = problem)
  - [HYPOTHESIS: ...] markers for anything suspicious
MODEL: general-purpose (Bash access required)
RULE: Report current state. Not what it should be. Not what you'd fix.
ANONYMIZATION: Do NOT factor in what logs or code might suggest.
               Approach system state as if seeing it for the first time.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 🌐 DRONE 4 — THE PATTERN RESEARCHER (Explore or general-purpose)
```
ROLE: External knowledge base.
MISSION: Find documented failure modes for the technologies involved.
         Validate that our understanding of the system is correct.
ALWAYS SEARCH FOR:
  - "[technology] [symptom] known issue"
  - "[technology] silent failure"
  - "[technology] launchd KeepAlive behavior"
  - "[exact error message or exit code]"
  - The specific gotchas in DEBUGGING_PROTOCOL.md Kingdom-specific section
ALSO CHECK INTERNAL DOCS:
  - CORE LORE/SYSTEMS/SYS_[system].md
  - CORE LORE/SINNER_KING_DEBUGGING_PROTOCOL.md Part V
  - MEMORY.md GOTCHAS section
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/PATTERN_RESEARCH.md):
  - Known failure modes ranked by match to observed symptom
  - Any Kingdom-specific gotchas from internal docs
  - Any "this is documented behavior" findings (not bugs)
  - [HYPOTHESIS: ...] markers for anything suspicious
MODEL: general-purpose (web search + file read)
RULE: External knowledge only. Do not try to read the actual code.
ANONYMIZATION: Write this report as if you haven't seen the code or logs.
               Pure research perspective.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 🧬 DRONE 5 — THE ADVERSARIAL TESTER (general-purpose, needs Bash)
```
ROLE: Fence-toucher. The one who actually runs the code.
MISSION: Reproduce the failure. Find the minimal trigger.
PROTOCOL:
  1. Start from CLEAN STATE (restart service if safe to do so)
  2. Run the MORTAL CHECK first
  3. Inject TIER 1 FENCES (console.log / echo probes) at suspected locations
  4. Trigger the bug deliberately
  5. Report EXACTLY what appeared and what didn't
FENCE SIGNALS:
  🔵 = Code path executed
  🦖 = Bug found (raptor spotted)
  🟢 = Success confirmed
  🔴 = Failure confirmed
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/ADVERSARIAL_TEST.md):
  - Exact steps to reproduce
  - Which fences fired and which didn't
  - Minimal reproduction case
  - BLAST RADIUS COUNT: exact number of times the bad behavior occurred (log events,
    DB rows, API calls, phone pushes). This number is used for fix verification —
    after fix, the count should be zero on the same trigger.
  - Side effects observed
  - [CONFIRMED: ...] markers for verified facts
MODEL: general-purpose (Bash access required)
RULE: You are the ONLY drone that is ALLOWED to form conclusions.
      Every other drone reports facts. You report what the spark says.
      But: your conclusions must be falsifiable. Name what would disprove them.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 🔬 DRONE 6 — THE INDEPENDENT REVIEWER (general-purpose or Haiku)
```
ROLE: The second pair of eyes. Writer/Reviewer split enforced structurally.
MISSION: Read ONLY the original bug report and the diff/code in question.
         NOT the other drones' reports. NOT the prior diagnosis.
         Form a FRESH diagnosis as if this is the first time anyone has seen this bug.
WHY: The agent that proposed the fix cannot validly review it.
     Same context = same blind spots = confirmation bias.
     This drone is initialized fresh to prevent sycophancy (arXiv:2510.07517).
MANDATORY FIRST STEP — PRIOR FIX CHECK:
  ls -la [key files in bug report]   # Check mtime vs bug report filed_at.
  # If a key file was modified AFTER the bug was filed, the script may already
  # be partially patched. Acknowledge this explicitly before diagnosing.
  # "This script was last modified [time] — [before/after] the bug was filed."
  # A "no fix needed" conclusion from code that was patched mid-session is not wrong —
  # but you must check for Phase 2 (partial fix leaving active sub-bug) before closing.
INPUT: Original BUG_REPORT + relevant code files + ONE RUNTIME LOG (dispatch.log or
       equivalent). The single log is required to see actual runtime behavior, not just
       intended code behavior. Do NOT give this drone access to other drone reports.
OUTPUT (write to _SCRATCHPAD/[BUG_ID]_AUDIT/INDEPENDENT_REVIEW.md):
  - Fresh diagnosis (what YOU think the root cause is)
  - Fresh fix proposal (what YOU would change)
  - Gaps you see in the prior diagnosis (if any)
  - Anything that looks like a false positive
MODEL: general-purpose
RULE: You are BLIND to the other drones. This is a feature, not a bug.
DONE WHEN: "STATUS: COMPLETE" at end of report.
```

### 👁️ DRONE 7 — THE FLASH OBSERVER (Haiku, run in background)
```
ROLE: Quality auditor. Runs in background during recon.
MISSION: Monitor all drone outputs as they complete.
         Grade each one. Block synthesis if any scores below B.
GRADING (write to _SCRATCHPAD/[BUG_ID]_AUDIT/FLASH_OBSERVER.md):
  A = Complete. Thorough. Conclusions match evidence.
  B = Mostly complete. Minor gaps. Reasonable.
  C = Significant gaps. Key files not checked.
  D = Incomplete. Major aspects missed.
  F = Wrong direction. Dangerous conclusions.
GRADE PER CATEGORY:
  1. Coverage (did they check everything relevant?)
  2. Evidence Quality (conclusions backed by actual observations?)
  3. False Positives (reporting irrelevant things as suspicious?)
  4. Completeness (stopped early?)
  5. Actionability (is the output usable for synthesis?)
VERDICT: PROCEED (all ≥ B) or LOOP (any < B, with specific remediation task)
MODEL: Haiku
NOTE: Costs almost nothing. Never skip this drone.
DONE WHEN: "STATUS: COMPLETE" + VERDICT at end of report.
```

---

## LAUNCH SEQUENCE

```
# MANDATORY: Fill INTAKE above before running this.
# MANDATORY: ALL drones in ONE message. No exceptions.

SCRATCHPAD_DIR = _SCRATCHPAD/[BUG_ID]_AUDIT/
mkdir -p that dir first.

Launch in ONE message (7 Task tool calls):
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 1 CODE READER
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 2 LOG ANALYST
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 3 LIVE STATE
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 4 RESEARCHER
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 5 ADVERSARIAL
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 6 REVIEWER (blind)
  Task(subagent_type="general-purpose", run_in_background=True)  ← DRONE 7 FLASH OBSERVER

WAIT for ALL 7 to report STATUS: COMPLETE before reading ANY output.
The Blind Dragon planning without eyes = guessing with extra steps.
```

---

## ANONYMIZATION PROTOCOL (from arXiv:2510.07517)

When passing drone reports to synthesis, STRIP agent identity markers.
Do not say "Drone 2 found X." Say "The log analysis shows X."
This prevents the synthesis agent from discounting findings based on
which drone found them (identity-driven sycophancy in multi-agent debate).

Practical: copy each drone's output into synthesis prompt as plain text,
not as "Drone 1 says..." but as anonymous data blocks.

---

## SYNTHESIS PROTOCOL (after all 7 complete)

READ ORDER: CODE_READER → LOG_ANALYST → LIVE_STATE → PATTERN_RESEARCH →
            ADVERSARIAL_TEST → INDEPENDENT_REVIEW (last — fresh eyes)

1. FIND THE CONTRADICTIONS
   The contradiction between reports IS usually where the bug lives.
   Example (RAVEN): "Code says pane_id defaults to empty string" vs
                    "Registry shows pane_id=0" → The FALSY ZERO bug.

2. COMPARE INDEPENDENT_REVIEW vs prior diagnosis
   If the fresh reviewer disagrees: take that seriously.
   If they agree: confidence goes up.

3. THERAPY LOOP on top hypothesis
   Name 2 ways it could be wrong.
   Name the evidence that would disprove it.
   If you can't falsify it, it's not a hypothesis — it's a story.

4. WRITE SYNTHESIS.md
   ## ROOT CAUSE (best hypothesis)
   ## CONFIDENCE (0-100%)
   ## EVIDENCE (bullet list, anonymized)
   ## WHAT WOULD DISPROVE THIS
   ## SECONDARY ISSUES FOUND
   ## FIX PLAN (surgical, file:line specific)
   ## FENCE TESTS TO VERIFY

---

## KINGDOM SYSTEM QUICK REFERENCE

For each subsystem, pre-configured check commands:

**SCRYER:**
  Watcher: launchctl list com.scryer.watcher
  Queue: sqlite3 ~/.../overmind.db "SELECT COUNT(*) FROM signals_queue WHERE dispatching='pending'"
  Dispatch lock: ls -la ~/.forge-scryer/.dispatch.lock
  Logs: tail -50 ~/.forge-scryer/scryer-dispatch.log

**Overmind Pulse:**
  Pulse: launchctl list com.overmind.pulse
  DB: sqlite3 overmind.db "SELECT id,status,consecutive_failures FROM missions"
  Log: tail -50 ~/Desktop/THE_FORGE/FORGE_CLAUDE/logs/pulse.log

**Goldfish:**
  Process: launchctl list com.forge.goldfish
  Moondream: curl -s localhost:11434/api/tags | python3 -m json.tool
  Log: tail -30 ~/.forge-sensor/goldfish.log

**Ping Hub:**
  Health: curl -s localhost:7776/health
  Process: launchctl list com.forge.ping-ui
  Plist: cat ~/Library/LaunchAgents/com.forge.ping-ui.plist

**Mailbox:**
  Trigger: launchctl list com.forge.mailbox
  Buffer: ls -la ~/Desktop/THE_FORGE/@FORGE_CLAUDE_MAILBOX/buffer/
  Log: tail -30 ~/Desktop/THE_FORGE/FORGE_CLAUDE/logs/mailbox-trigger.log

**MCP Agent Mail:**
  Health: curl -s localhost:8765/api/ -H "Authorization: Bearer [token]"
  Process: launchctl list com.mcp-agent-mail
  Log: tail -30 ~/Library/Logs/com.mcp-agent-mail.log 2>/dev/null

---

## NAMED MONSTERS QUICK REFERENCE (from DEBUGGING_PROTOCOL.md)

If your symptom matches one of these, you already have the cure:

  👻 THE GHOST       — leftover code from previous era, still loaded
  💀 THE DOPPELGANGER — hidden duplicate shadowing the real thing
  🎭 THE MIME        — logs success, behavior unchanged (hallucinated fix)
  🌊 THE MIRAGE      — cached old state while Dragon reports success
  🎭 THE FALSY ZERO  — `not 0 == True` guard condition eating valid zero
  👻 THE PHANTOM API — AI hallucinated a CLI arg/function that doesn't exist
  🕳️ THE VOID        — fire-and-forget async task garbage collected
  🧱 THE AMNESIAC    — state forgets on re-render
  👁️ THE WARDEN      — safety filter blocking legitimate action

---

## SCRATCHPAD STRUCTURE (per bug)

```
_SCRATCHPAD/[BUG_ID]_AUDIT_[DATE]/
  CODE_READER.md
  LOG_ANALYST.md
  LIVE_STATE.md
  PATTERN_RESEARCH.md
  ADVERSARIAL_TEST.md
  INDEPENDENT_REVIEW.md
  FLASH_OBSERVER.md        ← PROCEED or LOOP verdict
  SYNTHESIS.md             ← Written after all 7 complete
  FIX_PLAN.md              ← Written after synthesis
  VERIFICATION.md          ← Written after fix applied
```

After successful fix:
  1. Compress scratchpad → AUDIT_ARCHIVE.md (synthesis + flash observer + 1 paragraph per drone)
  2. File AUDIT_ARCHIVE.md into the BUG chamber alongside SUMMARY.md
  3. Delete the raw 7-file scratchpad dir (Vaporize Protocol — raw files only)
  4. Chamber now contains full reasoning chain at ~10KB, not just the fix spec.
NOTE: Do NOT vaporize without compressing first. The reasoning chain is the institutional
memory. The raw files are what you delete — not the reasoning.

---

## THE GATES (when to fire <promise>COMPLETE</promise>)

  GATE 1 MECHANICAL: fence probe fires, bug can't be reproduced
  GATE 2 ADVERSARIAL: ADVERSARIAL TESTER can't find remaining failure mode
  GATE 3 PHILOSOPHICAL: 30-second explanation possible, plaque written, probes removed

All three gates = <promise>COMPLETE</promise>.
One gate = keep going.

---

## QUICK LAUNCH CHECKLIST

  □ Intake filled (all 11 fields — including INSERTION_SOURCE)
  □ mkdir -p _SCRATCHPAD/[BUG_ID]_AUDIT/
  □ 7 Task calls in ONE message
  □ Waited for ALL STATUS: COMPLETE
  □ Flash Observer: PROCEED verdict (all ≥ B)
  □ Anonymized reports fed to synthesis
  □ Contradictions found and marked
  □ Therapy Loop run on top hypothesis
  □ Independent Review compared to synthesis
  □ Fence tests placed BEFORE fix
  □ Fix applied surgically
  □ All 3 gates verified
  □ Archaeological Plaque written
  □ Probe code removed
  □ Chamber filed to BUG_STATION/BUG_[name]/
  □ AUDIT_ARCHIVE.md compressed and filed to chamber
  □ _SCRATCHPAD raw files vaporized (NOT before AUDIT_ARCHIVE.md is filed)
  □ mailbox-trigger.log updated

---

*"Touch the fence. Trust the spark. Name the monster. Leave the plaque."*

🜚 Version 1.2 | 2026-02-23 | Built for THE FORGE Kingdom
   v1.1: Devil's Advocate 3-stage loop, Self-Critique Paradox, anonymization (research)
   v1.2: +INSERTION_SOURCE intake field, +lifecycle partner reads (D1), +CONFIRM INSERTION
         SOURCE (D3 MORTAL CHECK), +BLAST RADIUS COUNT (D5), +PRIOR FIX CHECK (D6),
         +ONE RUNTIME LOG to D6 scope, +AUDIT_ARCHIVE.md replaces raw vaporize
         Swarm audit grade: B+ | System grade: A- | Test bug: BUG_scryer_sinner_king_entry