# AUDIT SWARM TEMPLATE
**Version:** 1.0 — Session 117 — 2026-02-23
**Authors:** Brandon McCormick + Claude
**Classification:** CORE LORE — REUSABLE TEMPLATE

---

> *"Launch ALL drones BEFORE planning. Wait for ALL before logging. The Blind Dragon planning without eyes is not planning — it's guessing with extra steps."*

---

## WHAT THIS IS

A reusable template for deploying an AI audit swarm to investigate any Kingdom system. Copy this file, fill in the system-specific sections, and launch.

**Use when:**
- A system is broken and the root cause is unknown
- You need comprehensive coverage before building a fix
- Solo debugging has failed twice (DDI cap hit — time to bring the swarm)
- A system needs pre-deployment verification

**The promise:** When the swarm completes, you will know WHERE the bug is (not just that a bug exists), WHY it's there (root cause, not symptom), and WHAT the fix is (specific, testable).

---

## PART 1: INTAKE

Fill this before launching any drones.

```
SYSTEM: [name of system being audited]
SYMPTOM: [observable failure — what you see, not what you think is broken]
WHAT WORKS: [what parts of the system definitely function]
WHAT DOESN'T: [specific failing behavior]
LAST KNOWN GOOD STATE: [when/if it ever worked]
RECENT CHANGES: [anything changed in the past N sessions]
PRIOR ATTEMPTS: [what has already been tried — be specific about what was changed and what was observed]
EVIDENCE SO FAR: [any logs, signals, or probe results already collected]
```

---

## PART 2: DRONE CONFIGURATION

### Standard Swarm (4 drones)

**DRONE 1 — THE CODE READER**
```
Role: Read all source files for the system.
Mission: Understand the code as-is. No assumptions. No guessing.
Output: Full architecture diagram + list of every external dependency + list of every potential failure point.
Model: Haiku (cheap, just reading)
Special instruction: Do NOT form a hypothesis yet. ONLY report what the code does.
```

**DRONE 2 — THE LOG ANALYST**
```
Role: Read all available logs, state files, and runtime artifacts.
Mission: Find discrepancy between expected behavior and logged behavior.
Output: Timeline of what actually happened. Key anomalies highlighted.
Model: Haiku (cheap, just reading)
Special instruction: Report WHAT happened, not WHY. Causation comes in synthesis.
```

**DRONE 3 — THE LIVE STATE INSPECTOR**
```
Role: Check the running system's current state.
Mission: Verify what is actually running, registered, queued, connected.
Output: Complete live state snapshot — ports, PIDs, registry contents, queue depths, connection states.
Model: Haiku (needs Bash access — use general-purpose subagent_type)
Special instruction: Run the MORTAL CHECK first. If the system isn't running, report and stop.
Commands to check:
  - process health (curl /health, ps aux | grep [process])
  - registry/state files (cat the JSON)
  - queue depths (GET /status)
  - port occupancy (lsof -i :PORT)
```
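The MORTAL CHECK above can be sketched in a few lines. This is illustrative only: the endpoint URL, the `ok` field, and both function names are assumptions, not the relay's real API.

```python
# Minimal sketch of Drone 3's MORTAL CHECK: decide whether the system
# is alive at all before inspecting anything deeper.
import json
import urllib.request

def interpret_health(status_code, body):
    """Classify a /health probe result. Returns 'RUNNING' or 'DOWN'."""
    if status_code != 200:
        return "DOWN"
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return "DOWN"  # alive but speaking garbage counts as down
    return "RUNNING" if payload.get("ok", True) else "DOWN"

def mortal_check(url="http://localhost:8080/health"):
    """Probe the live system. If this reports DOWN, report and stop."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return interpret_health(resp.status, resp.read())
    except OSError:  # connection refused, timeout, DNS failure...
        return "DOWN"
```

If `mortal_check()` returns DOWN, the drone's report is one line long and the swarm already has its first real signal.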

**DRONE 4 — THE PATTERN RESEARCHER**
```
Role: Research the known failure modes for this system type.
Mission: Find documented bugs, gotchas, and known issues for the technologies in use.
Output: List of known failure modes for [TECHNOLOGY STACK] + most likely matches to observed symptom.
Model: Sonnet (needs web research)
Special instruction: Search for the SPECIFIC symptom, not the general technology.
Searches to run:
  - "[technology] [symptom description] silent failure"
  - "[technology] background task not running"
  - "[technology] [specific behavior] bug"
```

---

### Extended Swarm (add these for complex systems)

**DRONE 5 — THE FORENSIC ARCHAEOLOGIST**
```
Role: Read the git history and changelog for recent changes.
Mission: Find the commit that introduced the regression, if any.
Output: Timeline of relevant changes. Hypothesis about which change broke the system.
Model: Haiku
Special instruction: Focus only on changes to files that touch the failing component.
```

**DRONE 6 — THE ADVERSARIAL TESTER**
```
Role: Actively try to reproduce the failure.
Mission: Run the system with the smallest possible trigger and observe what breaks.
Output: Minimal reproduction case. Exact conditions required to trigger the bug.
Model: Sonnet (needs Bash)
Special instruction: Start from fresh state. Do not assume prior understanding. Touch the fence.
```

**DRONE 7 — THE FLASH OBSERVER**
```
Role: Watch all other drones and grade their work.
Mission: Catch errors, omissions, and false conclusions before synthesis.
Output: Grade (A-F) per category + list of gaps in coverage.
Model: Haiku (runs in background, very cheap)
Categories to grade:
  A = complete, thorough, no gaps
  B = mostly complete, minor gaps
  C = significant gaps, needs follow-up
  D = incomplete, major issues missed
  F = wrong direction or dangerous conclusions
Trigger: Swarm loops if any drone scores below B.
```

---

## PART 3: LAUNCH SEQUENCE

**RULE: Launch ALL drones in ONE message. Do NOT wait for any drone before launching others.**

```markdown
# Launching [SYSTEM] audit swarm

Launching 4 background drones simultaneously:
- Task(subagent_type="general-purpose", run_in_background=True): DRONE 1 CODE READER
- Task(subagent_type="general-purpose", run_in_background=True): DRONE 2 LOG ANALYST
- Task(subagent_type="general-purpose", run_in_background=True): DRONE 3 LIVE STATE
- Task(subagent_type="Explore",          run_in_background=True): DRONE 4 RESEARCHER

Then: wait for ALL to complete before reading ANY output.
```
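The `Task(...)` calls above are the agent runtime's API; the same launch discipline can be expressed as a plain asyncio analogy. The drone bodies here are stand-ins, but the shape is the point: create ALL tasks before awaiting ANY of them.

```python
# The launch rule as an asyncio sketch: all tasks created first,
# then one gather that waits for every drone before reading any output.
import asyncio

async def drone(name):
    await asyncio.sleep(0)  # stand-in for the drone's actual investigation
    return f"{name}: STATUS: COMPLETE"

async def launch_swarm():
    names = ["CODE READER", "LOG ANALYST", "LIVE STATE", "RESEARCHER"]
    tasks = [asyncio.create_task(drone(n)) for n in names]  # ALL launched
    return await asyncio.gather(*tasks)  # then wait for ALL before reading ANY

reports = asyncio.run(launch_swarm())
```

Awaiting drones one at a time would be the anti-pattern: each intermediate report tempts you into a hypothesis before recon is complete.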

**Why simultaneous?** The Recon-First Mandate. Information gathered before hypothesis formation is signal. Information gathered after hypothesis formation is confirmation bias.

---

## PART 4: OUTPUT FILE STRUCTURE

Each drone writes to a structured output file. Standard paths:

```
_SCRATCHPAD/[SYSTEM]_AUDIT_[DATE]/
  CODE_READER.md          ← Drone 1 output
  LOG_ANALYST.md          ← Drone 2 output
  LIVE_STATE.md           ← Drone 3 output
  PATTERN_RESEARCH.md     ← Drone 4 output
  SYNTHESIS.md            ← Written by synthesis agent after all drones complete
  FIX_PLAN.md             ← Written after synthesis, before any code changes
  VERIFICATION.md         ← Written after fix applied, confirms resolution
```
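Scaffolding the audit folder up front means every drone has a known destination before launch. A minimal sketch, assuming the folder-naming convention above (`scaffold_audit` itself is a hypothetical helper, not part of the template):

```python
# Create the audit folder and empty placeholder reports for each drone.
from pathlib import Path

DRONE_FILES = [
    "CODE_READER.md", "LOG_ANALYST.md", "LIVE_STATE.md",
    "PATTERN_RESEARCH.md", "SYNTHESIS.md", "FIX_PLAN.md",
    "VERIFICATION.md",
]

def scaffold_audit(system, date, root="_SCRATCHPAD"):
    """Make _SCRATCHPAD/[SYSTEM]_AUDIT_[DATE]/ with empty report files."""
    folder = Path(root) / f"{system}_AUDIT_{date}"
    folder.mkdir(parents=True, exist_ok=True)
    for name in DRONE_FILES:
        (folder / name).touch()
    return folder
```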

---

## PART 5: DRONE PROMPT TEMPLATE

Each drone gets this exact header before its specific mission:

```markdown
You are DRONE [N] — [ROLE] — in a [SYSTEM] audit swarm.

SYSTEM UNDER AUDIT: [name]
SYMPTOM: [exact failure description]
YOUR MISSION: [drone-specific mission from Part 2]

RULES:
1. Write your output to: _SCRATCHPAD/[AUDIT_FOLDER]/[FILENAME].md
2. Do NOT form a final diagnosis — that happens in synthesis.
3. Report WHAT you find, not what it means.
4. If you find something that might be the root cause, mark it: [HYPOTHESIS: description]
5. Close your report with: STATUS: COMPLETE

BLACKBOARD (shared state — read before starting):
[paste INTAKE section here]

Begin immediately.
```
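Filling the bracketed placeholders by hand invites the classic failure of shipping a drone a prompt that still says `[SYSTEM]`. A small hedged helper (the function is illustrative, not part of the template) can fail loudly instead:

```python
# Stamp intake fields into a prompt template; refuse to emit a prompt
# that still contains an unfilled [PLACEHOLDER].
def fill_template(template, fields):
    out = template
    for key, value in fields.items():
        out = out.replace(f"[{key}]", value)
    if "[" in out and "]" in out:
        raise ValueError("unfilled placeholders remain in prompt")
    return out

prompt = fill_template(
    "You are DRONE [N] — [ROLE] — in a [SYSTEM] audit swarm.",
    {"N": "1", "ROLE": "THE CODE READER", "SYSTEM": "zellij-relay"},
)
```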

---

## PART 6: SYNTHESIS PROTOCOL

After ALL drones report COMPLETE:

```
STEP 1: READ ALL REPORTS IN ORDER
  Code Reader → Log Analyst → Live State → Pattern Research
  Do NOT start forming hypotheses until all four are read.

STEP 2: FIND THE CONTRADICTIONS
  Where do the reports disagree?
  The contradiction is usually where the bug lives.
  Example: "Code says pane_id defaults to empty string" vs "Registry shows pane_id=0"
  → The contradiction IS the bug.

STEP 3: LIST ALL HYPOTHESES
  Write down every possible explanation, ranked by:
  - Evidence strength (how much evidence supports it)
  - Falsifiability (how easily can it be disproved)
  Pick the MOST FALSIFIABLE one first, not the most plausible.

STEP 4: RUN THERAPY LOOP
  For each top hypothesis:
  - Name two ways this hypothesis could be wrong
  - Name the evidence that would disprove it
  Discard any hypothesis you cannot falsify.

STEP 5: WRITE SYNTHESIS.md
  Format:
  ## ROOT CAUSE (best hypothesis)
  ## CONFIDENCE (0-100%)
  ## EVIDENCE (bullet list)
  ## WHAT WOULD DISPROVE THIS
  ## SECONDARY ISSUES (other things found)
  ## FIX PLAN (surgical, specific)
  ## FENCE TESTS TO VERIFY FIX
```
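STEP 3's ranking rule is easy to get backwards in the moment, so here it is as code. The hypotheses and 0-10 scores are illustrative:

```python
# Rank hypotheses and pick the MOST FALSIFIABLE first, not the most
# plausible; evidence strength only breaks ties.
hypotheses = [
    {"name": "race condition in dispatcher", "evidence": 3, "falsifiability": 2},
    {"name": "falsy zero in guard clause",   "evidence": 7, "falsifiability": 9},
    {"name": "port conflict",                "evidence": 5, "falsifiability": 8},
]

def next_to_test(hyps):
    return max(hyps, key=lambda h: (h["falsifiability"], h["evidence"]))

first = next_to_test(hypotheses)
```

Testing the most falsifiable hypothesis first means every experiment either kills a hypothesis or survives a real attempt to kill it; the merely plausible ones can wait.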

---

## PART 7: FIX EXECUTION TEMPLATE

```markdown
## PRE-FIX VERIFICATION
Before changing any code:
- [ ] Can I reproduce the bug with the original trigger?
- [ ] Does the fence test confirm the location?
- [ ] Has THERAPY LOOP been run on this hypothesis?

## CHANGES PLANNED
[Specific file:line changes, as surgical as possible]

## FENCE TESTS THAT WILL CONFIRM THE FIX
[Exactly what to run and what signal to expect]

## REGRESSION RISK
[What else could break from these changes?]
```

---

## PART 8: VERIFICATION PROTOCOL

After fix is applied:

```
STEP 1: MECHANICAL GATE
  - Run the fence test. Does it fire correctly?
  - Run the original failure trigger. Is the bug gone?
  - Run any adjacent functionality. Did we break anything?

STEP 2: ADVERSARIAL GATE
  - Can the ADVERSARIAL TESTER find any remaining failure mode?
  - If yes: loop back to SYNTHESIS. The fix is incomplete.

STEP 3: PHILOSOPHER'S GATE
  - Can you explain what was broken to someone reading this in 6 months?
  - Is the Archaeological Plaque written?
  - Are all probe files removed?
  - Are you proud of this fix?

<promise>COMPLETE</promise> fires ONLY when all three gates are open.
```
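The three-gate rule reduces to a single predicate. Gate names mirror the protocol above; the function itself is a sketch:

```python
# COMPLETE may fire ONLY when all three verification gates are open.
def may_fire_complete(gates):
    """gates: dict of gate name -> bool. Missing gates count as closed."""
    required = ("mechanical", "adversarial", "philosopher")
    return all(gates.get(g, False) for g in required)
```

Note that a missing gate counts as closed: forgetting to run the Adversarial Gate must block COMPLETE just as hard as failing it.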

---

## PART 9: FLASH OBSERVER PROMPT TEMPLATE

If deploying Drone 7 (Flash Observer), launch with this prompt:

```markdown
You are the FLASH OBSERVER for the [SYSTEM] audit swarm.

Your job is to watch ALL other drone outputs as they come in and grade them A-F.

Grades:
A = Complete. All relevant files read. No obvious gaps. Conclusions match evidence.
B = Mostly complete. Minor gaps. Conclusions reasonable.
C = Significant gaps. Key files or logs not checked. Conclusions may be premature.
D = Incomplete. Major aspects of the mission not covered.
F = Wrong direction. Dangerous or clearly incorrect conclusions.

After all drones report, file a single FLASH_OBSERVER_REPORT.md with:
- Grade per drone
- Specific gaps found
- Whether synthesis can proceed or needs additional investigation
- Your overall assessment: PROCEED or LOOP

If any drone scores below B: LOOP — send specific follow-up missions back to that drone.
If all drones score B or better: PROCEED — synthesis can begin.

Grading categories:
1. Coverage (did it check everything it should?)
2. Evidence Quality (are conclusions backed by actual evidence?)
3. False Positives (did it report things that aren't actually relevant?)
4. Completeness (did it finish, or did it stop early?)
5. Actionability (is the output usable for synthesis?)
```
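The observer's final verdict is mechanical and worth encoding, so the LOOP condition cannot be fudged under deadline pressure. A minimal sketch:

```python
# Flash Observer verdict: LOOP if ANY drone scores below B.
GRADE_ORDER = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def verdict(grades):
    """grades: dict of drone name -> letter grade. Returns PROCEED or LOOP."""
    if any(GRADE_ORDER[g] < GRADE_ORDER["B"] for g in grades.values()):
        return "LOOP"
    return "PROCEED"
```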

---

## REAL EXAMPLE: RAVEN DRAIN LOOP AUDIT (Session 117)

**Intake:**
```
SYSTEM: zellij-relay (RAVEN relay)
SYMPTOM: Messages queued, never dispatched. queue_depth=2, last_dispatch=null after hours.
WHAT WORKS: /health returns 200, /register accepts pane registration, /dispatch accepts messages
WHAT DOESN'T: No messages ever delivered to Claude's terminal
LAST KNOWN GOOD STATE: Never worked in v0.3.0
PRIOR ATTEMPTS: idle patterns fixed (❯\s*$, esc to cancel/interrupt sorted), STUCK_BUSY recovery added
EVIDENCE: relay.log shows "Queued" entries but zero "Dispatched" entries. Ever.
```

**Findings:**
- Code Reader: found `if not session or not pane_id: continue` at line 143
- Live State Inspector: found `"pane_id": 0` (integer zero) in registry
- Pattern Researcher: confirmed that in Python, `not 0` evaluates to `True` (integer zero is falsy)
- Synthesis: FALSY ZERO bug. One line. 100% confidence.
- Secondary: `esc to interrupt` too broad, `focus-terminal-pane` nonexistent

**Fix:** 3 surgical changes. Applied in 10 minutes. Ravens flew same session.

**The lesson:** The Recon-First Mandate works. The Code Reader found the one-line bug that had been invisible for weeks because no previous analysis had actually checked what `pane_id` evaluates to in the guard condition.
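The falsy-zero bug reproduces in a few lines. Function and variable names are illustrative, but the guard condition is the real shape of the bug:

```python
# Pane 0 is a valid pane, but `not pane_id` treats integer zero like a
# missing value, so the drain loop silently skips it forever.
def dispatch_buggy(registry):
    delivered = []
    for session, pane_id in registry:
        if not session or not pane_id:  # BUG: `not 0` is True
            continue
        delivered.append((session, pane_id))
    return delivered

def dispatch_fixed(registry):
    delivered = []
    for session, pane_id in registry:
        if not session or pane_id is None:  # FIX: skip only a truly missing pane
            continue
        delivered.append((session, pane_id))
    return delivered

registry = [("main", 0)]  # the exact state the Live State Inspector found
```

The buggy version delivers nothing for this registry; the fixed version delivers the message. One character class of bug, weeks of silence.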

---

## QUICK REFERENCE CARD

```
AUDIT SWARM CHECKLIST:
□ Intake filled
□ All drones configured
□ ALL drones launched in ONE message (simultaneous)
□ Waited for ALL reports before reading ANY
□ Contradictions found and marked
□ Therapy Loop run on top hypothesis
□ Synthesis written before any code changes
□ Fence tests placed before fix
□ Fix applied surgically
□ All three gates verified (Mechanical + Adversarial + Philosophical)
□ Archaeological Plaque written
□ Probe code removed
□ <promise>COMPLETE</promise> fired
```

---

*"Launch ALL drones BEFORE planning. The Blind Dragon planning without eyes is not planning — it's guessing with extra steps."*

⛬ KID:CORE:DOCTRINE:AUDIT-SWARM|1.2:LIVE:2026-02-25:⋈ ⛬