# DRONE-ARCHITECT FINDINGS REPORT
## Kingdom AI-Native Redesign Audit
### Date: 2026-03-06 | Source files reviewed: pulse.sh (666 lines), raven.py, TINY_BOTS_KINGDOM_AUDIT_2026-03-05.md, mechanisms/core/ listing

---

## PART 1: Top 5 Highest-Leverage Decomposition Opportunities

### 1. pulse.sh Mission Loop — AI Readiness Reasoner (LEVERAGE: CRITICAL)

The pulse.sh mission loop (lines 191–600+) currently makes every dispatch decision with bash arithmetic and SQL COUNT() comparisons: `consecutive_failures >= max_failures`, `estimated_this_hour >= token_budget`. These are hard thresholds that cannot reason about context. The loop also handles ALL missions sequentially in a single bash process — if one Gemini call hangs for 300s (GEMINI_TIMEOUT), every other due mission waits behind it.

**Decomposition target:** Extract the readiness check into a `pulse-ready-reasoner` bot that gets called per-mission before dispatch. Input: mission row as JSON + last 5 pulse_log rows + last TICK_STATE. Output: `{"dispatch": true/false, "reason": "...", "risk": "low|medium|high"}`. A local `gemma3:4b` call (3-8s) answers the question: "Given this mission's history and current Kingdom state, should it run right now?"

This unlocks something bash cannot do today: **skip a mission not because of a hard counter threshold, but because the previous 3 ticks all ended in ACTION_NEEDED and the pattern suggests Brandon is unreachable overnight.**
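
A minimal sketch of the reasoner contract (assuming the `ollama` CLI is on PATH; all function and field names here are illustrative, not existing pulse.sh code). The fallback keeps the bot advisory: unparseable output means dispatch anyway.

```python
import json
import subprocess

def build_readiness_prompt(mission: dict, recent_log: list, tick_state: str) -> str:
    """Assemble the context the reasoner sees. Field names are illustrative."""
    return (
        "Given this mission's history and current Kingdom state, "
        "should it run right now?\n"
        f"MISSION: {json.dumps(mission)}\n"
        f"LAST 5 PULSE_LOG ROWS: {json.dumps(recent_log)}\n"
        f"LAST TICK_STATE: {tick_state}\n"
        'Answer with JSON only: '
        '{"dispatch": true|false, "reason": "...", "risk": "low|medium|high"}'
    )

def parse_readiness(raw: str) -> dict:
    """Parse the model's reply; on garbage, fall back to dispatching."""
    try:
        verdict = json.loads(raw.strip())
        if isinstance(verdict.get("dispatch"), bool):
            return verdict
    except (json.JSONDecodeError, AttributeError):
        pass
    return {"dispatch": True, "reason": "unparseable reasoner output", "risk": "unknown"}

def ask_reasoner(prompt: str) -> dict:
    # Local gemma3:4b call; 3-8 s on Apple Silicon per the audit.
    out = subprocess.run(["ollama", "run", "gemma3:4b", prompt],
                         capture_output=True, text=True, timeout=30)
    return parse_readiness(out.stdout)
```

Because `parse_readiness` defaults to `dispatch: true`, a hung or confused model can never block a mission — only a well-formed "no" can.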

### 2. RAVEN Ingest — Priority Classifier (LEVERAGE: HIGH)

raven.py's `_process_file()` currently trusts the PRIORITY field declared in the envelope. smollm2:360m (280MB, sub-second on Apple Silicon) can independently classify each message's actual urgency from subject + first 200 chars. This is 50 lines of Python added to `router.py` before the INSERT.

Current failure mode: pulse.sh drops IMPORTANT RAVENs for token budget events and URGENT ones for circuit breakers — but there is no cross-message reasoning. A circuit breaker that fires 4 times in 2 hours should escalate to a different delivery channel than one that fires once. The classifier can see the DB history and escalate adaptively.

**Model:** `smollm2:360m`. Integration point: `router.py` line ~40 (before db INSERT). Risk: zero — classifier result goes into a new `ai_priority` column, declared PRIORITY used as fallback.

### 3. Dead-Letter Accumulation — Pattern Detector (LEVERAGE: HIGH)

The git status shows 100+ `BUG_RAVEN_dead-letter` files in `@FORGE_CLAUDE_MAILBOX/processed/`, all from 2026-03-05 at 1-2 minute intervals for ~10 hours. This is the single most visible Kingdom failure pattern: a bot enters a failure loop, drops a RAVEN alert every interval, and the mailbox becomes a dead-letter graveyard.

Today there is no system that looks at this pattern and says "something is in a loop." The loop-check daemon (`com.forge.loop-check`) runs every 3 hours — far too slow to catch a 90-minute flood.

**Decomposition target:** `mailbox-flood-detector` bot (cron every 5 min). Reads: count of files in processed/ with same prefix in last 60 min. If >10 identical-prefix files, call `gemma3:4b` with the last 3 to answer: "Is this a failure loop or legitimate burst traffic?" Drop URGENT RAVEN if loop detected. Build time: 30 minutes.
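
The counting half of that bot can be sketched as follows (the 3-part prefix split is an assumption about the `BUG_RAVEN_dead-letter_*` naming scheme; the `gemma3:4b` loop-vs-burst call would follow for any flagged prefix):

```python
import time
from collections import Counter
from pathlib import Path

FLOOD_THRESHOLD = 10          # >10 identical-prefix files trips the check
WINDOW_SECONDS = 60 * 60      # look back 60 minutes

def prefix_of(name: str) -> str:
    """Group files like BUG_RAVEN_dead-letter_<ts>.md by their leading tag.
    The 3-part split is an assumption about the naming scheme."""
    return "_".join(name.split("_")[:3])

def flood_candidates(entries, now=None):
    """entries: iterable of (filename, mtime). Returns prefixes over threshold."""
    now = now or time.time()
    recent = Counter(prefix_of(name) for name, mtime in entries
                     if now - mtime <= WINDOW_SECONDS)
    return [p for p, count in recent.items() if count > FLOOD_THRESHOLD]

def scan_processed(mailbox: Path):
    """Cron entry point: stat everything in processed/ and flag floods."""
    entries = [(f.name, f.stat().st_mtime) for f in mailbox.glob("*")]
    return flood_candidates(entries)
```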

### 4. SCRYER Stream Summaries — Local Inference (LEVERAGE: HIGH)

SCRYER runs 23 streams. `summarize-stream.py` currently calls the Gemini API for each stream summary. From `AERIS_SHARED_STATE.json`, pulse.sh already pulls `scryer_synthesis.path` and injects the result into every mission prompt — meaning SCRYER summaries cost API tokens twice (once to generate, once to consume in pulse).

`gemma3:4b` is installed and idle. Stream inputs are 500-2000 tokens → 100-300 token outputs. Local inference: 3-8 seconds per stream. 92 Gemini API calls/day → $0. The only change needed is swapping the API call in `summarize-stream.py` for an `ollama run gemma3:4b` subprocess call.

**Risk:** Near-zero. SCRYER summaries are internal orientation data, not user-facing.

### 5. Goldfish Vision Pipeline — Model Tier Split (LEVERAGE: MEDIUM-HIGH)

`llava:latest` (4.7GB) is used for Herald narration. `moondream:1.8b` is already installed and sub-second on Apple Silicon.

**Decomposition:** Two-tier pipeline. `moondream:1.8b` fires every 5 minutes on new PNGs → 1-line screen label to TIMESTREAM. `llava:latest` (or `qwen2.5vl:3b` after pull) fires every 30 minutes for quality Herald narration. Fast tier gives near-real-time screen awareness at near-zero cost; slow tier gives depth.

---

## PART 2: New Capabilities That Become Possible With Reasoning Bots

These are **structurally impossible** with pure bash logic today:

**1. Semantic circuit breaking.** The current circuit breaker trips on `consecutive_failures >= max_failures`. A local reasoner can trip on: "Aeris's last 3 TICK_STATEs all describe the same unresolvable ambiguity." The bash loop detector (Guard 2, lines 477-516) uses SHA256 hash matching — catches exact repeats, not semantic ones. `gemma3:4b` semantic similarity check catches: different wording, same stuck state. This is the highest-leverage AI reasoning addition to existing pulse.sh.
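
The gap between the two checks can be sketched side by side (`exact_repeat` mirrors what Guard 2 does today; `semantic_repeat` and its prompt are assumptions, not existing pulse.sh code):

```python
import hashlib
import subprocess

def exact_repeat(prev_state: str, cur_state: str) -> bool:
    """Guard 2 today: SHA256 match catches only verbatim repeats."""
    sha = lambda s: hashlib.sha256(s.encode()).hexdigest()
    return sha(prev_state) == sha(cur_state)

def semantic_repeat(prev_state: str, cur_state: str) -> bool:
    """Ask gemma3:4b whether two TICK_STATEs describe the same stuck state,
    even when the wording differs."""
    prompt = (
        "Do these two status reports describe the same blocked action, "
        "even if worded differently? Answer YES or NO.\n"
        f"A: {prev_state}\nB: {cur_state}"
    )
    out = subprocess.run(["ollama", "run", "gemma3:4b", prompt],
                         capture_output=True, text=True, timeout=30)
    return out.stdout.strip().upper().startswith("YES")
```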

**2. Overnight spend prediction.** Kingdom spends 50%+ overnight. Today no system says "based on last 7 Tuesdays, tonight will cost $X — throttle?" A `spend-predictor` bot at 22:00 daily can answer this and auto-set the pulse rate limit in `user_settings`.

**3. Anomaly detection without thresholds.** Every monitoring bot today asks "is metric X above threshold Y?" A reasoning bot asks "is this reading unusual given recent history?" — works even when the normal baseline has shifted.

**4. Cross-system failure correlation.** If RAVEN dead-letter count spikes AND Goldfish narration timestamps go stale AND pulse log shows TIMEOUT on the same mission simultaneously — today no system correlates these. A `kingdom-health-synthesizer` bot reads all three.

**5. Mission health narrative.** With a local model, `manage.sh status` could output: "M005 has been stuck on the same subtask for 3 ticks. The last TICK_STATE suggests it's waiting for a file that doesn't exist at the expected path." That's the difference between a dashboard and a collaborator.

---

## PART 3: The Kingdom Nervous System — A Design

```
[DATA LAYER — unchanged]
  overmind.db / raven.db / sentinel.db / timestream.db
        |
[INGEST BOTS — fire once, write one row, exit]
  pulse-ready-reasoner    → gemma3:4b → dispatch decision JSON
  raven-priority-clf      → smollm2:360m → ai_priority column
  scryer-summarizer       → gemma3:4b → stream_summary files
  goldfish-fast-pass      → moondream:1.8b → screen_label TIMESTREAM row
        |
[WATCHDOG BOTS — cron, read DB, drop RAVEN]
  mailbox-flood-detector  → gemma3:4b → loop vs. burst classification
  spend-predictor         → gemma3:4b → overnight budget recommendation
  mission-health-bot      → gemma3:4b → "M005 is semantically stuck" narrative
        |
[SYNTHESIS LAYER — daily, reads all streams]
  kingdom-health-synthesizer → gemma3:4b → cross-system failure correlation
        |
[DELIVERY — unchanged]
  RAVEN → Console NEWS → Aeris mailbox
```

Key design principles:
- Every bot is stateless. State lives in SQLite.
- The reasoning layer is **advisory, not authoritative** — bash logic remains the gate, AI output is a DB field.
- Local models only. No external API calls in watchdog bots.
- `gemma3:4b` + `smollm2:360m` cover every use case. Gemini API reserved for Aeris mission work.
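
The "fire once, write one row, exit" shape every ingest and watchdog bot shares, as a sketch (table and function names are illustrative; the real schemas live in overmind.db and friends):

```python
import sqlite3

def run_once(db_path: str, advise) -> int:
    """Stateless bot skeleton: open DB, reason, write one advisory row, exit.
    The only state is the DB itself; the AI verdict is just a field."""
    db = sqlite3.connect(db_path)
    try:
        db.execute("CREATE TABLE IF NOT EXISTS advice "
                   "(ts TEXT DEFAULT CURRENT_TIMESTAMP, verdict TEXT)")
        verdict = advise(db)   # e.g. a local-model call; advisory only
        db.execute("INSERT INTO advice (verdict) VALUES (?)", (verdict,))
        db.commit()
        return 0
    finally:
        db.close()
```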

---

## PART 4: Where Local Models Slot in TODAY with Minimal Risk

| Slot | Model | Integration Point | Risk | Effort |
|------|-------|-------------------|------|--------|
| SCRYER stream summarization | `gemma3:4b` | Replace API call in `summarize-stream.py` | Near-zero | 1-2 hours |
| RAVEN priority classification | `smollm2:360m` | New column + 50 lines in `router.py` | Zero (additive) | 1-2 hours |
| Goldfish fast-pass screen labels | `moondream:1.8b` | New launchd QueueDirectory watcher | Zero | 2-3 hours |
| Mailbox flood detection | `gemma3:4b` | New cron bot, reads processed/ dir | Zero | 1.5 hours |
| Mission readiness reasoner | `gemma3:4b` | New subprocess call in pulse.sh pre-dispatch | Low (advisory) | 3-4 hours |

All models installed except `smollm2:360m` — needs `ollama pull smollm2:360m` (280MB).

---

## PART 5: Recommended First MVP

**Recommendation: SCRYER local summarization switch.**

Single file change in `summarize-stream.py`: replace one Gemini API call with `subprocess.run(["ollama", "run", "gemma3:4b", prompt])`. Deploy overnight. Next morning: check 23 stream summaries generated, compare quality to Gemini baseline from prior day.

Why this first: completely reversible, saves ~92 Gemini Flash calls/day → $0, proves local inference pipeline end-to-end, quality difference immediately visible in Aeris mission outputs.

**Second MVP:** smollm2:360m RAVEN classifier. Pull model (5 min), add 50 lines to router.py, deploy. The `ai_priority` column becomes queryable — immediately see every case where declared vs. inferred priority diverged.
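
Once the column exists, the divergence report is one query; a sketch, assuming a `messages(id, priority, ai_priority)` shape (adjust to raven.db's actual schema):

```python
import sqlite3

def priority_divergences(db: sqlite3.Connection):
    """Every message where declared and inferred priority disagree.
    NULL ai_priority rows (classifier had no opinion) are excluded."""
    return db.execute(
        "SELECT id, priority, ai_priority FROM messages "
        "WHERE ai_priority IS NOT NULL AND ai_priority != priority"
    ).fetchall()
```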

---

## PART 6: Findings Summary

| Finding | Leverage | Effort | Model | Risk |
|---------|----------|--------|-------|------|
| SCRYER local summarization | $$$+quality | Low | gemma3:4b | Near-zero |
| RAVEN priority classifier | High UX | Low | smollm2:360m | Zero |
| Goldfish two-tier vision | Resilience | Medium | moondream:1.8b | Zero |
| Mailbox flood detector | Ops stability | Low | gemma3:4b | Zero |
| Semantic loop detection | Reliability | Medium-High | gemma3:4b | Low |
| Overnight spend prediction | $$$$ | Medium | gemma3:4b | Low |
| Kingdom health synthesizer | Visibility | High | gemma3:4b | Low |

**Key finding from auditing pulse.sh code:** Guard 2 (lines 477-516) uses SHA256 hashing of TICK_STATE strings, so it misses semantic loops where Aeris rephrases the same blocked action differently each tick. A `gemma3:4b` semantic-similarity check on consecutive TICK_STATEs closes this gap, and it is the highest-leverage AI reasoning addition to the existing pulse.sh.

---

*Drone: DRONE-ARCHITECT | Powered by Claude Sonnet 4.6 | Research only — no files written*