# SYS_AGENT_TOOL_PATTERNS — Kingdom Agent Tool Architecture
```
⛬ KID:CORE_LORE:SYSTEMS:SYS_AGENT_TOOL_PATTERNS|1.0:◉:2026-03-10:BRANDON+CLAUDE ⛬
```
**Authorship:** Brandon McCormick (vision) + Claude (architecture) — Session 175
**Status:** LIVE ◉ — Patterns confirmed GA with Sonnet 4.6 API
**Source:** Anthropic beta → GA (Nov 2025 → Mar 2026). Validated externally by "The AI Automators" implementation with Qwen 3.5. Model-agnostic.
---
## The Problem
Tool calling breaks down at scale. Two failure modes:
**Failure Mode 1 — Context Bloat (too many tools)**
Loading all tool schemas upfront eats context before the first message.
60 tools = ~13K tokens spent before the conversation starts.
As agents gain capabilities, this compounds.
**Failure Mode 2 — Token Burn (sequential LLM rounds)**
Asking the LLM to make 56 individual tool calls to aggregate data across N records
costs ~76K tokens, is slow and expensive, and is often inaccurate (records get missed).
The LLM is doing what a for-loop should do.
---
## Pattern 1: Deferred Tool Loading (Tool Search)
**Core idea:** Don't load all tool schemas upfront. Expose ONE `tool_search` meta-tool.
The agent queries it to discover and load specific tools on demand.
### How It Works
```
BEFORE (all tools loaded upfront):
Context init → 60 tool schemas injected → 13,000 tokens used → first message
AFTER (deferred loading):
Context init → 1 tool_search schema injected → 6,300 tokens → first message
Agent calls tool_search("list commits") → loads list_commits schema → executes
Next call reuses already-loaded schema (no re-search needed)
```
### Token Impact
Demonstrated (60-tool agent, 2 MCP servers):
- Before: 13,000 tokens / context init
- After: 6,300 tokens / context init
- **~51% reduction** before any work is done
Anthropic reported 85% reduction in their own benchmark (larger tool sets).
### When to Apply
Use deferred loading when an agent has **more than ~15 tools** available.
Below that threshold, overhead doesn't justify the pattern.
| Tool count | Decision |
|------------|----------|
| < 15 | Load all upfront. Simpler. |
| 15-30 | Consider deferred for expensive tools (large schemas) |
| 30+ | Deferred loading is mandatory |
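The decision table above reduces to a trivial guard, useful when an agent factory chooses a loading strategy at init time (the function and strategy names are illustrative, not from any Kingdom codebase):

```python
def loading_strategy(tool_count: int) -> str:
    """Map a tool count to a schema-loading strategy per the thresholds above."""
    if tool_count < 15:
        return "upfront"         # small sets: just load everything
    if tool_count <= 30:
        return "deferred-large"  # defer only tools with expensive schemas
    return "deferred"            # 30+: defer everything behind tool_search
```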
### Kingdom Applications
**Already live:**
- Claude Code sessions use ToolSearch natively. All `mcp__tavily__*`, `mcp__opendia__*`, and most deferred MCP tools load on demand via the `ToolSearch` tool. This is why the deferred tools list exists at session start — they're unloaded until searched.
**Must be designed in — Herald:**
When THE_HERALD is built, it will expose: Bluesky/Instagram/TikTok/YouTube/LinkedIn posting tools + monitoring tools + comment aggregation. That's 20-40 tools. Deferred loading is non-negotiable — don't load all platform schemas for every request.
**Consider for — future Phonebooth v2:**
If Phonebooth concierge gains tool capabilities (web search, calendar, memory lookup), deferred loading should be the default architecture.
### Implementation Note
The `tool_search` tool does not need to be a special API feature — it's a design pattern:
```python
# The meta-tool
def tool_search(query: str) -> list[ToolSchema]:
    """Search the tool registry. Returns matching tool schemas for loading."""
    return registry.search(query)  # semantic or keyword search against tool descriptions
# Tool schemas start "unloaded" — only the search tool is in context
# When agent calls tool_search, returned schemas get injected into next context turn
# Subsequent calls to the discovered tool proceed normally
```
This is model-agnostic. Claude, Qwen, GPT — all benefit equally.
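A minimal registry behind that meta-tool can be as simple as keyword scoring over tool descriptions. The sketch below is one possible shape, not an existing API; `ToolSchema`, `ToolRegistry`, and the example tools are all invented for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class ToolSchema:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


class ToolRegistry:
    def __init__(self, tools: list[ToolSchema]):
        self._tools = tools

    def search(self, query: str, limit: int = 3) -> list[ToolSchema]:
        """Rank tools by how many query words appear in their name + description."""
        words = query.lower().split()
        scored = [
            (sum(w in f"{t.name} {t.description}".lower() for w in words), t)
            for t in self._tools
        ]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [t for score, t in ranked if score > 0][:limit]


registry = ToolRegistry([
    ToolSchema("list_commits", "List commits in a git repository"),
    ToolSchema("post_to_bluesky", "Publish a post to a Bluesky account"),
])
```

With this in place, `registry.search("list commits")` surfaces only the `list_commits` schema, and unmatched queries return nothing rather than flooding context. A production registry would likely swap the keyword scoring for embedding search, but the contract stays the same.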
---
## Pattern 2: Programmatic Tool Calling
**Core idea:** When a task requires N sequential operations over a dataset, don't make N LLM-driven tool calls. Write code that does the iteration, then execute it once.
### How It Works
```
BEFORE (sequential LLM rounds):
LLM call 1 → get_team_members() → LLM processes result
LLM call 2 → get_expenses(member_1) → LLM processes
LLM call 3 → get_expenses(member_2) → LLM processes
... × N members
= 56 tool calls, 76K tokens, misses edge cases
AFTER (programmatic):
LLM writes code:
members = get_team_members()
budgets = get_budget_by_level()
results = [get_expenses(m) for m in members]
overages = [(m, r) for m, r in zip(members, results) if r.total > budgets[m.level]]
Execute in sandbox → accurate result
= 1 code execution, fraction of tokens, complete answer
```
### Realistic Expectation
The agent rarely one-shots the code. It iterates:
1. Writes code
2. Executes → sees error (unknown output structure, type mismatch)
3. Iterates on its own code
4. Gets correct result
This is normal. It's Claude Code's loop pattern applied to tool calls.
The iteration still costs far fewer tokens than 56 sequential LLM rounds.
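The AFTER branch above can be made concrete with stub tools. Everything here is invented for illustration: the record shapes, budget values, and tool bodies stand in for real MCP calls.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    level: str


@dataclass
class ExpenseReport:
    total: float


# Stub tools standing in for real MCP tool calls
def get_team_members() -> list[Member]:
    return [Member("ana", "senior"), Member("bo", "junior"), Member("cy", "junior")]


def get_budget_by_level() -> dict[str, float]:
    return {"senior": 5000.0, "junior": 2000.0}


def get_expenses(member: Member) -> ExpenseReport:
    spend = {"ana": 4200.0, "bo": 2600.0, "cy": 1900.0}
    return ExpenseReport(total=spend[member.name])


# The loop the LLM would otherwise do one tool call at a time
members = get_team_members()
budgets = get_budget_by_level()
reports = [get_expenses(m) for m in members]
overages = [
    (m.name, r.total - budgets[m.level])
    for m, r in zip(members, reports)
    if r.total > budgets[m.level]
]
```

The whole aggregation is one code execution; the LLM only sees `overages` and writes the interpretation.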
### When to Apply
Use programmatic tool calling when:
- The same tool is called **N times** with different parameters (N > 5)
- The task is **data aggregation** (sum, filter, cross-reference across records)
- The operation structure is **known upfront** (it's a loop, not a branching decision tree)
Do NOT use when:
- Each tool call requires LLM judgment to determine the next call
- The data is small (< 5 records)
- Side effects require human review between steps (e.g., posting to social platforms)
### Kingdom Applications
**Herald — batch processing:**
Fetching engagement data across 5 platforms, calculating optimal post timing,
cross-referencing comment sentiment — all of this is programmatic, not sequential LLM calls.
Write the aggregation code, execute once, hand result to tonal router.
**Phonebooth — if analytics are added:**
"Which conversation topics generate the most follow-up questions?" →
Don't make 100 LLM calls over 100 booth conversations.
Write a script that processes booth history → aggregate → single LLM synthesis call.
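That booth-analytics script might look like the sketch below. The conversation shape and field names are assumptions for illustration; a real implementation would load history from storage.

```python
from collections import Counter

# Stub booth history standing in for real stored conversations
booth_history = [
    {"topic": "pricing", "followups": 4},
    {"topic": "pricing", "followups": 2},
    {"topic": "onboarding", "followups": 1},
    {"topic": "api", "followups": 5},
]

# Script does the math: aggregate follow-ups per topic in one pass
followups_by_topic = Counter()
for convo in booth_history:
    followups_by_topic[convo["topic"]] += convo["followups"]

# LLM does the semantics: the single synthesis call receives this
# compact summary, not 100 raw conversation records
summary = ", ".join(f"{t}: {n}" for t, n in followups_by_topic.most_common())
```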
**CRYBABY runner — already correct:**
The runner shell script IS programmatic tool calling. It iterates over CRYBABYs,
runs each check.sh, aggregates results, dispatches one alert batch.
This is the pattern working correctly.
### Sandbox Requirement
Programmatic tool calling requires a code execution sandbox:
- **Development:** Local subprocess (acceptable for Kingdom builds)
- **Production:** Docker container with resource limits, no network egress
- **Kingdom standard:** Use existing Python venvs per project for isolation
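A development-grade subprocess sandbox per the list above might look like this. It is a sketch only: a subprocess with a timeout limits runaway loops and crashes, but does not block network or filesystem access, which is why production needs the Docker tier.

```python
import os
import subprocess
import sys
import tempfile


def run_in_sandbox(code: str, timeout: float = 10.0) -> str:
    """Execute agent-written code in a separate interpreter with a timeout.

    Development-grade isolation only: errors are returned as text so the
    agent can iterate on its own code, per the loop described above.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode != 0:
            # Feed stderr back to the agent so it can fix its code
            return f"ERROR:\n{result.stderr}"
        return result.stdout
    finally:
        os.unlink(path)
```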
---
## Pattern Interaction: Tool Search + Programmatic Together
The patterns compose:
```
1. Agent receives task: "Analyze all booth conversations for engagement patterns"
2. tool_search("conversation history") → loads get_booth_history tool
3. tool_search("analytics") → loads analyze_sentiment tool
4. Agent writes programmatic code using those two tools
5. Executes: loops over all booths, aggregates results
6. Returns synthesis
```
Result: context loaded efficiently (steps 2-3), iteration done in code (steps 4-5),
and the LLM used only for what LLMs are good at: final synthesis.
---
## Kingdom Principle: LLMs Do Semantics, Loops Do Math
This maps to CRYBABY's core philosophy:
> "Scripts do math. LLMs do semantics."
Extended to tool calling:
- **LLM decides WHAT to do** — which tools, which approach, final interpretation
- **Code executes HOW** — iteration, aggregation, repeated API calls
An LLM making 56 sequential tool calls is doing a for-loop.
Give it a sandbox and let it write one.
---
## Anti-Patterns
| Anti-pattern | Problem | Fix |
|--------------|---------|-----|
| Always loading all MCP tools upfront | Context eaten before first message | Deferred loading for > 15 tools |
| LLM decides each record individually | 76K tokens, misses records | Programmatic aggregation |
| Single LLM call for N-record data | Hallucinates missing records | Write code, execute, synthesize |
| No sandbox for programmatic calls | Security risk | Python venv isolation minimum |
| Deferred loading for small tool sets | Unnecessary complexity | < 15 tools = just load them |
---
## Build Checklist (for Agent Design Sessions)
When designing a new Kingdom agent (Herald, concierge, any MCP-heavy system):
- [ ] Count total available tools. If > 15, design for deferred loading from the start.
- [ ] Identify any task that repeats the same tool call N times. That's programmatic territory.
- [ ] Confirm sandbox strategy for code execution (venv, Docker, subprocess).
- [ ] Design `tool_search` registry if building custom agent framework.
- [ ] Keep LLM calls for: deciding approach, final synthesis, judgment calls.
- [ ] Keep code for: loops, aggregation, repeated calls, data transformation.
---
## References
- **Anthropic beta → GA:** Nov 2025 (beta) → Mar 2026 (GA with Sonnet 4.6 API)
- **Token reduction:** 85% (Anthropic internal benchmark, large tool sets)
- **External validation:** "Anthropic Just Changed How Agents Call Tools" — The AI Automators (Mar 7, 2026) — implemented on Qwen 3.5 27B, confirming model-agnostic applicability
- **Related:** `SYS_SKILLS.md` (Kingdom skill deferred loading), `SYS_MCP_AGENT_MAIL.md` (tool counts per agent)
- **CRYBABY parallel:** `SYS_CRYBABY.md` — "Scripts do math. LLMs do semantics." — same principle at the check layer