mr-transparent-routing

Model Router — Transparent Routing

When transparent routing is enabled, swap only the model that runs each subagent and keep the agent’s identity, prompt, and tools intact. This is what preserves engine-kit specialization (Unity DOTS, Cocos, etc.): the resolved Unity agent still runs, just on a cheap LLM instead of Opus.

Activation Check

Read .claude/t1k-config-mr.json. Both the hook and this rule fire only when all three conditions hold:

The file exists AND modelRouter.enabled is true AND modelRouter.mode is "transparent".

If any condition is false, this rule is inert.
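The three conditions can be checked in a few lines. The sketch below is a minimal illustration, not the hook's actual code: the function name `isTransparentRoutingActive` is hypothetical, and treating an unreadable or malformed file as "inactive" is an assumption.

```javascript
const fs = require("fs");

// Hypothetical helper: true only when the file exists, modelRouter.enabled is
// true, and modelRouter.mode is "transparent".
function isTransparentRoutingActive(configPath = ".claude/t1k-config-mr.json") {
  let config;
  try {
    config = JSON.parse(fs.readFileSync(configPath, "utf8"));
  } catch {
    // Assumption: a missing or unparseable file means the rule is inert.
    return false;
  }
  const mr = config && config.modelRouter;
  return Boolean(mr) && mr.enabled === true && mr.mode === "transparent";
}
```

Calling `isTransparentRoutingActive()` with no argument reads the default path relative to the current working directory.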

How It Works

Two mechanisms cooperate:

1. `mr-task-interceptor.cjs` (PreToolUse on `Task`)

When the main session spawns a subagent via the Task tool, this hook:

Reads the resolved agent’s .md from the priority chain (project .claude/agents/ → ~/.claude/agents/).
Parses its model: frontmatter field (e.g. claude-opus-4-7, inherit).
Looks up modelRouter.modelMapping[<model>] in t1k-config-mr.json.
If matched AND the agent is NOT in modelRouter.excludeAgents:
- Synchronously runs mr-delegate.sh <agent> <prompt> --provider X --model Y.
- Blocks the original Task (permissionDecision: deny) and returns the cheap delegation’s stdout as a systemMessage. The parent session sees the result as if Task had completed.
Otherwise: silent passthrough — Task runs normally on Anthropic.

This is the primary mechanism. It is mechanical and deterministic — runs whether or not you read this rule.
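The hook's core decision, minus the delegation call itself, might look like the following. This is a sketch under stated assumptions: `parseModelFromFrontmatter` and `decideRoute` are hypothetical names, the frontmatter is assumed to be a `---`-delimited block at the top of the agent file, and the Opus passthrough and security gates described in later sections are left out.

```javascript
// Extract the `model:` value from a `---`-delimited frontmatter block.
// Returns null when there is no frontmatter or no model field.
function parseModelFromFrontmatter(markdown) {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!fm) return null;
  const line = /^model:[ \t]*(\S.*?)[ \t]*$/m.exec(fm[1]);
  return line ? line[1].replace(/^["']|["']$/g, "") : null;
}

// Decide what to do with one Task spawn (hypothetical shape of the result).
function decideRoute(agentName, agentMarkdown, modelRouter) {
  const declared = parseModelFromFrontmatter(agentMarkdown);
  const mapping = modelRouter.modelMapping || {};
  const target = declared ? mapping[declared] : undefined;
  const excluded = (modelRouter.excludeAgents || []).includes(agentName);
  if (!target || excluded) return { action: "passthrough" };
  return { action: "delegate", provider: target.provider, model: target.model };
}
```

A `delegate` result corresponds to running mr-delegate.sh and denying the original Task; `passthrough` corresponds to letting Task run normally.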

2. Delegation Bias (this rule, behavioral)

For main-session work that doesn’t go through Task (inline Edit / Write / Bash), you decide whether to delegate. The interceptor only catches Task spawns; mainline edits stay on Opus unless you redirect them yourself.

Step 0 — MANDATORY pre-tool consultation (mainline only)

Before ANY inline Edit / Write / Bash that mutates user files, evaluate Delegation Bias. The Task interceptor handles delegated work for you. This step is for the rest.

The check is one sentence: “Is this task mechanical, boilerplate, or single-purpose?”

Yes → spawn it via Task with the appropriate subagent_type (e.g. t1k-fullstack-developer for implementation, t1k-code-reviewer for review). The interceptor will route to a cheap model automatically. If no subagent_type fits, call mr-delegate.sh directly with a sensible agent name.
No → proceed with inline Edit/Write/Bash. State briefly why (e.g. “multi-file refactor needing cross-file context, inlining”).

Skipping this step and going straight to inline Read+Edit burns Opus tokens for work a cheap model could do. The session-start hook mr-transparent-routing-reminder.cjs reinforces this — if you see [t1k:transparent-routing] ACTIVE, Step 0 is required.

Step 0.5 — Parallel sub-agent dispatch (Agent tool, TeamCreate)

When fanning out parallel sub-agents for mechanical code work, pick the narrowest specialized T1K agent that fits the task. The transparent-routing interceptor will route the agent to a cheap provider automatically via modelMapping — no need to name a v1 cheap-coder shim.

Task pattern	Preferred subagent_type
Mass rename, mechanical refactor, edit-per-plan	`t1k-fullstack-developer`
Read-only exploration (“find/list/grep”)	`Explore` (built-in)
Run tests + report results	`t1k-tester`
Code review (read-only with Bash for lint/grep)	`t1k-code-reviewer`
Doc audit (read-only) or doc writes per spec	`t1k-docs-manager`
Multi-server MCP tool invocation	`t1k-mcp-manager`

general-purpose is the FALLBACK when no specialized T1K agent matches. Default bias: pick the narrowest specialist that fits, not the broadest generalist.

Delegation Bias — Prefer delegation for mechanical work

The primary motivation is Opus token preservation. Cheap subagents cost roughly 1-5% of Opus per token.

Task pattern	Default
Single-file rename, format, lint-fix, add boilerplate	Delegate (Task → implementer-type agent)
Run a test suite + report results	Delegate (Task → tester)
Update README / docstring / comment	Delegate (Task → docs-writer)
Code review of changed lines (single PR / small scope)	Delegate (Task → reviewer)
Find files matching a pattern, list usages, search refs	Delegate (Task → explorer)
Audit existing docs for gaps	Delegate (Task → docs-scout / reader)
Multi-file refactor with cross-file reasoning	Inline (Opus owns this)
Design decision, architecture, planning	Inline (judgment calls)
Task that needs 3+ different tool types or chained context	Inline (orchestration overhead > delegation cost)
Reading one file to gather context (no edit follows)	Inline (single Read is free)

Heuristic — apply BEFORE picking a tool: ask “is this task mechanical, boilerplate, or single-purpose?” If yes → spawn via Task. If it needs design judgement, cross-file reasoning, or 3+ distinct tools → inline. When in doubt for write/mutate tasks → delegate.

Anti-pattern: “the task is too trivial to spawn a subagent for.” That phrase is wrong when transparent routing is on. The Task interceptor does the heavy lifting — your job is just to USE Task for mechanical work instead of inlining.

When NOT to Delegate

Parallel/multi-agent mode: skill invoked with --parallel flag or multi-agent pipeline.
Orchestration tasks: planner, git-manager, brainstormer, project-manager — usually need Opus reasoning; mark them in excludeAgents if you want the interceptor to skip them.
MR_SPAWNED=1: already inside a delegated session (interceptor self-skips, but inline edits should also skip).
User explicitly requested Claude: user said “use Claude” or “don’t delegate”.

`modelMapping` — the configuration knob

Keyed by model name (the value of model: in any agent’s frontmatter), maps to { provider, model }:

{
  "modelRouter": {
    "enabled": true,
    "mode": "transparent",
    "modelMapping": {
      "claude-sonnet-4-6":         { "provider": "kimi", "model": "kimi-k2.7-code" },
      "claude-haiku-4-5-20251001": { "provider": "kimi", "model": "kimi-k2.5" }
    },
    "excludeAgents": [
      "t1k-architect",
      "t1k-planner"
    ]
  }
}

An agent with model: claude-sonnet-4-6 → routes to kimi/kimi-k2.7-code.
An agent with model: claude-haiku-4-5-20251001 → routes to kimi/kimi-k2.5.
An agent listed in excludeAgents → never intercepted; stays on its declared model.

To move all Sonnet calls to a different model, edit one line in the mapping. The change applies to every agent in every kit whose frontmatter declares model: claude-sonnet-4-6.

Opus is never routed (kit policy)

modelMapping deliberately ships no opus rows. The interceptor enforces a hardcoded passthrough set (KIT_PASSTHROUGH_MODELS in mr-task-interceptor.cjs): any agent whose model: frontmatter is opus, claude-opus-4-7, or claude-opus-4-7[1m] is passed through to Anthropic before modelMapping is even consulted. An agent author writing model: opus is asserting “I need Opus quality” — the router treats that as a quality decision, not a cost line, and honors it. Adding an opus row to modelMapping will not override this; the passthrough guard wins. (See #84.) To route opus anyway, you’d have to edit KIT_PASSTHROUGH_MODELS in the kit source — intentionally a code change, not a config knob.

model: inherit (the common case for agents that don’t declare a model) is NOT in the passthrough set — it routes per its modelMapping entry if present, else falls through to capability-based selection.
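The ordering described above (passthrough guard first, mapping second) can be sketched as follows. The set contents are copied from the text; the real constant lives in mr-task-interceptor.cjs and may differ. `resolveDeclaredModel` and its result shape are hypothetical, and what happens to an unmapped model afterwards is deliberately not modeled.

```javascript
// Reconstructed from the prose above, not read from the kit source.
const KIT_PASSTHROUGH_MODELS = new Set(["opus", "claude-opus-4-7", "claude-opus-4-7[1m]"]);

// Hypothetical resolver: the guard runs before modelMapping is consulted,
// so an opus row in the mapping can never take effect.
function resolveDeclaredModel(declaredModel, modelMapping = {}) {
  if (KIT_PASSTHROUGH_MODELS.has(declaredModel)) {
    return { kind: "passthrough" };
  }
  const target = modelMapping[declaredModel];
  if (target) return { kind: "mapped", target };
  // Not modeled here: the next stage for an unmapped model.
  return { kind: "unmapped" };
}
```

Note that `inherit` is treated like any other key: it is mapped only if the mapping contains an `inherit` row.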

`defaultBuiltInModel` — routing built-in agents

Built-in Claude Code agents (general-purpose, Explore) ship without a .md file, so the interceptor cannot read their model: frontmatter. The kit-shipped config sets modelRouter.defaultBuiltInModel: "sonnet" by default, so these agents route through the same pipeline as file-based agents (the "sonnet" shorthand resolves to claude-sonnet-4-6 → the shipped modelMapping entry). To override per consumer, change the value of defaultBuiltInModel, or remove the key to restore Anthropic passthrough:

{
  "modelRouter": {
    "defaultBuiltInModel": "sonnet"
  }
}

The interceptor treats the built-in agent as if its frontmatter were model: <defaultBuiltInModel>, so it resolves through modelMapping or capability-based selection like any other agent. Removing the key restores the legacy behavior where built-in agents pass through to Anthropic native.
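As a rough illustration of that substitution, the sketch below derives the effective model for a built-in agent. `builtInAgentModel` and `TIER_ALIASES` are hypothetical names; the alias table contains only the one shorthand the text names, and other aliases are not assumed.

```javascript
// Only the alias stated in the text; the real alias table is not shown there.
const TIER_ALIASES = { sonnet: "claude-sonnet-4-6" };

// Hypothetical helper: the model a built-in agent is treated as declaring.
// Returns null when the key is absent, meaning legacy Anthropic passthrough.
function builtInAgentModel(modelRouter) {
  const key = modelRouter.defaultBuiltInModel;
  if (!key) return null;
  return TIER_ALIASES[key] || key;
}
```

The returned name would then be looked up in modelMapping exactly as a frontmatter value would be.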

`security` — data-class gate + provider allowlist (#158)

Transparent routing terminates the client TLS and re-originates a separate upstream connection to a third-party provider (kimi=Moonshot, glm=Zhipu, opencode-go), so that intermediary gets full plaintext access to the prompt + code context. Two gates keep sensitive content and untrusted providers out of that path (OWASP ASI04/ASI07).

{
  "modelRouter": {
    "security": {
      "allowedProviders": ["kimi", "opencode-go", "codex"],
      "dataClassification": {
        "enabled": true,
        "blockClasses": ["private-key", "aws-key", "github-token", "api-key", "credential", "pii"]
      }
    }
  }
}

Data-classification gate (P0)

Before routing, the interceptor classifies the prompt (mr-data-classifier.cjs). If a blocked class is detected, it does not route — it passes through so the task runs on Anthropic native.

enabled (default true) — set false to disable classification.
blockClasses — which detected classes force a passthrough. Absent/empty ⇒ block on any detected sensitive class (secure default). Known classes: private-key, aws-key, github-token, api-key, credential, pii.
Precision-first: patterns match specific token shapes (AKIA…, ghp_…, sk-…, PEM keys, JWTs, quoted credential literals, SSN/credit-card) — a false positive only costs a passthrough (safe, no savings), while a false negative would leak a secret. Broad heuristics that would block ordinary source code are deliberately avoided. Inspect classes with node .claude/scripts/mr-data-classifier.cjs --classes.
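To make the gate concrete, here is a small sketch of shape-based detection plus the blockClasses rule. The three regular expressions are illustrative stand-ins based on widely known token formats; the actual patterns in mr-data-classifier.cjs are not shown in this document and may differ. `detectClasses` and `shouldPassthrough` are hypothetical names.

```javascript
// Illustrative matchers only (assumed, not taken from mr-data-classifier.cjs).
const MATCHERS = [
  { cls: "aws-key", re: /\bAKIA[0-9A-Z]{16}\b/ },
  { cls: "github-token", re: /\bghp_[A-Za-z0-9]{36}\b/ },
  { cls: "private-key", re: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
];

// Return the labels of every class whose pattern matches the prompt.
function detectClasses(prompt) {
  return MATCHERS.filter((m) => m.re.test(prompt)).map((m) => m.cls);
}

// True when the prompt must stay on Anthropic native instead of being routed.
function shouldPassthrough(prompt, dataClassification = {}) {
  if (dataClassification.enabled === false) return false;
  const detected = detectClasses(prompt);
  const block = dataClassification.blockClasses;
  // Absent or empty blockClasses: block on any detected class (secure default).
  if (!Array.isArray(block) || block.length === 0) return detected.length > 0;
  return detected.some((cls) => block.includes(cls));
}
```

Only the class labels are returned, which mirrors the logging rule that the secret value itself is never recorded.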

Provider allowlist (P0)

Only route to providers explicitly vetted in allowedProviders. Every non-Anthropic hop is treated as an untrusted subservice.

Present and non-empty ⇒ a pick (or failover hop) for any other provider is rejected → passthrough to Anthropic. The shipped config declares the allowlist, so new installs are locked down.
Absent/empty ⇒ allow-all (legacy configs that predate this gate keep working).
Enforced at both ends: the interceptor allowlists the primary pick; mr-delegate.sh enforces the same allowlist on every failover.pipe hop (a chain is only as safe as its weakest hop).
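The allowlist rule itself is small enough to sketch. `isProviderAllowed` is a hypothetical name; the logic follows the three bullets above, with the same check applied to the primary pick and to each failover hop.

```javascript
// Hypothetical check: non-empty allowlist means membership is required;
// an absent or empty allowlist keeps the legacy allow-all behavior.
function isProviderAllowed(provider, security = {}) {
  const allowed = security.allowedProviders;
  if (!Array.isArray(allowed) || allowed.length === 0) return true;
  return allowed.includes(provider);
}

// Example: applying the same check to every hop of a pipe.
const exampleSecurity = { allowedProviders: ["kimi", "opencode-go", "codex"] };
const examplePipe = [
  { provider: "kimi", model: "kimi-k2.7-code" },
  { provider: "glm", model: "glm-5.2" },
];
const permittedHops = examplePipe.filter((hop) => isProviderAllowed(hop.provider, exampleSecurity));
```

Here `permittedHops` keeps only the kimi hop, because glm is not in the example allowlist.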

Both gates write an audit breadcrumb to ~/.model-router/debug.jsonl (decision: pass-data-class-blocked / pass-provider-not-allowlisted, and dataClass on routed decisions) — inspect with bash .claude/scripts/mr-tail-debug.sh. The classifier logs only the matcher labels, never the secret value.

`perSpawnModelOverride` — explicit per-call model (#153)

By default the delegation model comes from the resolved agent’s model: frontmatter. A caller can override it per spawn via either channel:

tool_input.model — the Task/Agent tool’s per-call model param (tier aliases sonnet/haiku resolve via modelMapping).
an mr-model: directive in the prompt — carries a concrete target even when Claude Code won’t forward a free-form tool_input.model string. Forms: mr-model: kimi/kimi-k2.7-code, [mr-model: glm-5.2], mr-model: sonnet. tool_input.model takes precedence when both are present.

Resolution: provider/model (or provider:model) is used directly; a bare model name resolves to its first enabled provider in providers-config.json; a tier alias maps through modelMapping. An unresolvable or disabled target is ignored (normal selection resumes). The resolved provider is still subject to the security.allowedProviders gate.
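The two channels and the first resolution step can be sketched as below. All three function names are hypothetical, and the directive regex is an assumption that covers the three forms listed above; lookups against providers-config.json and modelMapping are not modeled.

```javascript
// Find an `mr-model:` directive (bracketed or bare) and return its raw target.
function parseMrModelDirective(prompt) {
  const m = /\[?\bmr-model:\s*([^\s\]]+)\]?/.exec(prompt);
  return m ? m[1] : null;
}

// tool_input.model takes precedence over the prompt directive.
function pickOverride(toolInputModel, prompt) {
  return toolInputModel || parseMrModelDirective(prompt);
}

// provider/model or provider:model is explicit; anything else is a bare
// model name or tier alias that still needs a provider lookup.
function splitTarget(target) {
  const m = /^([^\/:]+)[\/:](.+)$/.exec(target);
  return m ? { provider: m[1], model: m[2] } : { provider: null, model: target };
}
```

A target with `provider: null` would go on to the bare-name or tier-alias resolution described above, and any resolved provider would still pass through the allowlist gate.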

Policy — the override does NOT pierce the opus floor. It is applied after KIT_PASSTHROUGH_MODELS / write-agent passthrough, so an opus-declared (or write-capable) agent is floored to Anthropic before the override is even consulted — opus resolves to null. Routing an opus specialist to a cheap model remains a code-change-only decision (kit policy #84), not a per-spawn runtime knob. Disable the whole feature with modelRouter.perSpawnModelOverride: false.

`failover.pipe` — what runs after the primary fails

The primary (selected by modelMapping above) is the head of an ordered failover pipe. When the primary returns a provider-failure signal (HTTP 429, 5xx, ECONNREFUSED, timeout, rate limit text), the bash delegate advances to the next hop without escalating to Anthropic. Anthropic is only used as the terminal when all configured hops fail.

{
  "modelRouter": {
    "failover": {
      "enabled": true,
      "perHopTimeoutSec": 120,
      "pipe": [
        { "provider": "kimi",        "model": "kimi-k2.7-code" },
        { "provider": "opencode-go", "model": "glm-5.2" }
      ],
      "fallbackToAnthropic": true
    }
  }
}

perHopTimeoutSec (default 120) — per-attempt budget. Tuned for kimi at 138K input tokens (~67s observed). The interceptor’s outer spawnSync budget is automatically sized to pipe.length × perHopTimeoutSec + 30s.
pipe (ordered) — hops to try after the interceptor-selected primary. The CLI primary always runs as hop 0; matching entries in the pipe are skipped to avoid a wasted retry. When pipe is absent, the legacy circular failover.chain map is consulted for backward compat.
fallbackToAnthropic: true — after the whole pipe fails, bash exits 42 and the interceptor passes the original Task through to Anthropic native.

A non-provider failure (real model error, not infrastructure) stops the pipe — we propagate the error rather than mask it by burning another 120s on the next hop.
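The pipe walk and the outer budget formula can be sketched as follows. The real logic lives in the bash delegate; this JavaScript version is only an illustration. `runPipe`, `outerBudgetSec`, the `attempt` callback, and the status strings are all hypothetical; the budget formula is the one stated above.

```javascript
// attempt(hop) is a hypothetical synchronous callback returning either
// { ok: true, output } or { ok: false, providerFailure: boolean }.
function runPipe(primary, failover, attempt) {
  const same = (a, b) => a.provider === b.provider && a.model === b.model;
  // The primary is hop 0; pipe entries matching it are skipped.
  const hops = [primary, ...(failover.pipe || []).filter((h) => !same(h, primary))];
  for (const hop of hops) {
    const result = attempt(hop);
    if (result.ok) return { status: "ok", hop, output: result.output };
    // A real model error stops the pipe instead of trying the next hop.
    if (!result.providerFailure) return { status: "model-error", hop };
  }
  // Whole pipe failed: the text maps this to exit 42 when fallback is on.
  return { status: "exhausted", passthroughToAnthropic: failover.fallbackToAnthropic === true };
}

// Outer budget as stated in the text: pipe.length x perHopTimeoutSec + 30s.
function outerBudgetSec(failover) {
  const perHop = failover.perHopTimeoutSec ?? 120;
  return (failover.pipe || []).length * perHop + 30;
}
```

With the two-hop pipe and the 120s default from the config above, `outerBudgetSec` gives 2 × 120 + 30 = 270 seconds.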

`failover.circuitBreaker` — skip a hard-down provider without re-probing (#157)

The per-hop liveness probe (_completion_probe) is memoized only within a single delegation. Across separate delegations, a provider that has been down for hours is re-probed on every call — burning up to MR_PROBE_TIMEOUT_S (30s) each time before the hop advances. The circuit-breaker persists per-provider failure state to ~/.model-router/circuit-breaker.json so a tripped provider is skipped instantly, without a probe, until a cooldown elapses.

{
  "modelRouter": {
    "failover": {
      "circuitBreaker": {
        "enabled": true,
        "failureThreshold": 3,
        "cooldownSec": 300
      }
    }
  }
}

enabled (default true): set false to restore the always-probe behavior. The bash delegate consults the breaker only when this is on; when it is off, no extra subprocess is spawned.
failureThreshold (default 3): consecutive provider-level failures before the breaker trips open for that provider.
cooldownSec (default 300): how long a tripped provider stays skipped. After it elapses, the breaker goes half-open and admits one trial hop: success → closed (fully re-admitted), failure → open again (cooldown re-armed).

State machine: closed (allow, count failures) → open (skip instantly) → half-open (one trial) → closed/open. Only genuine provider failures (5xx / 429 / connection-refused / rate-limit / failed liveness probe) count; a real model/budget error does not trip the breaker. Env overrides for tests: MR_CB_ENABLED, MR_CB_FAILURE_THRESHOLD, MR_CB_COOLDOWN_SEC, MR_CB_STATE_FILE. Inspect live state with node .claude/scripts/mr-circuit-breaker.cjs status.
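The state machine can be expressed as two pure functions over a per-provider record. This is a sketch, not the contents of mr-circuit-breaker.cjs: the record shape `{ failures, openedAt }`, the function names, and the outcome labels are assumptions made for illustration.

```javascript
// Phase for one provider, given its record, the config, and the time in ms.
function breakerPhase(state, cfg, now) {
  if (state.openedAt == null) return "closed";
  return now - state.openedAt >= cfg.cooldownSec * 1000 ? "half-open" : "open";
}

// outcome: "success" | "provider-failure" | "model-error" (hypothetical labels)
function recordResult(state, cfg, now, outcome) {
  if (outcome === "model-error") return state; // never counts toward the breaker
  if (outcome === "success") return { failures: 0, openedAt: null }; // close
  const phase = breakerPhase(state, cfg, now);
  const failures = state.failures + 1;
  if (phase === "half-open" || failures >= cfg.failureThreshold) {
    // Trip open, or re-arm the cooldown after a failed half-open trial.
    return { failures, openedAt: now };
  }
  return { failures, openedAt: null };
}
```

A caller would skip the hop whenever `breakerPhase` returns `"open"` and allow exactly one trial when it returns `"half-open"`.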

`failover.inHopRetry` — retry a transient 429 before failing over (#157 P1)

A momentary rate-limit against the cheap primary would otherwise advance the pipe and lose the cheap hop. This retries the same hop after a jittered backoff before failing over, so a transient 429 recovers in place.

{
  "modelRouter": {
    "failover": {
      "inHopRetry": { "enabled": true, "maxRetries": 1, "baseDelayMs": 1500 }
    }
  }
}

enabled (default true), maxRetries (default 1), baseDelayMs (default 1500). The sleep is baseDelayMs + rand(0..baseDelayMs): a fixed floor plus uniform random jitter, so each retry waits between baseDelayMs and 2 × baseDelayMs.
Only transient rate-limits retry (429 / rate limit / too many requests / quota exceeded). A hard-down signal (5xx / connection-refused / service-unavailable) is not retried — it advances to the next hop immediately, and a budget/client error never retries. This is the “degraded → retry-in-place” vs “down → fail over” distinction that complements the circuit-breaker.
Retries are counted in the per-call trace (retries field in ~/.model-router/traces.jsonl). Env overrides for tests: MR_INHOP_RETRY_ENABLED, MR_INHOP_RETRY_MAX, MR_INHOP_RETRY_BASE_MS.
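The retry decision and the backoff delay can be sketched as below. The function names are hypothetical, and matching on error text with a single regex is an assumption; the delegate's actual signal detection is not shown in this document.

```javascript
// Transient rate-limit signals listed in the text (assumed to be text-matched).
const TRANSIENT_RATE_LIMIT = /\b429\b|rate[ -]?limit|too many requests|quota exceeded/i;

function isTransientRateLimit(errorText) {
  return TRANSIENT_RATE_LIMIT.test(errorText);
}

// Delay = baseDelayMs + rand(0..baseDelayMs); rand is injectable for tests.
function retryDelayMs(baseDelayMs = 1500, rand = Math.random) {
  return baseDelayMs + Math.floor(rand() * baseDelayMs);
}

// Retry in place only for transient rate limits and only while retries remain.
function shouldRetryInHop(errorText, retriesSoFar, cfg = {}) {
  const { enabled = true, maxRetries = 1 } = cfg;
  return enabled && retriesSoFar < maxRetries && isTransientRateLimit(errorText);
}
```

With the defaults, a 429 is retried once after 1.5 to 3 seconds, while a 5xx or connection-refused error returns `false` and the pipe advances.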

Delegation Output

The Task interceptor returns the cheap model’s text output via systemMessage. The parent session sees it as the Task’s result.

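If mr-delegate.sh exits non-zero (timeout, provider down), the interceptor surfaces whatever output it has, plus the exit code. Don’t let the Task fall through to Anthropic on error; that would burn Opus tokens AND the cheap call’s tokens. The one exception is the exit-42 case described under failover.pipe, where the interceptor itself passes the Task through after the whole pipe has failed. If the delegation failed, decide whether to retry or inline.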