# Scope: trusted-agent tool-input integrity (llmDialog / generateObject) **Status:** Scoping (not implemented). Tracks the "trusted agent" work split out of the agent prompt-injection demo investigation. The demo *loads* (it was made SES-loadable by #4222, which moved the CFC authoring vocabulary into the `commonfabric/cfc` host module, and is guarded by `cf check` per #4153), but its central security invariant is not enforced — verified still-broken on current `main` by the `it.ignore`'d `packages/runner/test/cfc-agent-tool-input-integrity.test.ts`. ## Problem The spec's defense against prompt injection is **structural**: a routing/control field on a sink tool (e.g. `sendMail.recipient`) declares `ifc.requiredIntegrity: [Builtin(agent-kernel), UserSurfaceInput, PromptSlotBound]`, and the runtime refuses the tool call unless the value passed to that field carries that integrity. Bytes the model copied out of a hostile briefing carry none of it (spec §13.6; `~/src/specs/cfc/demos/01-agent-prompt-injection.md`; `components-llm.md`: *"LLM output is bytes the user did not type … it cannot honestly carry `AdmittedHere` integrity"*). Today that invariant is **not enforced**. Reproduced (runner, mock mode, `enforce-explicit`): a `sendMail` tool call with `recipient: "bob@evil.org"` **is sent**. Three gaps compose: 1. **Tool inputs are never checked against their own `requiredIntegrity`.** The dialog tool path (`llm-dialog.ts` `handleInvoke` / `executeToolCalls`) uses a tool's `inputSchema` only to validate/strip input *structure*. It never consults `ifc.requiredIntegrity` on the model-supplied input. 2. **The enforced gate runs on the wrong surface.** `verifyInputRequirements` (in `prepareBoundaryCommit`) checks `requiredIntegrity` on **writes** against the write target's schema. The demo's handler writes `recipient` into `emails`/`result`, whose schemas carry no `requiredIntegrity`, so the gate never sees it. 3. **Vacuous pass on unlabeled literals.** Even if a write *did* target a `requiredIntegrity` schema, the gate quantifies over the transaction's *consumed reads*; a plain model-output literal has no provenance, so there is nothing to fail (audit #14 — sound enforcement needs per-write data-flow provenance, the deferred end-state in `docs/plans/runner_cfc_implementation.md`). The confidentiality axis is in good shape (observation ceiling via `effectiveObservationCeiling = meet(pattern, deployment)` #4055; sink-request egress ceiling #4070). This scope is the **integrity axis**. ## Proposed enforcement (for review) Three pieces, each independently landable; (A) is the minimum that makes the demo's invariant hold for the realistic attack. ### A. Enforce tool-input `requiredIntegrity` at invoke time Before a tool handler runs, validate each model-supplied input field against its `inputSchema`'s `ifc.requiredIntegrity`. The value the model put in the field is untrusted (it is model output — see B), so a field requiring `[UserSurfaceInput, PromptSlotBound, …]` can only be satisfied by an input the model passed **by reference** to an integrity-bearing cell (an opaque handle / `{ "@link": … }`), never by a copied string literal. A field whose value is a bare string fails closed when it declares `requiredIntegrity`. - Seam: `handleInvoke` in `llm-dialog.ts` (and the `llm.ts` tool path), where `toolCall.input` is cellified before the handler is called. - Reuse the existing integrity-membership logic from `prepareBoundaryCommit` (`verifyInputRequirements`) rather than a parallel implementation, so the invoke-time check and the commit-time gate can't drift. - Failure → an error tool-result message (the loop continues; the model is told the call was refused), matching how `toolAllowsObservedConfidentiality` already denies over-confidential tool calls. ### B. Stamp model output and tool results as untrusted (`LlmDerived`) The builtins do not label model output today, so "untrusted" is merely *absence* of integrity — fragile, and the root of the vacuous pass. Stamp model-produced bytes (assistant content, tool-result messages written back) with an `LlmDerived` integrity atom at the point they enter the store (`createToolResultMessages` / the message append in `llm-dialog.ts`). Then a `requiredIntegrity` gate fails closed *positively* on model-derived values, and the value's provenance is explicit for downstream flow. (`components-llm.md` already defines the `LlmDerived` / `DerivedFromAdmitted` atom shape.) ### C. Close the vacuous pass for control fields For a field carrying `requiredIntegrity`, an *absent* provenance must not be a pass. Minimal version scoped to this surface: the invoke-time check in (A) treats a model-supplied scalar as carrying `LlmDerived` (from B) — which by construction lacks the required atoms — so it fails closed without needing the full per-write data-flow provenance work. The general fix (per-write provenance) stays the deferred end-state. ## Open decisions (architect) 1. **Refusal surface.** Error tool-result (loop continues, model can retry with a by-reference value) vs. hard commit rejection. Recommend error tool-result for agent ergonomics, with the sink write still gated at commit as defense in depth. 2. **By-reference contract.** How does a legit recipient reach `sendMail` with integrity intact? The direct-command value must be passed as an opaque handle/link the kernel resolves, not re-emitted as text by the model. Does the demo already model this, or does it need a "bind direct-command field" affordance? 3. **Scope of enforcement.** Only fields declaring `requiredIntegrity` (opt-in, demo works) vs. all control fields by default. Recommend opt-in first. 4. **Interaction with the deferred per-write provenance (#14).** (C) is a local shortcut; confirm it's acceptable as an interim or whether to wait for the general mechanism. ## Test plan - Un-ignore `packages/runner/test/cfc-agent-tool-input-integrity.test.ts` (injected recipient refused) — the red→green guard for (A). - Add: a by-reference recipient carrying the required integrity is **allowed** (proves (A) doesn't over-block the legitimate path). - Add: a confidential tool result is stamped `LlmDerived` and cannot satisfy a later `requiredIntegrity` field (guards (B)). - End-to-end: drive the demo's two agents via the mock; the unsafe agent's injected `sendMail` is refused, the safe agent's direct-command `sendMail` succeeds. ## Risk / blast radius - Invoke-time enforcement only fires for tools whose `inputSchema` declares `requiredIntegrity` — no behavior change for tools that don't (opt-in). - `LlmDerived` stamping touches every tool result; gate it behind enforcement mode and verify it doesn't perturb existing llm-dialog tests (several assert tool results flow back unchanged).