# LLM Testing ## Overview LLM-powered patterns have test coverage at three layers: 1. **Client guard** — `LLMClient` blocks live LLM calls in test environments 2. **Server-side tests** — toolshed route logic (model resolution, JSON mode, tool conversion) 3. **Runner smoke tests** — full pattern-to-mock-response path ## Test-environment guard `packages/llm/src/client.ts` includes a test-environment check (evaluated once at module load) which detects: - `CI=true` (CI runners) - `ENV=test` (set by `deno task test` in llm, runner, and toolshed packages) The guard throws before any `fetch` call when running in a test environment without mock mode enabled: ``` LLMClient: live LLM calls are blocked in test environments. Use enableMockMode() and addMockResponse() to set up mocks. ``` When mock mode is enabled via `enableMockMode()`, the mock interception runs first and the guard is never reached. ## Writing tests that use LLM ```ts // Shown at module scope. import { enableMockMode, addMockResponse, addMockObjectResponse, clearMockResponses, resetMockMode, } from "@commonfabric/llm/client"; // Enable once at module level enableMockMode(); // In beforeEach, clear previous mocks beforeEach(() => clearMockResponses()); // Register mock responses with matchers addMockResponse( (req) => req.messages.some(m => typeof m.content === "string" && m.content.includes("hello") ), { role: "assistant", content: "Hi!", id: "mock-1" }, ); // For generateObject (no tools path) addMockObjectResponse( (req) => req.schema.type === "object", { object: { name: "Alice" }, id: "mock-2" }, ); ``` Mock responses are **one-time use** — they're consumed when matched. ## Conversation Fixtures For multi-turn or complex LLM interactions, use **conversation fixtures** — declarative JSON files that queue responses sequentially instead of writing inline `addMockResponse()` calls. ### Fixture format ```jsonc // packages/runner/test/fixtures/my-conversation.json { "description": "Two-turn chat with tool call", "responses": [ { "type": "sendRequest", "expectRequest": { "messagesContain": ["hi"], "messageCount": 1 }, "response": { "role": "assistant", "content": "Hello!", "id": "turn-1" } }, { "type": "sendRequest", "response": { "role": "assistant", "content": [ { "type": "tool-call", "toolCallId": "call_1", "toolName": "lookup", "input": { "query": "weather" } } ], "id": "turn-2-tool" }, "expectRequest": { "hasTools": ["lookup"] } }, { "type": "sendRequest", "response": { "role": "assistant", "content": "It's sunny!", "id": "turn-2-final" } } ] } ``` Supported entry types: `"sendRequest"` and `"generateObject"`. ### Optional assertions Each entry can include an `expectRequest` object to validate the request: | Field | Description | |-------|-------------| | `messageCount` | Request has exactly this many messages | | `messagesContain` | Each string appears in at least one message (strings may match different messages) | | `lastMessageContains` | Last message content contains this string | | `hasTools` | Request includes these tool names | | `systemContains` | System prompt contains this string | ### Loading fixtures in tests ```ts import { clearMockResponses, loadConversationFixture, loadConversationFixtureFile, } from "@commonfabric/llm/client"; // From a file await loadConversationFixtureFile("test/fixtures/my-conversation.json"); // Or inline loadConversationFixture({ responses: [ { type: "sendRequest", response: { role: "assistant", content: "Hi!", id: "1" }, }, ], }); ``` Both functions enable mock mode automatically. Call `clearMockResponses()` in `beforeEach` to reset between tests. ## Test files | File | What it tests | |------|--------------| | `packages/llm/src/client.test.ts` | Guard behavior, mock mode API, fixture loading | | `packages/toolshed/routes/ai/llm/generateText.test.ts` | JSON mode config, response cleaning | | `packages/toolshed/routes/ai/llm/generateObject.test.ts` | Model resolution, error paths | | `packages/runner/test/llm-pattern-smoke.test.ts` | generateText, generateObject, and tool-calling through runtime | | `packages/runner/test/llm-conversation-fixture.test.ts` | Multi-turn conversations and tool chains via fixtures | ## Running tests ```bash # LLM client tests (guard + mock + fixtures) cd packages/llm && deno task test # Toolshed server tests cd packages/toolshed && deno task test # Runner tests (includes smoke tests + fixture tests) cd packages/runner && deno task test ``` ## Related documentation - [TESTING.md](TESTING.md) — running the suites and the general unit and integration test structure. - [COVERAGE.md](COVERAGE.md) — how the runtime coverage these tests produce feeds the coverage-debt gate.