# Traverse capture/replay harness

Realistic, repeatable benchmark + regression infrastructure for the traversal
machinery in `src/traverse.ts`. The synthetic `*.bench.ts` micro-benches don't
reflect production traversal load; this harness captures real workloads from
pattern/integration test runs and replays them deterministically.

## Pieces

| File                             | Role                                                       |
| -------------------------------- | ---------------------------------------------------------- |
| `../../src/traverse-recorder.ts` | Env-gated capture hooks (fixture format lives here too)    |
| `replay.ts`                      | Replays a fixture; extracts the oracle and counter metrics |
| `goldens.ts`                     | Golden storage + human-oriented oracle diffing             |
| `regen-goldens.ts`               | Regenerates goldens (deliberate semantic changes only)     |
| `fixtures/*.json.gz`             | Captured workloads (corpus + invocation trace)             |
| `goldens/*.golden.json.gz`       | Baseline oracles, asserted by `../traverse-replay.test.ts` |
| `../traverse-replay.bench.ts`    | `deno bench` over the fixtures (CI: `test/*.bench.ts`)     |

## Capturing a fixture

Capture works in any in-process Deno run of the runtime (pattern tests, runner
tests, integration tests, server query code):

```bash
CF_TRAVERSE_CAPTURE=/tmp/my-fixture.json \
  deno task cf test packages/patterns/notes/notebook.test.tsx
gzip -9 /tmp/my-fixture.json
mv /tmp/my-fixture.json.gz packages/runner/test/traverse-replay/fixtures/my-fixture.json.gz
cd packages/runner && deno run --allow-read --allow-write test/traverse-replay/regen-goldens.ts
```

The recorder logs every `SchemaObjectTraverser.traverse()` call (address,
selector, link, `includeMeta`, shared context/memo identity) and snapshots each
doc read during traversal into the corpus. `CF_TRAVERSE_CAPTURE_MAX` caps
recorded invocations (default 20k). Edit `meta` in the fixture JSON to give it a
name/description before gzipping.

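As a rough illustration of the capture behavior described above, the sketch below models a recorder with an invocation cap and a first-wins doc corpus. Every name in it (`SketchRecorder`, `SketchInvocation`, `SketchFixture`) is hypothetical; the real hooks and fixture format live in `src/traverse-recorder.ts` and may look quite different.

```typescript
// Hypothetical shapes for illustration only; the real fixture format is
// defined in src/traverse-recorder.ts and may differ.
interface SketchInvocation {
  address: string;
  selector: string;
  includeMeta: boolean;
}

interface SketchFixture {
  meta: { name: string; description: string };
  corpus: Record<string, unknown>;
  invocations: SketchInvocation[];
}

class SketchRecorder {
  private readonly maxInvocations: number;
  private readonly corpus = new Map<string, unknown>();
  private readonly invocations: SketchInvocation[] = [];

  // The default mirrors the documented 20k cap. Assumption: the harness sets
  // this via CF_TRAVERSE_CAPTURE_MAX; here it is a plain constructor argument.
  constructor(maxInvocations: number = 20000) {
    this.maxInvocations = maxInvocations;
  }

  // Returns false once the cap is reached, so callers can stop recording.
  recordInvocation(invocation: SketchInvocation): boolean {
    if (this.invocations.length >= this.maxInvocations) return false;
    this.invocations.push({ ...invocation });
    return true;
  }

  // First-wins: only the first value seen for a doc is kept, and it is
  // deep-copied so later in-place mutation cannot leak into the corpus.
  recordDocRead(docId: string, value: unknown): void {
    if (this.corpus.has(docId)) return;
    this.corpus.set(docId, JSON.parse(JSON.stringify(value)));
  }

  // Assembles the corpus and invocation trace into one serializable object.
  toFixture(name: string, description: string): SketchFixture {
    const corpus: Record<string, unknown> = {};
    this.corpus.forEach((value, docId) => {
      corpus[docId] = value;
    });
    return {
      meta: { name, description },
      corpus,
      invocations: this.invocations.slice(),
    };
  }
}
```

Because the snapshot is taken on first read, a doc written later in the run still replays with its earliest captured value.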
Fidelity caveats (fine for benchmarking/regression): docs are captured
first-wins, so mid-run writes replay with the earliest value; client invocations
replay with `StandardObjectCreator`, so cell/proxy construction cost is excluded
while traversal control flow is preserved.

## The oracle

`replayFixture(fixture, { collectOracle: true })` extracts three things any
behavior-preserving optimization must keep byte-identical:

1. **Result hashes** per invocation (truncated structural hashes).
2. **The read set**: every `tx.read`/`readOrThrow` address + option flags. This
   is the scheduler's invalidation surface: dropping a read mark breaks
   reactivity invisibly, and no unit test catches it. (Verified: commenting out
   a single `READ_FOR_SCHEDULING` read in `traverseDAG` fails the test with
   "reads missing".)
3. **Schema-tracker contents** for shared/`includeMeta` contexts: the
   server-side subscription surface.

`deno task test` runs `traverse-replay.test.ts`, which asserts replay matches
the goldens. An intended semantic change regenerates goldens via
`regen-goldens.ts`; the golden diff in the PR is the review artifact.

## Benchmarks

```bash
# CI shape (notebook sliced to its first 500 invocations):
deno bench --allow-read --allow-env --no-check test/traverse-replay.bench.ts

# Full replay (the optimization-loop metric) + counter attribution:
CF_REPLAY_BENCH_FULL=1 BENCH_DIAGNOSTICS=1 \
  deno bench --allow-read --allow-env --no-check test/traverse-replay.bench.ts
```

Counters (schema calls, anyOf branches/fast-rejects, getDocAtPath, memo hits)
are deterministic where wall time is noisy: claimed wall-time wins should come
with a counter explanation.

## Current fixtures

- `notebook-test`: notebook pattern test; 2.4k invocations, 792 docs, 318
  distinct selectors, anyOf-heavy vnode load (~82k anyOf branch evaluations).
  The main client-shaped metric (~3s full replay). Some recorded invocations
  fail validation (INVALID_TYPE etc.): real reactive-read load includes heavy
  fast-fail traffic.
- `shopping-list-test`: small array/handler-heavy client load; 474 invocations,
  77 docs, fast inner-loop fixture (~20ms).
- `piece-query-legacy`: a captured server query dataset (36 docs); server-shaped
  (`includeMeta`, single big traversal). Converted from the old
  `integration/traverse_timing.test.ts` dataset. The fixture keeps the original
  `selectedPiece` selector and keys the corpus by the capture space, which links
  in the docs carry explicitly. The timing test, its JSON, and the one-off
  converter have been removed; this fixture is the canonical copy of the
  dataset.
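
To make the read-set part of the oracle more concrete, here is a minimal sketch of how a golden read set might be compared against a replayed one. The `SketchRead` shape and the `diffReadSets` helper are hypothetical and are not the API of `goldens.ts`; the sketch only assumes that a read is identified by its address plus its option flags, as described in the oracle section.

```typescript
// Hypothetical read record: an address plus the option flags it was read with.
interface SketchRead {
  address: string;
  flags: string[];
}

// Canonical key for a read; flags are sorted so their order does not matter.
function readKey(read: SketchRead): string {
  return `${read.address}|${read.flags.slice().sort().join(",")}`;
}

// Reports reads present in the golden but absent from the replay ("missing")
// and reads the replay performed that the golden does not contain ("extra").
function diffReadSets(
  golden: SketchRead[],
  actual: SketchRead[],
): { missing: string[]; extra: string[] } {
  const goldenKeys = new Set(golden.map(readKey));
  const actualKeys = new Set(actual.map(readKey));
  const missing = Array.from(goldenKeys).filter((k) => !actualKeys.has(k)).sort();
  const extra = Array.from(actualKeys).filter((k) => !goldenKeys.has(k)).sort();
  return { missing, extra };
}
```

A non-empty `missing` list is the dangerous case: it corresponds to a dropped read mark, which would shrink the scheduler's invalidation surface without changing any result hash.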