# Default-app note create: profile, hotspots, benchmarks

First end-to-end measurement of a real user flow for the
[Performance Program](../PERFORMANCE_PROGRAM.md): the **default-app shell
integration test** (`packages/patterns/integration/default-app.test.ts`),
which creates notes through the shell UI against a local toolshed with the
runtime in a browser web-worker.

**Scope: steady-state note creation.** Pattern compilation is explicitly out
of scope (note 1 of each run is compile-dominated and excluded everywhere
below). Measured June 2026 on an Apple M3 Max, local stack, emulated user via
Astral/CDP.

## How to reproduce

Start the local stack, then run the test with the capture env vars:

```bash
./scripts/start-local-dev.sh --port-offset=73

cd packages/patterns
API_URL=http://localhost:8073 HEADLESS=1 LOG_LEVEL=warn \
CF_NOTE_CREATE_TIMING_SERIES=5 \
CF_CAPTURE_NOTE_CREATE_PROFILE_SERIES=5 \
CF_CAPTURE_NOTE_CREATE_CPUPROFILE_SERIES=3 \
CF_CPUPROFILE_DIR=/tmp/cf-profiles \
deno test --v8-flags=--max-old-space-size=4096 -A \
  --filter "default-app flow test" ./integration/default-app.test.ts
```

- `CF_NOTE_CREATE_TIMING_SERIES=N` — wall-clock timing for N note creates.
- `CF_CAPTURE_NOTE_CREATE_PROFILE_SERIES=N` — logger timing stats
  (`focusTiming`/`topTiming`/settle history) per note, via
  `commonfabric.rt.getLoggerCounts()` IPC.
- `CF_CAPTURE_NOTE_CREATE_CPUPROFILE_SERIES=N` — **new**: V8 sampling
  profiles of the runtime web-worker for notes 2..N+1, written as
  `.cpuprofile` (loadable in Chrome DevTools / speedscope) plus a ranked
  self-time report. Implemented by `packages/integration/cdp-profiler.ts`
  (`CdpWorkerProfiler`), which attaches a second CDP client to the Astral
  browser (`ShellIntegration.wsEndpoint()`), auto-attaches to the page's
  dedicated workers via flattened sessions, and drives `Profiler.start/stop`
  on the `worker-runtime` target. Note-1 profiles are skipped: they are
  compile-dominated and big enough to break the CDP websocket message limit.
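
A minimal sketch of the CDP command sequence such a profiler might send, assuming the standard `Target.setAutoAttach` and `Profiler.*` protocol methods. The helper name `profilerCommands` and both session ids are hypothetical; `CdpWorkerProfiler` may order or parameterize its calls differently.

```typescript
// Sketch only: builds CDP commands, does not open a websocket.
// `profilerCommands` and the session ids are hypothetical names.
interface CdpCommand {
  id: number;
  method: string;
  params?: Record<string, unknown>;
  sessionId?: string; // routes the command to a flattened child session
}

function profilerCommands(
  pageSessionId: string,
  workerSessionId: string, // in practice learned from an attach event
  samplingIntervalUs: number,
): CdpCommand[] {
  let id = 0;
  const cmd = (
    sessionId: string,
    method: string,
    params?: Record<string, unknown>,
  ): CdpCommand => ({ id: ++id, method, sessionId, ...(params ? { params } : {}) });

  return [
    // Ask the page session to attach to its workers as flattened sessions.
    cmd(pageSessionId, "Target.setAutoAttach", {
      autoAttach: true,
      waitForDebuggerOnStart: false,
      flatten: true,
    }),
    cmd(workerSessionId, "Profiler.enable"),
    cmd(workerSessionId, "Profiler.setSamplingInterval", {
      interval: samplingIntervalUs, // microseconds
    }),
    cmd(workerSessionId, "Profiler.start"),
    // The note create under test would run between start and stop.
    cmd(workerSessionId, "Profiler.stop"), // the response carries the profile
  ];
}

const commands = profilerCommands("page-1", "worker-1", 250);
```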

## Wall-clock: beware the test's 500 ms poll quantum

The test reports ~1.15 s per note create (createToView ~520 ms, returnToHome
~630 ms), but `waitFor` in `@commonfabric/integration` polls every **500 ms**,
so each phase is rounded up to the next poll tick and can overstate the real
duration by up to one full interval. Discounting that, the real runtime cost
is roughly **150–300 ms per note**, consistent with the ~200 ms of worker CPU
the logger measures, plus IPC and main-thread render. Don't use the test's
wall-clock numbers as an optimization target without fixing the poll delay;
use the logger/profile numbers below instead.
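
The rounding effect can be modeled in a few lines. This is a simplified sketch, assuming the condition is only observed at poll ticks (no immediate first check); `observedMs` is a hypothetical helper, not part of `@commonfabric/integration`.

```typescript
// Hypothetical model of a polling wait: a phase that really takes `actualMs`
// is reported as the next multiple of `pollMs`.
function observedMs(actualMs: number, pollMs: number): number {
  return Math.ceil(actualMs / pollMs) * pollMs;
}

const coarseFast = observedMs(120, 500); // 500: one full tick for 120 ms of work
const coarseSlow = observedMs(520, 500); // 1000: just past a tick costs another
const fineFast = observedMs(120, 50); // 150: a 50 ms poll bounds the error at 50 ms
```

Under this model the error per phase is bounded by the poll interval, which is why a shorter interval makes the wall-clock series usable.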

## Logger measurements (steady state, per note create)

Per-note deltas are stable from note 2 on (5-note series):

| Key | Count/note | ms/note |
|---|---|---|
| `scheduler/execute` (settle-loop entries) | 14 | ~183 |
| `scheduler/execute/settle` | 14 | ~130 |
| `scheduler/run` (action runs) | 61 | ~115 |
| `scheduler/run/action` (action bodies) | 61 | ~96 |
| `traverse` | **+41/note growth** (362 → 485) | ~70 |
| `scheduler/execute/collectDirtyDependencies` | **+28/note growth** (810 → 894) | ~25 |
| `raw/run/wish` | 5 | ~26 |
| `scheduler/execute/event/handlerAction` | 4 | ~20 |

Two clear **O(existing-note-count) growth seams**: `traverse` call count and
dirty-dependency visits grow linearly with list size on every create, i.e.
quadratic accumulated cost as a space fills up.
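
A worked sketch of the accumulation, using the measured `traverse` figures (362 calls, +41 per note) and assuming the linear trend continues. The function names are illustrative, and index 0 is the first measured create.

```typescript
// Calls for one create when `existing` notes precede it (linear growth).
function callsForCreate(base: number, slopePerNote: number, existing: number): number {
  return base + slopePerNote * existing;
}

// Total calls over `creates` consecutive creates: the slope term sums to
// slope * creates * (creates - 1) / 2, which is the quadratic part.
function accumulatedCalls(base: number, slopePerNote: number, creates: number): number {
  let total = 0;
  for (let k = 0; k < creates; k++) {
    total += callsForCreate(base, slopePerNote, k);
  }
  return total;
}

const afterThreeNotes = callsForCreate(362, 41, 3); // 485, matching the table
const total100 = accumulatedCalls(362, 41, 100); // 239150
const total200 = accumulatedCalls(362, 41, 200); // 888300
```

Doubling the number of creates from 100 to 200 multiplies the total by about 3.7, approaching 4× as the slope term dominates.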

Of the wish cost, the `#notebook` hashtag query dominates:
`wish/phase-query/send-shared-hashtag/#notebook` is ~15 ms per call (6 calls
across the run) — almost all of the shared-hashtag resolver's `sendResult`.

## Worker CPU profiles (notes 2–4 aggregated, 1446 ms busy)

Profiles taken at 250 µs sampling; analysis maps bundle frames back to source
files via `worker-runtime.js.map`. "Busy" excludes `(program)`/idle (the
worker idles between test polls).
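
A minimal sketch of how a ranked self-time report can be derived from a `.cpuprofile`, assuming the usual `nodes` / `samples` / `timeDeltas` layout and attributing each delta to the sample at the same index. It is not the report code used here, and it skips the source-map step.

```typescript
interface ProfileNode {
  id: number;
  callFrame: { functionName: string; url: string };
}

interface CpuProfile {
  nodes: ProfileNode[];
  samples: number[]; // node id per sample
  timeDeltas: number[]; // microseconds per sample
}

function selfTimeReport(
  profile: CpuProfile,
): { busyUs: number; ranked: [string, number][] } {
  const nameById = new Map<number, string>();
  for (const node of profile.nodes) {
    nameById.set(node.id, node.callFrame.functionName || "(anonymous)");
  }

  const selfUs = new Map<string, number>();
  profile.samples.forEach((nodeId, i) => {
    const name = nameById.get(nodeId) ?? "(unknown)";
    // "Busy" excludes time the worker spent idle or in the embedder.
    if (name === "(program)" || name === "(idle)") return;
    selfUs.set(name, (selfUs.get(name) ?? 0) + (profile.timeDeltas[i] ?? 0));
  });

  const ranked = Array.from(selfUs.entries()).sort((a, b) => b[1] - a[1]);
  const busyUs = ranked.reduce((sum, entry) => sum + entry[1], 0);
  return { busyUs, ranked };
}

// Tiny synthetic profile: five samples, 250 µs apart.
const tinyProfile: CpuProfile = {
  nodes: [
    { id: 1, callFrame: { functionName: "(program)", url: "" } },
    { id: 2, callFrame: { functionName: "feedPlainObject", url: "worker-runtime.js" } },
    { id: 3, callFrame: { functionName: "resolveLink", url: "worker-runtime.js" } },
  ],
  samples: [2, 1, 2, 3, 1],
  timeDeltas: [250, 250, 250, 250, 250],
};
const report = selfTimeReport(tinyProfile);
```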

### By phase (top-level dispatch)

| Phase | Share of busy CPU | What it is |
|---|---|---|
| `handleVDomMount` | **37%** | `WorkerReconciler.mount` — re-mounting the home view's vdom on navigation back; per-child cell subscribe + render |
| `runPullSettleOrder` | 21% | settle-loop action runs (lifts, maps, wish) |
| (other) | 17% | GC (5% of busy) + worker IPC encode/decode + misc |
| `execute` (scheduler) | 11% | dependency collection, scheduling, traverse machinery |
| `runCommitCallbacks` | 11% | post-commit runner starts; ~24% of this phase is the verified-bindings walk (`seedVerifiedLoadIds`, `verifiedWalkChildValues`, `collectAssociatedFunctions`) |
| `handleRequest` | 2% | direct IPC requests |

### By subsystem (cross-cutting, share of busy CPU)

| Subsystem | Share | Hot functions |
|---|---|---|
| Value hashing | **~29%** | `feedPlainObject` 12%, wasm SHA-256 5%, `feedObjectValue` 3%, `internSchema` 2%, hasher/encoding rest |
| Deep-freeze | **~12%** | `deepFreezeInProgress` 8%, `checkValue` 3% |
| Link resolution + schema traverse | ~9% | `resolveLink` 4.3%, `traverseWithSchema` et al. |
| Storage selector/tx | ~8% | `selector-tracker.ts` 3%, `v2-transaction.ts` 2.8%, `cache.get` 2% |
| CFC schema refs | ~7.5% | `resolveCfcSchemaRef` 3.5%, `findCfcSchemaRefs` 2.2%, `schemaAtPathInternal` 1.4% |
| Verified-bindings walk | ~5% | `seedVerifiedLoadIds`, `verifiedWalkChildValues` (CT-1665 machinery) |
| GC | ~5% | allocation pressure from the above walks |

Both hashing (`hashOf`) and deep-freeze (`isDeepFrozen`) cache **by object
identity** (WeakMap/WeakSet). Every fresh-identity but structurally-equal
value — query results, vdom, specs rebuilt per render — pays a full O(tree)
walk. That's why these two top the chart despite their caches.
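
The miss pattern is easy to reproduce in isolation. In this sketch `cachedHash` is a hypothetical stand-in for an identity-cached hash, with `JSON.stringify` playing the role of the O(tree) walk; it is not the real `hashOf`.

```typescript
let walks = 0; // counts full tree walks, i.e. cache misses

const hashCache = new WeakMap<object, string>();

function cachedHash(value: object): string {
  const hit = hashCache.get(value);
  if (hit !== undefined) return hit;
  walks++;
  const hash = JSON.stringify(value); // stand-in for the real content hash
  hashCache.set(value, hash);
  return hash;
}

const note = { title: "a", tags: ["x"] };
const first = cachedHash(note); // miss: first sight of this object
cachedHash(note); // hit: same identity
const rebuilt = cachedHash({ title: "a", tags: ["x"] }); // miss: equal content, fresh identity
```

Both misses produce the same hash, so the second walk is pure overhead that only a content-keyed cache or a reused value graph can avoid.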

## Benchmarks (new)

Each hotspot now has a benchmark that reproduces the integration shape
in-process, so it can be optimized without a browser:

1. **`packages/runner/test/default-app-note-create.bench.ts`** — macro bench:
   home doc with linked note docs, lifted derived view, live sink, event
   handler creating/removing a note. Covers event → preflight → handler →
   commit → settle → recompute, including the growth seams. Baseline:

   | Existing notes | create+remove cycle (pull) |
   |---|---|
   | 0 | 14.3 ms |
   | 32 | 24.1 ms |
   | 128 | 59.7 ms |

   Clear O(n): ~0.36 ms per existing note per create+remove cycle.

2. **`packages/html/bench/worker-reconciler-mount.bench.ts`** — the 37%
   phase: mounts a list vdom whose children are real cells (like piece `[UI]`
   links) through `rendererVDOMSchema`. Baseline:

   | Case | time |
   |---|---|
   | mount+unmount @8 children | 21.9 ms |
   | mount+unmount @32 children | 83.9 ms |
   | mount+unmount @128 children | 344.7 ms |
   | re-mount unchanged tree @32 | 171.5 ms (≈ 2× first mount: nothing is reused) |
   | single-child update under live mount @32 | 6.7 ms |

   ~2.7 ms **per child** per mount+unmount; a re-mount of an unchanged tree
   pays full price, and even a one-child update costs milliseconds.

3. **`packages/data-model/bench/value-identity-shapes.bench.ts`** — the
   identity-cache gap behind the hashing/freeze numbers. Baseline:

   | Case | time |
   |---|---|
   | `hashOf` deep-frozen doc, same identity | ~10 ns |
   | `hashOf` fresh-identity note doc (~30 nodes) | ~5 µs |
   | `hashOf` fresh-identity home doc @128 notes | **2.1 ms** |
   | `isDeepFrozen` frozen-but-uncached note doc | 19 µs |
   | `deepFreeze` fresh home doc @128 notes | 402 µs |
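
The per-item slopes quoted for the macro and reconciler benches can be checked with an ordinary least-squares fit over the baseline tables. This is a sketch for sanity-checking the arithmetic; `leastSquaresSlope` is an illustrative helper, not part of the bench code.

```typescript
// Ordinary least-squares slope for (size, milliseconds) points.
function leastSquaresSlope(points: [number, number][]): number {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p[0], 0) / n;
  const meanY = points.reduce((sum, p) => sum + p[1], 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const [x, y] of points) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  }
  return sxy / sxx;
}

// Macro bench: existing notes vs create+remove cycle time.
const macroSlope = leastSquaresSlope([[0, 14.3], [32, 24.1], [128, 59.7]]);
// Reconciler bench: children vs mount+unmount time.
const mountSlope = leastSquaresSlope([[8, 21.9], [32, 83.9], [128, 344.7]]);
```

The fits come out at roughly 0.36 ms per existing note and 2.7 ms per child, in line with the figures above.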

Existing related benches: `scheduler-event-preflight.bench.ts` (the 30-note
preflight shape), `push-pull-patterns.bench.ts` (map/filter machinery),
`traverse-replay` harness + `link-resolution.bench.ts`.

## Optimization candidates (ranked by measured impact)

1. **Don't re-mount the home view from scratch on navigation** (37% phase +
   main-thread twin). Keep the reconciler mount alive across view switches,
   or memoize per-child render state keyed by cell identity + version so an
   unchanged child re-mount is O(1). The re-mount bench is the regression
   guard.
2. **Make the single-child update path O(1).** 6.7 ms for one chip update at
   32 children means updates re-read far more than the changed subtree.
3. **Structural (content) caching for `hashOf`/`isDeepFrozen`**, or freeze +
   reuse canonical value graphs so the identity caches actually hit. The
   2.1 ms home-doc hash is paid multiple times per create. Same family as the
   schema canonicalization win in #3948 (8.4× traverse) — values, not schemas.
4. **Bound the O(n)-per-create growth** (traverse +41/note, dirty-visits
   +28/note): incremental list diffing instead of whole-list re-reads in the
   derived home view. Guarded by the macro bench's @0/@32/@128 spread.
5. **Memoize CFC schema-ref resolution** (`resolveCfcSchemaRef` /
   `findCfcSchemaRefs`, 7.5%) — these re-walk schemas that `internSchema`
   already canonicalizes.
6. **Cache the verified-bindings commit walk** (`seedVerifiedLoadIds` /
   `verifiedWalkChildValues`, ~5%, 24% of commit callbacks) per executable +
   value identity (CT-1665 follow-up).
7. **Shared-hashtag wish `sendResult`** (~15 ms per `#notebook` query):
   profile what `sharedWishCellValue` → `sendResult` rewrites on each query
   even though the resolver is already shared.

Follow-up measurement gaps (not covered here): main-thread (page) profile of
the same flow (the reconciler has a DOM-applying twin), storage server time,
and a benchmark for per-note pattern instantiation (`startWithTx` in commit
callbacks; the macro bench uses plain docs, not running patterns).

## Optimization round 1 (June 2026): candidates #3, #5, #2

Landed on `perf/selector-schema-standardization`:

1. **SelectorTracker schema standardization** (candidate #3's top seam,
   ~210 ms of 574 ms attributable hash/freeze time): content-hash LRU beside
   the frozen-identity WeakMap in `getStandardSchema` (mutable schemas pay
   exactly one content hash per call, preserving the edited-in-place contract),
   memoized `$defs`-stripped comparison hashes, hoisted `findRefs`.
   `selector-tracker.bench.ts` (new): warm lookup 466→225 µs, subset path
   1.8→0.8 ms.
2. **CFC schema-ref memoization** (candidate #5): `resolveCfcSchemaRef` +
   `findCfcSchemaRefs` cached per deep-frozen schema identity; resolution
   results are now identity-stable, restoring downstream cache hits.
   Reconciler mount bench: ~20% faster across sizes (@32: 83.9→67.5 ms).
3. **`cfc.schemaAtPath` memo** (candidate #2 groundwork): cached per frozen
   schema identity × path × boolean flags; serves the per-element write-diff
   calls and selector sub-schema derivations.

**End-to-end (worker busy CPU, notes 2–4 of the integration test):
1446 ms → 1182 ms (−18%).** `handleVDomMount` 540→406 ms (−25%);
`feedPlainObject` fell from top frame (92.6 ms in mount) to 20 ms;
deep-freeze and schema-ref frames left the top-12 entirely.

**Candidate #2 status:** measured single-child update is already O(1) in
list size (300 updates: ~1.5 s @8 / ~1.6 s @32 / ~2.0 s @128 children); the
~5 ms/update constant decomposes into commit hashing, `normalizeAndDiff`
write-diff, sink re-subscription (largely server-side graph re-extension in
the emulated bench), and read-back traverse/freeze. No single reconciler-side
lever exists.

**Next levers, in rough order of leverage:**
- **Intern/freeze cell schemas at the `getCell`/`asSchema` seam.** Several
  identity-keyed caches (schemaAtPath, value-hash, schema-refs) are gated on
  `isDeepFrozen` and stay cold because cell schema literals are mutable
  objects. Interning once at the seam would make them hit everywhere.
- `decodeJsonPointer` showed up at 23 ms in the post-optimization mount
  profile — trivially memoizable.
- Keep reconciler mounts alive across navigation (candidate #1, untouched:
  re-mount still pays full price by design of `handleVDomMount`).
- Skip sink re-subscription bookkeeping when the read set is unchanged.
- Value-graph identity reuse (freeze + reuse query results) — the remaining
  big hashing/freeze lever.

## Optimization round 2 (June 2026): remount hashing + poll fix

Branch `perf/reconciler-remount` (stacked on round 1). Two changes from the
follow-up session:

**Test poll quantization fixed.** `waitFor`'s default poll interval dropped
from 500ms to 50ms (override with `CF_WAITFOR_DELAY_MS`). The timing series now
measures reality: per-note totals reported ~1147ms before, **~409ms** after
(createToView ~294ms, returnToHome ~116ms) on the optimized stack.

**Remount made ~5× cheaper for all complex UIs** (investigated per the
"don't keep mounts alive, make mounting fast" directive). Profiling 100
mount/unmount rounds @32 children showed **58% of busy CPU was content
hashing**: five seams rebuilt fresh-but-structurally-equal schema/selector
objects per mount, and `hashOf`/`internSchema` caches key on object identity
at entry only — a fresh wrapper re-walks the whole embedded vdom schema.
The seams (each now memoized/canonicalized, gated on deep-frozen inputs):

1. `asCellCompoundSchemaForValue` rebuilt + interned every anyOf branch per
   read of every vdom node → candidate list cached per schema identity.
2. Created child cells got a fresh stripped-`asCell` schema per
   materialization (re-hashed at `resolveLink`'s exit intern on every
   `isStream`) → `unwrapAsCellSchema` memoized + interned.
3. `internPathSelector` canonicalized the schema but not the selector
   wrapper → now returns a canonical selector instance per (schema, path).
4. `pull`'s sync dedup key hashed a fresh wrapper embedding the selector
   schema → key composed from per-part cached hashes.
5. `cfc.schemaAtPath`'s cache was per-instance while pull/watch/traversal
   create a fresh `ContextualFlowControl` per call (permanently cold) →
   module-level, results interned. Plus `checkAnyOf`'s per-item comparison
   hashes cached per frozen identity.
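
Seam 5 is the easiest to picture in isolation. The sketch below uses hypothetical classes (`PerInstanceResolver`, `ModuleLevelResolver`) and a dummy `derive` step to show why a per-instance cache never warms up when the owner is created fresh per call; it is not the `ContextualFlowControl` code.

```typescript
type PathCache = WeakMap<object, Map<string, object>>;

let derives = 0; // counts cache misses (real derivations)

function derive(schema: object, path: string): object {
  derives++;
  return Object.freeze({ from: schema, path });
}

// Shared lookup: cache keyed by schema identity, then by path.
function lookup(cache: PathCache, schema: object, path: string): object {
  let byPath = cache.get(schema);
  if (!byPath) {
    byPath = new Map<string, object>();
    cache.set(schema, byPath);
  }
  let hit = byPath.get(path);
  if (!hit) {
    hit = derive(schema, path);
    byPath.set(path, hit);
  }
  return hit;
}

class PerInstanceResolver {
  private cache: PathCache = new WeakMap();
  schemaAtPath(schema: object, path: string): object {
    return lookup(this.cache, schema, path);
  }
}

const moduleCache: PathCache = new WeakMap();
class ModuleLevelResolver {
  schemaAtPath(schema: object, path: string): object {
    return lookup(moduleCache, schema, path);
  }
}

const schema = Object.freeze({ type: "object" });

derives = 0;
for (let i = 0; i < 3; i++) new PerInstanceResolver().schemaAtPath(schema, "/title");
const perInstanceDerives = derives; // 3: every fresh instance starts cold

derives = 0;
for (let i = 0; i < 3; i++) new ModuleLevelResolver().schemaAtPath(schema, "/title");
const moduleLevelDerives = derives; // 1: later instances reuse the shared cache
```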

Numbers:

| Gauge | Baseline | Round 1 | Round 2 |
|---|---|---|---|
| 100 mount/unmount rounds @32 children (in-process) | 7511ms* | — | **1828ms** |
| Reconciler bench: mount+unmount @128 | 344.7ms | 265.0ms | **69.3ms** |
| Reconciler bench: re-mount unchanged @32 | 171.5ms | 117.7ms | **33.8ms** |
| Reconciler bench: single-child update @32 | 6.7ms | 5.1ms | **3.7ms** |
| Integration: worker busy CPU, notes 2–4 | 1446ms | 1182ms | **742ms (−49%)** |
| Integration: `handleVDomMount` phase | 540ms (37%, #1) | 406ms | **110ms (15%, #5)** |

*measured at round-1 state; the profile target didn't exist at baseline.

Diagnosis method worth keeping: when profile attribution plateaued,
temporarily instrumenting `internSchemaReturningSchemaAndHash` misses with
sampled stacks (`__internMiss` counter) found the exact fresh-object seams in
minutes — profiles alone couldn't separate "hash of what, built where".
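
A rough sketch of that kind of instrumentation, assuming a sampled stack capture on each cache miss. `recordMiss`, `missSites`, and `topMissSites` are hypothetical names, not the temporary code that was used.

```typescript
const missSites = new Map<string, number>();
let missCount = 0;

// Call from the cache-miss branch. Every `sampleEvery`-th miss records a
// trimmed stack so the hottest call sites stand out without flooding memory.
function recordMiss(sampleEvery: number): void {
  missCount++;
  if (missCount % sampleEvery !== 0) return;
  const stack = new Error().stack ?? "(no stack)";
  // Drop the message line and this function's own frame; keep a few callers.
  const site = stack.split("\n").slice(2, 5).join("\n");
  missSites.set(site, (missSites.get(site) ?? 0) + 1);
}

// Most frequently sampled call sites first.
function topMissSites(limit: number): [string, number][] {
  return Array.from(missSites.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Example: one hot site missing ten times, sampled every second miss.
function hotSite(): void {
  recordMiss(2);
}
for (let i = 0; i < 10; i++) hotSite();
const sampledTotal = Array.from(missSites.values()).reduce((sum, n) => sum + n, 0);
```

Stack string formats differ between engines, so the grouping key is only as stable as the runtime's stack format.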

Remaining (smaller) hash consumers in the settle phase: `resolveLink` under
query-proxy reads and `resolveSchema` under `validateAndTransform` — both
downstream of read-path value materialization (the value-graph identity reuse
lever), plus the schema-interning-at-getCell seam for caller literals.

## Optimization round 3 (June 2026): schema interning at the cell seam

The "schema-interning-at-getCell" lever from round 2's remaining list.
Schemas attached to cell links via `runtime.getCell` / `getCellFromLink` /
`getImmutableCell` and `cell.asSchema` are now interned
(`internCellLinkSchema` in cell.ts): deep-frozen in place and collapsed to
one canonical instance per structure, so every identity-keyed schema cache
(`cfc.schemaAtPath`, schema-ref memos, selector standardization, value-hash)
keys off the canonical instance from cell creation onward — including
`key()` subschema derivation, which hits the `schemaAtPath` memo from the
first access. In-place freezing is the same contract `resolveSchema()`
already applies to cell schemas on every read/write-policy path; a caller
sweep found no code mutating a schema after passing it to these APIs.
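
The idea of "deep-frozen in place and collapsed to one canonical instance per structure" can be sketched as follows. `internSketch`, `contentKey`, and `deepFreezeSketch` are hypothetical stand-ins (the real `internCellLinkSchema` hashes content rather than building a string key), and the sketch assumes acyclic JSON-like schemas.

```typescript
const canonicalByContent = new Map<string, object>();

// Stable content key: object keys are sorted so property order is irrelevant.
function contentKey(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(contentKey).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${contentKey(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Freezes the whole tree in place (no cycle handling in this sketch).
function deepFreezeSketch<T>(value: T): T {
  if (value !== null && typeof value === "object") {
    Object.freeze(value);
    for (const child of Object.values(value)) deepFreezeSketch(child);
  }
  return value;
}

// Returns the first-seen instance for any structurally equal schema.
function internSketch<T extends object>(schema: T): T {
  const key = contentKey(schema);
  const existing = canonicalByContent.get(key);
  if (existing) return existing as T;
  canonicalByContent.set(key, deepFreezeSketch(schema));
  return schema;
}

const a = internSketch({ type: "object", properties: { title: { type: "string" } } });
const b = internSketch({ properties: { title: { type: "string" } }, type: "object" });
const sameInstance = a === b; // true: identity-keyed caches now hit for both callers
```

The unbounded `Map` here is for illustration only; a real intern table needs an eviction or weak-reference policy.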

Two carve-outs surfaced during implementation:

1. **Query-result-proxy schemas must not be frozen in place.** The wish
   builtin's `schema` argument is read through a query-result proxy;
   `Object.freeze` forwards through the proxy and freezes the underlying
   stored value, violating the proxy's object invariants (caught by the
   pattern-scope wish-scope test as an `ownKeys` invariant TypeError).
   `internCellLinkSchema` JSON-round-trips proxy-containing schemas first,
   per the existing convention for proxy-wrapped schemas.
2. **`TransformObjectCreator.mergeMatches` rebuilt its combined anyOf/allOf
   cell schema fresh per matched cell** (the round-2 instrumentation
   technique attributed 100% of intern misses during re-mount to this one
   site). Each fresh build paid a full content hash at the new `asSchema`
   seam: interning alone regressed re-mount @32 from ~34ms to ~46ms (+32%).
   Memoized per frozen compound-schema identity × the match's `asCell`
   values (module-level, mutable schemas never cached — same shape as the
   round-2 seam memos), which recovers it fully and makes the output
   identity-stable.
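
The first carve-out follows from standard `Proxy` semantics, shown here with plain objects. The proxies below are illustrative and are not the query-result proxy itself, whose traps are not shown in this document.

```typescript
// 1. Freezing through a trap-less proxy freezes the underlying target.
const stored = { type: "object" };
Object.freeze(new Proxy(stored, {}));
const storedGotFrozen = Object.isFrozen(stored); // true

// 2. A proxy that reports keys its target lacks breaks the ownKeys invariant
//    once the target is non-extensible, so freezing it throws a TypeError.
const backing: Record<string, unknown> = { type: "object" };
const withVirtualKey = new Proxy(backing, {
  ownKeys: () => ["type", "virtual"],
});
let freezeThrew = false;
try {
  Object.freeze(withVirtualKey);
} catch (e) {
  freezeThrew = e instanceof TypeError;
}
// The target was already made non-extensible before the throw.
const backingStillExtensible = Object.isExtensible(backing); // false

// 3. A JSON round trip yields a plain copy that can be frozen safely.
const source = { type: "object", items: { type: "string" } };
const copy = JSON.parse(JSON.stringify(new Proxy(source, {})));
Object.freeze(copy);
const sourceUntouched = !Object.isFrozen(source); // true
```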

**Bench effect (interleaved A/B vs the round-2 base): neutral to slightly
positive.** Note-create @32 pull 24.4→22.7ms, selector subset path
623→598µs, reconciler mount/re-mount/update all within run noise. The
steady-state benches reuse stable module-level schema literals that
`resolveSchema()` already interned in place on first use, so their identity
caches were warm either way. The seam change is coverage and determinism
groundwork: canonical link schemas from creation (not from first
resolveSchema encounter), and structurally-equal-but-distinct schema objects
(fresh pattern-JSON parses, cross-module duplicate literals, derived
schemas) collapsing to one instance. The browser CDP pass confirms
neutrality: worker busy CPU for notes 2–4 measured 741ms on this branch vs
742ms on main (post-round-2), test green, per-note wall within noise.

## Optimization round 4 (June 2026): the remaining candidates, in order

Branch `perf/read-path-identity`. All four remaining candidates from the
round-2/3 lists, tackled in priority order:

**4a — read-path hashing eliminated.** Post-round-3 profiles showed ONE seam
holding ~97% of remaining hash time: `resolveCfcSchemaRefs` (the plural
follow-the-chain resolver; round 1 only memoized the singular) rebuilds a
fresh `{...resolved, ...rest, $defs}` spread whenever a `$ref` schema carries
extra keys — every `validateAndTransform` read of a vdom node. Memoized per
(frozen schemaObj, frozen fullSchema) pair with interned results. Also:
`selectorPathKey` ran on every `internPathSelector` call (>100ms self time)
— now cached per frozen path-array identity. Remount loop: 1888 → 1045ms;
hashing no longer appears in the remount profile's top frames at all.

**4b — trigger-index rebuild skipped on unchanged reads.**
`replaceActionTriggerPaths` cleared + re-added the per-entity trigger index
on every action re-run; now it remembers the last-registered (reads,
shallowReads) per action and returns the existing registration when equal.
Update loop A/B: ~4% faster; it also removes O(read-entities) churn from every steady-state
settle re-run.
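
A simplified sketch of the skip, assuming reads are comparable as sets of string keys. The names (`registerReads`, `lastReads`) are hypothetical, and the real change also tracks `shallowReads`.

```typescript
type Action = object;

const lastReads = new WeakMap<Action, Set<string>>();
let indexRebuilds = 0; // stands in for clearing + re-adding trigger entries

function sameSet(a: Set<string>, b: Set<string>): boolean {
  return a.size === b.size && Array.from(a).every((key) => b.has(key));
}

function registerReads(action: Action, reads: Set<string>): void {
  const previous = lastReads.get(action);
  if (previous && sameSet(previous, reads)) return; // unchanged: keep registration
  indexRebuilds++;
  lastReads.set(action, new Set(reads)); // copy so later caller mutation is visible
}

const action: Action = {};
registerReads(action, new Set(["note:1/title", "home/notes"]));
registerReads(action, new Set(["home/notes", "note:1/title"])); // same reads: skipped
const rebuildsAfterRepeat = indexRebuilds; // 1
registerReads(action, new Set(["home/notes"])); // read set changed: rebuild
const rebuildsAfterChange = indexRebuilds; // 2
```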

**4c — verified-load-id seeding memoized.** `seedVerifiedLoadIds` re-walked
the full frozen pattern graph on every by-identity cache hit (every
map/filter op resolve; ~24% of the commit-callback phase in the original
profile). Already-seeded (root, loadId) pairs are now skipped.

**4d — shared-hashtag wish send halved.** The ~17ms per `#notebook` query
was `schemaAsCell` JSON-round-tripping the schema through its query-result
proxy — every property access pays full cell-read machinery — and being
called TWICE with identical input. Single materialization + content-keyed
parse/intern cache: **16.9 → 7.8ms avg** in the browser integration. The
remaining ~8ms is the one unavoidable stringify-through-proxy walk; reading
the schema slot without proxying (a concrete recursive JSON schema instead
of `schema: true` in TARGET_SCHEMA, or raw reads with link detection) is the
documented next step if it matters.

| Gauge | Round 3 / main | Round 4 |
|---|---|---|
| Remount loop, 100×(mount+unmount) @32 (in-process) | 1888ms | **1045ms** |
| Reconciler bench: mount+unmount @128 | 69.3ms | **47.6ms** |
| Integration: worker busy CPU, notes 2–4 | 742ms | **726ms** (pre-4d) |
| Integration: `#notebook` wish send | 16.9ms avg | **7.8ms avg** |
| Macro note-create @128 (pull) | 52.0ms | **49.8ms** |

Observed for a future round: `getCfcState` (~139ms self in the remount loop)
is now the largest single JS frame after GC.