# Chapter 9 — Storage and Sync

Chapter 8 ended with an optimistic commit leaving the local replica. This
chapter follows it: the wire protocol, how the server detects conflicts
without locks, how other clients hear about it, and what's actually on disk.
The code is `packages/memory` (the store and protocol) plus
`packages/runner/src/storage/` (the client side).

> **Historical note, because you'll see it in the code.** The package
> contains two generations. The original ("v1") modeled storage as a chain
> of immutable *facts* `{the, of, is, cause}` with per-fact compare-and-swap
> — its types still shape the public vocabulary (`URI`, `ConflictError`,
> entity ids like `of:<hash>`). The current engine ("v2",
> `packages/memory/v2/`) keeps the spirit — optimistic concurrency over
> append-only history — but moves the unit of conflict from single facts to
> **commits with read-set validation**, which is what's described below.
> Toolshed and the runtime speak only v2.

## The data model: documents in spaces

The unit of storage is an **entity document**: a JSON document named by a
hash id (`of:<hash>`, derived from the cell's creation context — the
document itself mutates while its id stays stable), living in a space, with the cell's payload
under its `value` field plus metadata (e.g. a `source` link to the owning
piece). Cells (Chapter 8) address `(space, id, path)`; the path is resolved
inside the document.

Documents also carry a **scope key**. `PerUser`/`PerSession` cells
(Chapter 2) aren't filtered by queries — the engine physically partitions
revisions by `scope_key` derived from the session's principal
(`v2/engine.ts`). Isolation by partition, not by discipline.

## The commit protocol: optimistic concurrency via read sets

A client ships each transaction as a `ClientCommit`
(`packages/memory/v2.ts`):

```ts
// Shown for illustration only.
ClientCommit {
  localSeq,                       // client-session-local sequence number
  reads: {
    confirmed: [{ id, path, seq }],      // "I read X at server commit seq N"
    pending:   [{ id, path, localSeq }], // "I read my own unconfirmed commit"
  },
  operations: [ set | patch | delete | sqlite ],
}
```

The `reads` field is the concurrency-control token, built automatically from
the transaction journal (Chapter 8). The protocol is *optimistic*: no locks,
no leases — the client asserts what it saw, and the server validates that
assertion at commit time (`v2/engine.ts`, inside a single SQLite
transaction):

1. **Idempotence.** A commit is keyed by `(sessionId, localSeq)`. A replay
   returns the recorded result; a *different* payload under the same
   `localSeq` is a protocol error. This makes retry-after-disconnect safe.
2. **Read validation.** For each confirmed read, the engine checks whether
   any later revision touched it. Full-document `set`/`delete` conflicts
   with any read of that document; a `patch` conflicts **only if its patched
   paths overlap the read path**. Two users patching disjoint fields of the
   same document both succeed — this path-granular rule is what makes
   field-level two-way bindings (Chapter 4) practical under concurrency.
3. **Pending resolution.** Pipelined commits (a client needn't wait for ack
   to keep committing) have their pending reads mapped to the server
   sequence numbers their dependencies actually landed at, then validated
   the same way.
4. **Append.** The commit gets the next global `seq`; each operation becomes
   a revision row; per-document heads advance.

On conflict, the loser gets a `ConflictError` *and* the server immediately
pushes fresh state for the contested documents to that session — so the
client-side story from Chapter 8 (revert, re-run, retry) starts from
current data, not after another round trip.

Client-side resilience details worth knowing
(`packages/memory/v2/client.ts`): unacknowledged commits are kept and
**replayed on reconnect** (safe by idempotence), and a connection drop
leaves commits queued rather than failed. Offline-tolerant by construction,
within a session's lifetime.

## Subscriptions: watches over schema-shaped graphs

The read side mirrors the reactive model. Over one WebSocket, a client opens
a session per space (`hello` → `session.open`, then `transact` /
`graph.query` / `session.watch.set` / `session.watch.add` / `session.ack`).

A **watch** is a set of graph queries: roots (`{id, selector}`) where the
selector is a *schema path selector* — the same schemas from Chapter 7,
acting as a query language. The server traverses each root document,
**following links reachable under the schema**, and records which documents
the traversal touched. That set is the subscription. So "subscribe to this
piece" means: walk its argument/result documents as shaped by its schemas,
across link boundaries, and watch everything you saw. (This is the precise
sense of the tagline from Chapter 1 — *reactivity is subscription to the
result of a query defined by schemas*.)

After every commit the server marks the written ids dirty and, on a
debounced tick, re-walks only the affected graphs per session, diffs against
what that session already has, and pushes a `session/effect` carrying a
`SessionSync` — upserts (id, seq, document) and removals. The originating
session is skipped (it already applied the change optimistically). The
client integrates the delta into its replica and notifies cell sinks —
re-entering Chapter 8's trigger index, completing the loop from the
Chapter 1 trace.

## On disk: one SQLite database per space

Each space is one SQLite file (`<store>/engine-v3/<space-did>.sqlite`),
WAL-mode. The core tables (`v2/engine.ts` INIT script, abridged):

```sql
"commit"  (seq PK, session_id, local_seq, original JSON, resolution JSON, ...)
           -- the full history of commits; UNIQUE (session_id, local_seq)
revision  (id, scope_key, seq, op, data JSON, ...)   -- one row per operation
head      (id, scope_key → seq)                      -- current tip per document
snapshot  (id, scope_key, seq, value JSON)           -- periodic materializations
branch    (name, parent_branch, fork_seq, head_seq, ...)
blob_store(hash PK, data BLOB, content_type, size)
```

Reading a document means: find its head revision; if it's a `set`, decode
it; if a `patch`, replay patches forward from the nearest snapshot
(a snapshot is written once enough patch revisions — 10 by default —
accumulate since the last full value, bounding replay cost). So the store
is an **event log with materialized checkpoints** — history is retained,
heads are fast, and the conflict checks in the commit protocol are simple
indexed queries over `revision`.

The schema also reveals features at the edge of this tutorial's scope:
`branch` (commits can target named branches forked from main, with merge
records), and persisted scheduler state (tables that let server-side
executors rehydrate action read/write indexes across processes).

## The SQLite capability (cells as databases)

Patterns can also declare cells that *are* SQLite databases (the
`sqlite-builtin`, `docs/specs/sqlite-builtin`): the cell's entity id names a
per-`(space, id)` database file. The design is asymmetric, deliberately:

- **Writes** ride inside ordinary commits as `{op: "sqlite", db, sql, params}`
  operations — atomic with cell writes in the same transaction, conflict-
  checked like everything else. There is no standalone SQL-write RPC.
- **Reads** (`sqlite.query`) bypass the engine connection entirely and go
  through a pooled set of *read-only* connections opened directly on the
  database file (`v2/sqlite/read-pool.ts`), with a statement guard
  (SELECT-only, no ATTACH/PRAGMA). Reads scale without touching the writer;
  read-only is enforced by the OS-level open flag, not by parsing alone.

A server-side registry can also map a handle to an external on-disk
database (`cf piece link sqlite:/abs/path <piece>/<field>`) — read-only —
which is how local datasets get exposed to patterns.

## The local/remote symmetry

One detail with outsized architectural value: the *same* client and server
classes run in-process. `EmulatedStorageManager`
(`packages/runner/src/storage/v2-emulate.ts`) constructs a real
`MemoryServer` and connects the real client over a loopback transport — no
sockets, identical protocol. Tests and CLI runs exercise the genuine commit
and watch machinery, and "local vs deployed" differs only in transport and
authentication. When debugging sync behavior, you can usually reproduce it
entirely in-process.

---

**Next:** [Chapter 10 — Identity, authorization, and
isolation](10-identity-and-security.md): who is allowed to do all of the
above, and how untrusted code is contained.