# Committed-write backpressure

How the scheduler keeps a committed write from being silently dropped when the
server rejects it under contention.

## The problem

The runtime is local-first and optimistic. When an event handler writes, the
write is applied to local replicated memory immediately and the transaction is
committed to the server in the background. The handler does not wait for the
server. The server can still reject the commit, most often with a per-entity
basis-sequence conflict: another writer advanced the entity's sequence between
the time this commit read its basis and the time it reached the server. On
rejection the optimistic write is rolled back (a "revert") and the originating
event is re-run.

The re-run used to have a fixed budget. An event handler retried five times
(`DEFAULT_RETRIES_FOR_EVENTS`) and then gave up: it logged "Event handler
transaction failed after exhausting all retries" and dropped the write. Nothing
surfaced to the user.

Under sustained contention that budget is exhausted before the contention
clears. The concrete case that motivated this work: profile appends issued while
the home space rehydrates. Loading a home that already has profiles produces a
burst of basis-sequence conflicts as reactive rehydration commits churn the home
entity. A profile append targeting the same entity loses the race over and over,
runs out of its five retries in a few milliseconds, and is dropped. Creating
three profiles left a durable count of one. "Drops data under load" is a
correctness cliff, not graceful degradation.

## The principle

A committed write that represents real user intent must converge or fail loudly.
It must never vanish. Under contention the system should get slower, not lossy.

That requires separating two kinds of rejection that the old fixed budget
treated the same:

- A **transient stale-basis conflict** (`ConflictError`). Re-running the handler
  against fresh confirmed state and committing again can succeed. These are
  exactly the rejections that a contention burst produces, and the ones that
  must keep being retried until they land.
- A **permanent precondition failure** (`PreconditionFailedError` —
  `receipt-exists`, `origin-committed`). Re-running can never succeed and must
  not happen. `receipt-exists` means the event was already durably handled by a
  prior delivery (idempotent dedup); `origin-committed` means the event's origin
  lineage did not commit, so the descendant must not apply.

A third group, handler-initiated aborts and system errors, is transient only in
the sense that it is not a permanent precondition failure. It is not contention
either, so retrying harder does not help.

## The model

The event-handler commit path classifies each commit result and acts on it
(`packages/runner/src/scheduler/events.ts`, `classifyCommitDisposition`):

- **Success** — done.
- **Stale-basis `ConflictError`** — the backpressure path. The event is re-queued
  and parked via the existing `notBefore` mechanism, with a delay taken from one
  capped exponential backoff curve plus jitter (`scheduler/backpressure.ts`,
  `computeBackoffDelayMs`). The curve is deliberately near-immediate at the start:
  the default `baseDelayMs` is 25/32 ms, so the first few delays (before jitter)
  are 0.78, 1.56 and 3.125 ms. A stale-basis conflict usually clears the instant
  the fresh confirmed state arrives, and these sub-5ms delays let it converge
  within a settle (the harness and the UI settle by waiting for the event queue
  to drain, so a retry that would have cleared immediately must not be spaced
  out). The delay only grows into real spacing once a conflict persists: it
  reaches 25ms before the seventh attempt and keeps doubling up to a 1-second
  cap. Backoff makes the scheduler slow down under sustained contention instead
  of busy-looping; jitter keeps concurrent writers contending for the same entity
  from retrying in lockstep. The event keeps retrying for a bounded window
  (default 30 seconds), measured from the first conflict, which is long enough to
  outlast a rehydration burst. `idle()` and `settled()` already wait for a
  parked head event, so a write that converges still completes within a settle.
- **Window elapsed without converging** — a terminal `CommitConvergenceError` is
  surfaced through the scheduler error channel (`scheduler.onError`). The write
  fails loudly rather than disappearing. This is the bounded-resource backstop:
  if a conflict genuinely never clears, the system does not retry forever, it
  reports.
- **Permanent rejection** — never retried, exactly as before, and still
  observable through the `scheduler.event.commit` telemetry marker
  (`permanentRejection`).
- **Any other transient error** (abort, system error) — keeps the fixed
  `retriesLeft` budget and the previous retry-then-stop behavior. Backpressure
  would not help a handler that aborts itself, and retrying it within the window
  would loop pointlessly.
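The dispatch above can be sketched as a pure function. This is a minimal sketch under assumptions: the `CommitResult`, `RetryState` and `Disposition` shapes, the disposition names, and the signature given to `classifyCommitDisposition` are hypothetical, not taken from `events.ts`. Only the five outcomes, the window measured from the first conflict, and the `retries: 0` opt-out come from this document.

```typescript
// Hypothetical shapes; the real classifyCommitDisposition signature is not shown here.
type CommitResult =
  | { ok: true }
  | { ok: false; kind: "conflict" } // stale-basis ConflictError
  | { ok: false; kind: "precondition"; reason: "receipt-exists" | "origin-committed" }
  | { ok: false; kind: "abort" | "system" };

type Disposition =
  | "done"
  | "backoff" // re-queue parked via notBefore
  | "fail-convergence" // surface a CommitConvergenceError via onError
  | "drop-permanent" // never retried; visible in telemetry
  | "retry-budgeted" // old fixed retriesLeft path
  | "give-up"; // budget spent, or conflict retries disabled with retries: 0

interface RetryState {
  retriesLeft: number; // fixed budget for aborts and system errors
  conflictRetriesEnabled: boolean; // false when the event was queued with retries: 0
  firstConflictAt?: number; // ms timestamp of the first conflict, if any
}

function classifyCommitDisposition(
  result: CommitResult,
  state: RetryState,
  now: number,
  retryWindowMs: number,
): Disposition {
  if (result.ok) return "done";
  switch (result.kind) {
    case "precondition":
      return "drop-permanent";
    case "conflict": {
      if (!state.conflictRetriesEnabled) return "give-up";
      // The window is measured from the first conflict, not from this attempt.
      const elapsed = now - (state.firstConflictAt ?? now);
      return elapsed < retryWindowMs ? "backoff" : "fail-convergence";
    }
    default:
      // Aborts and system errors keep the fixed budget; the caller decrements it.
      return state.retriesLeft > 0 ? "retry-budgeted" : "give-up";
  }
}
```

Because the comparison is strict, a zero `retryWindowMs` turns the very first conflict into `fail-convergence`, which matches the zero-window behavior described under Configuration.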

### Resource bounds

Backoff caps the retry *rate* (one parked timer per intent, capped delay), and
the window caps the total *duration*. Together they bound the work a single
contended write can generate to roughly tens of attempts over the window, not a
busy loop. The event queue still holds one entry per intent; a backoff retry
re-queues that same entry in place rather than fanning out.
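The arithmetic behind these bounds can be checked with a small sketch of the backoff curve. It assumes the nominal delay is `baseDelayMs * 2^attempt` capped at `maxDelayMs`, and that jitter shortens a delay by a random fraction of at most `jitter`; the real `computeBackoffDelayMs` may apply jitter differently. The 25/32 ms base, 1-second cap and 30-second window are the defaults this document states; the `jitter: 0.5` default and the `maxRetriesInWindow` helper are hypothetical.

```typescript
// Fields mirror the policy knobs named in the text; the jitter default is an assumption.
interface BackoffPolicy {
  baseDelayMs: number;
  maxDelayMs: number;
  jitter: number; // fraction in [0, 1] by which a delay may be shortened
  retryWindowMs: number;
}

const DEFAULT_POLICY: BackoffPolicy = {
  baseDelayMs: 25 / 32, // 0.78125 ms
  maxDelayMs: 1_000,
  jitter: 0.5, // hypothetical default
  retryWindowMs: 30_000,
};

// attempt 0 is the first backoff after the first conflict.
function computeBackoffDelayMs(
  attempt: number,
  policy: BackoffPolicy,
  random: () => number = Math.random,
): number {
  const nominal = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** attempt);
  // Shorten by up to `jitter` of the nominal delay so writers on one entity desynchronize.
  return nominal * (1 - policy.jitter * random());
}

// Upper bound on backoff retries in one window, assuming zero jitter and instant commits.
function maxRetriesInWindow(policy: BackoffPolicy): number {
  const noJitter = { ...policy, jitter: 0 };
  let elapsed = 0;
  let retries = 0;
  for (;;) {
    const delay = computeBackoffDelayMs(retries, noJitter);
    if (delay <= 0) return Number.POSITIVE_INFINITY; // zero delays never use up the window
    if (elapsed + delay >= policy.retryWindowMs) return retries;
    elapsed += delay;
    retries += 1;
  }
}
```

Under these assumptions the default window admits at most 39 backoff retries: eleven delays from 0.78 ms to 800 ms sum to about 1.6 s, and the remaining time fits 28 one-second steps. Real commit latency lowers that count, and jitter that shortens delays raises it.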

### Event-queue ordering

The scheduler processes the event queue strictly head-first, and a parked event
(`notBefore` in the future) holds the head until its timer fires — this is the
same behavior the existing dirty-dependency throttle already relies on. A backoff
retry is parked at the head, so while a contended write is backing off, events
queued behind it wait. Under a transient storm (which clears in well under a
second) this is imperceptible. The visible effect is only on a write that keeps
conflicting for seconds: event processing behind it slows until the write either
lands or the retry window elapses and it fails terminally. That is the intended
backpressure — the system gets slower under sustained contention rather than
losing the write. `maxDelayMs` bounds how long any single backoff step holds the
head.
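The head-first rule can be sketched with a queue whose head is inspected but only removed once it reaches a terminal outcome. Everything here except the `notBefore` field is hypothetical (`QueuedEvent`, `runnableHead`, `parkHead`, `completeHead`); the sketch only illustrates why a parked head holds back the entries behind it and why a backoff retry adds no queue entries.

```typescript
// Only `notBefore` comes from the text; the rest is a hypothetical model.
interface QueuedEvent {
  id: string;
  notBefore?: number; // ms timestamp before which the event must not run
}

// Strict head-first: only the head is considered, so a parked head blocks the queue.
function runnableHead(queue: readonly QueuedEvent[], now: number): QueuedEvent | undefined {
  const head = queue[0];
  if (head === undefined) return undefined;
  return head.notBefore !== undefined && head.notBefore > now ? undefined : head;
}

// A backoff retry re-parks the existing head entry; nothing new is appended.
function parkHead(queue: readonly QueuedEvent[], now: number, delayMs: number): void {
  const head = queue[0];
  if (head !== undefined) head.notBefore = now + delayMs;
}

// Called once the head has landed or failed terminally.
function completeHead(queue: QueuedEvent[]): void {
  queue.shift();
}
```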

### Idempotency

Retrying re-runs the same event with the same durable event id. The memory
engine's receipt machinery makes a re-delivery of an already-applied event a
permanent `receipt-exists` rejection, so a retry cannot double-apply. A retry
only happens after a rejection, once the optimistic write has been reverted, so the
re-run reads fresh confirmed state (including profiles that landed in the
meantime) and reconciles — which is why three concurrent appends converge to a
list of three rather than clobbering each other.
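The ordering that makes retries safe can be sketched with an in-memory stand-in for the memory engine. The `ReceiptStore` class and its methods are hypothetical. The points it illustrates are that the receipt check runs before the basis check, so a re-delivered event is rejected as `receipt-exists` even when its basis is current, and that a handler rejected for a stale basis can re-read and reconcile.

```typescript
type CommitOutcome = "ok" | "conflict" | "receipt-exists";

// Hypothetical stand-in for the memory engine's receipt and basis checks.
class ReceiptStore {
  private readonly receipts = new Set<string>(); // event ids already durably applied
  private seq = 0; // per-entity basis sequence
  private list: readonly string[] = [];

  read(): { seq: number; list: readonly string[] } {
    return { seq: this.seq, list: this.list };
  }

  commit(eventId: string, basisSeq: number, next: readonly string[]): CommitOutcome {
    // Permanent: this event was already handled by a prior delivery.
    if (this.receipts.has(eventId)) return "receipt-exists";
    // Transient: another writer advanced the sequence since this basis was read.
    if (basisSeq !== this.seq) return "conflict";
    this.list = next;
    this.seq += 1;
    this.receipts.add(eventId);
    return "ok";
  }
}
```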

### The `retries: 0` opt-out

`queueEvent`'s `retries` argument now gates whether conflicts are retried at all,
rather than bounding how many times. The default (`DEFAULT_RETRIES_FOR_EVENTS`,
used by every real user event through `cell.send`) is positive, so user events
get backpressure. A caller that sends with `retries: 0` — a speculative lineage
origin, an internal one-shot — opts out: a conflict gives up immediately, so a
descendant of a failed origin still drops deterministically. The exact positive
count no longer bounds conflict retries; the window does.

## Configuration

`RuntimeOptions.commitBackpressure` tunes the policy
(`scheduler/backpressure.ts`, `CommitBackpressurePolicy`): `baseDelayMs`,
`maxDelayMs`, `jitter`, `retryWindowMs`. Unset fields fall back to
`DEFAULT_COMMIT_BACKPRESSURE` and every field is clamped to a well-defined range
(non-negative delays, a cap no lower than the base delay, jitter within [0, 1], a
non-negative window). The clamps only keep the arithmetic sane; the
never-silently-dropped guarantee does not depend on them. A zero window is
allowed and does not reintroduce silent drops — it makes the first conflict fail
terminally instead of being retried. Tests use these options to shrink the window and
backoff.
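The fallback-then-clamp behavior can be sketched as follows. The field names and the 25/32 ms, 1-second and 30-second defaults come from this document; the `jitter` default of 0.5, the NaN handling, and the function name `resolveCommitBackpressure` are assumptions, not the code in `scheduler/backpressure.ts`.

```typescript
interface CommitBackpressurePolicy {
  baseDelayMs: number;
  maxDelayMs: number;
  jitter: number;
  retryWindowMs: number;
}

const DEFAULT_COMMIT_BACKPRESSURE: CommitBackpressurePolicy = {
  baseDelayMs: 25 / 32,
  maxDelayMs: 1_000,
  jitter: 0.5, // assumed; the real default is not stated here
  retryWindowMs: 30_000,
};

// Unset (or NaN) fields fall back to the default value.
function orDefault(value: number | undefined, fallback: number): number {
  return value === undefined || Number.isNaN(value) ? fallback : value;
}

// Hypothetical resolver: fill unset fields, then clamp each to a sane range.
function resolveCommitBackpressure(
  overrides: Partial<CommitBackpressurePolicy> = {},
): CommitBackpressurePolicy {
  const d = DEFAULT_COMMIT_BACKPRESSURE;
  const baseDelayMs = Math.max(0, orDefault(overrides.baseDelayMs, d.baseDelayMs));
  return {
    baseDelayMs,
    // The cap may never be below the base delay.
    maxDelayMs: Math.max(baseDelayMs, orDefault(overrides.maxDelayMs, d.maxDelayMs)),
    jitter: Math.min(1, Math.max(0, orDefault(overrides.jitter, d.jitter))),
    // Zero is allowed: the first conflict then fails terminally instead of retrying.
    retryWindowMs: Math.max(0, orDefault(overrides.retryWindowMs, d.retryWindowMs)),
  };
}
```

A test that wants fast terminal failure might pass something like `{ retryWindowMs: 200, baseDelayMs: 1, maxDelayMs: 10 }` as `commitBackpressure`; those numbers are illustrative only.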

## Observability

The `scheduler.event.commit` telemetry marker carries the new state:
`retryAttempt` and `backoffMs` on a backoff retry, and `terminal`
(`"permanent"` | `"convergence"`) when a commit reaches a terminal outcome.
A non-converging write also logs `commit-convergence-failed` and is delivered to
registered `scheduler.onError` handlers as a `CommitConvergenceError`.
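A consumer of the marker can be sketched as below. The `CommitMarker` interface lists only the three fields named above and is a hypothetical, partial shape; `summarizeCommitMarkers` is an illustrative helper, not part of the telemetry API.

```typescript
// Hypothetical, partial shape: only the fields this document names.
interface CommitMarker {
  retryAttempt?: number;
  backoffMs?: number;
  terminal?: "permanent" | "convergence";
}

interface CommitSummary {
  retries: number;
  totalBackoffMs: number;
  terminal: "permanent" | "convergence" | undefined;
}

// Fold one event's markers into a summary, e.g. for a test assertion or a debug view.
function summarizeCommitMarkers(markers: readonly CommitMarker[]): CommitSummary {
  const summary: CommitSummary = { retries: 0, totalBackoffMs: 0, terminal: undefined };
  for (const marker of markers) {
    if (marker.retryAttempt !== undefined) {
      summary.retries += 1;
      summary.totalBackoffMs += marker.backoffMs ?? 0;
    }
    if (marker.terminal !== undefined) summary.terminal = marker.terminal;
  }
  return summary;
}
```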

## The reactive-action path

The reactive path (`scheduler/action-run.ts`) does not need this backpressure
and shares only the `isConflictRejection` classifier. A reactive action is a
re-derivation: its output is a function of its inputs. On a conflict it does not
enter the bounded retry budget — instead it re-arms its subscription, waits for
the conflict's `readyToRetry` catch-up, and re-queues itself to re-run against
the caught-up state. (Reader-dirty propagation re-runs it too when the catch-up
write lands as a fresh notification, a redundant fast path that does not cover a
conflict whose triggering write was already delivered.) A conflict there is a
wait for catch-up, not a failure, and consumes no budget. Only non-conflict
transient errors fall back to the bounded `MAX_RETRIES_FOR_REACTIVE` retry, and
every attempt re-subscribes so the action recovers when its inputs next change.
The backpressure rework targets the event-handler path instead, where a one-shot
write *is* the user's intent and cannot be re-derived from inputs, so a conflict
must be actively retried rather than recovered by re-derivation.
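The two reactive-path branches can be sketched as an async error handler. The `ReactiveAction` interface, the `ActionError` union, and modeling `readyToRetry` as a `Promise` are assumptions made for illustration; `scheduler/action-run.ts` may structure this differently.

```typescript
// Hypothetical shapes; `readyToRetry` is modeled as a Promise for illustration.
type ActionError =
  | { kind: "conflict"; readyToRetry: Promise<void> }
  | { kind: "transient" };

interface ReactiveAction {
  retriesLeft: number; // bounded budget for non-conflict transient errors
  resubscribe(): void; // re-arm input subscriptions
  requeue(): void; // schedule a re-run
}

async function handleReactiveCommitError(
  action: ReactiveAction,
  error: ActionError,
): Promise<"rerun" | "await-input-change"> {
  // Every path re-arms, so a later input change re-runs the action regardless.
  action.resubscribe();
  if (error.kind === "conflict") {
    // A conflict is a wait for catch-up, not a failure: no budget is consumed.
    await error.readyToRetry;
    action.requeue();
    return "rerun";
  }
  if (action.retriesLeft > 0) {
    action.retriesLeft -= 1;
    action.requeue();
    return "rerun";
  }
  // Budget exhausted: rely on the subscription to recover when inputs next change.
  return "await-input-change";
}
```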

## Tests

- `packages/runner/test/scheduler-commit-backpressure.test.ts` — the validation
  of record. It drives the event-handler commit path against an emulated server
  that rejects commits on demand: a burst of transient conflicts longer than the
  old budget still lands; a permanent rejection is not retried and stays
  observable; a never-converging conflict surfaces a terminal error within the
  window with bounded attempts; and three whole-array appends
  (`list = [...list, value]`, the profile-append shape) survive a conflict storm
  so the durable count reaches three. This deterministically reproduces the
  silent-loss bug and proves the fix.
- `packages/runner/test/scheduler-event-lineage.test.ts` — adapted so a
  permanently failed origin is modeled without relying on budget exhaustion; it
  exercises the give-up (`retries: 0`) and terminal-convergence paths.
- `packages/patterns/integration/home-profile.test.ts` — unchanged browser-level
  profile-creation regression coverage; still passes (the fix does not regress
  the normal cross-space append).
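A stripped-down version of the storm scenario shows why re-running a whole-array append against fresh state converges. This is a hypothetical reduction, not the contents of `scheduler-commit-backpressure.test.ts`: the `EmulatedServer` class, the round-robin loop, and the `maxAttempts` cap (standing in for the time-based retry window) are all assumptions.

```typescript
// Hypothetical emulated server: rejects the next `pendingConflicts` commits,
// then accepts only commits whose basis matches the current sequence.
class EmulatedServer {
  seq = 0;
  list: readonly string[] = [];
  pendingConflicts = 0;

  commit(basisSeq: number, next: readonly string[]): boolean {
    if (this.pendingConflicts > 0) {
      this.pendingConflicts -= 1;
      return false; // injected stale-basis conflict
    }
    if (basisSeq !== this.seq) return false; // genuine stale basis
    this.list = next;
    this.seq += 1;
    return true;
  }
}

interface PendingAppend {
  value: string;
  attempts: number;
  basis: number; // sequence this attempt read
  next: readonly string[]; // proposed whole array: [...list, value]
}

function runConcurrentAppends(server: EmulatedServer, values: readonly string[], maxAttempts: number): void {
  // Every event reads the same basis before any of them commits.
  let pending: PendingAppend[] = values.map((value) => ({
    value,
    attempts: 0,
    basis: server.seq,
    next: [...server.list, value],
  }));
  while (pending.length > 0) {
    const stillPending: PendingAppend[] = [];
    for (const ev of pending) {
      ev.attempts += 1;
      if (server.commit(ev.basis, ev.next)) continue;
      if (ev.attempts >= maxAttempts) {
        // Fail loudly instead of silently dropping the write.
        throw new Error(`append ${ev.value} did not converge after ${ev.attempts} attempts`);
      }
      // Rejected: discard the optimistic write and re-run the handler on fresh state.
      ev.basis = server.seq;
      ev.next = [...server.list, ev.value];
      stillPending.push(ev);
    }
    pending = stillPending;
  }
}
```

With ten injected conflicts, one of the three appends needs seven attempts, more than the old one-plus-five budget allows. Capping at six attempts therefore fails loudly in this sketch, while a generous cap lands all three.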

## Reproduction status in this codebase

The motivating instance — profile appends swallowed by a rehydration conflict
storm, leaving a durable count of one — was confirmed in an earlier copy where
loading a home with existing profiles produced about nineteen basis-sequence
conflicts. In the current copy that storm is much milder: rehydrating a home
with a profile produces only a few reactive-commit conflicts, and they clear
before a profile append is issued, so the append's event commit does not hit a
conflict at all. A browser count-probe driven against this copy therefore does
not exercise the backpressure path (no event-handler conflict, no backoff) and
cannot validate the fix end-to-end here. Any residual count discrepancy under
idle-only waits comes from cross-space rehydration timing, not from the
conflict exhaustion this change addresses. The fix is validated instead at the
runner level, where the conflict storm is injected deterministically and the
committed write is shown to converge rather than drop.