# Cells

This document specifies cells — the fundamental unit of reactive state in the
system.

## Status

Draft — based on extensive codebase investigation and design discussion.

## Overview

A **cell** is a named location that holds a typed value. Cells are identified
by a link (entity ID + path) and participate in the reactive dataflow graph.

Cells can be categorized along two dimensions:

1. **Implementation mechanism**: How change detection works
   - Value cells (content-compared)
   - Stream cells (occurrence-based)

2. **Semantic role**: What purpose the cell serves
   - Process cells (execution metadata)
   - Precious data cells (irreplaceable):
     - User input (created/edited by user)
     - External input (fetched from outside, world has moved on)
   - Computed result cells (derived, reconstructible)
   - Stream cells (event endpoints)

---

## Current State

### Cell API

Cell methods are stratified by layer:

#### Transaction Layer

The core methods for data access:
- `get()`, `getRaw()` — read value (reactive)
- `sample()` — read value (non-reactive, no dependency tracking)
- `set()`, `setRaw()` — write value
- `update()`, `push()`, `remove()` — mutate
- `key()` — navigate to nested property
- `withTx()` — bind to transaction
- `asSchema()` — type cast

#### Reactivity Layer

Subscription and synchronization:
- `sink()` — subscribe to changes
- `pull()` — read with dependency tracking
- `sync()` — ensure synchronized with storage

#### Stream-Specific

- `send()` — dispatch event (only on streams)

#### Rarely Used Outside Foundation

- `freeze()`, `isFrozen()`
- `setInitialValue()`, `setSelfRef()`
- `connect()`, `export()`
- `setSchema()` (deprecated)

### Implementation Mechanism: Value vs Stream

The implementation distinguishes cells by their change detection behavior:

| Aspect | Value Cell | Stream Cell |
|--------|------------|-------------|
| Stored value | Actual data | `{ $stream: true }` marker |
| Read | `get()` returns value | `get()` returns marker (not useful) |
| Write | `set()` stores value | `send()` dispatches event |
| Change detection | Content comparison | Every send is distinct |
| Persistence | Value persisted | Only marker persisted; events ephemeral |

The essential difference is **duplicate handling**:

- `cell.set(5); cell.set(5);` → one state, no second reaction (idempotent)
- `stream.send(5); stream.send(5);` → two events, two handler invocations

This is **state vs occurrence**:
- Value cells answer: "What is the current state?"
- Streams answer: "What just happened?"

### Semantic Categories

While the implementation sees only "value" and "stream," there are richer
semantic distinctions that matter for garbage collection, recovery, and UI:

#### Process Cells

Control plane metadata for piece execution:

```
{
  $TYPE: string,        // pattern ID
  resultRef: SigilLink, // link to result cell
  argument?: any,       // input data
  spell?: SigilLink,    // link to spell
  internal?: any        // working state
}
```

Process cells are implemented as value cells but serve a distinct purpose:
tracking which pattern governs a piece and linking to its result. See
[Storage Format](./2-storage-format.md) for details.

#### Precious Data Cells

Data that **cannot be reconstructed** from inputs. Two subtypes:

**User input**: Data created or edited directly by the user:
- Text they typed
- Selections they made
- Files they uploaded
- Manually curated content

Loss means losing the user's creative act, which cannot be replayed.

**External input**: Data fetched from outside the system:
- API responses (stock prices, weather, search results)
- Scraped web content
- Sensor readings at a point in time

Loss means the data is gone — the external world has moved on and won't
produce the same result again. Even "re-fetching" yields different data.

Both subtypes must be preserved. The system should never garbage-collect
precious data.

#### Computed Result Cells

Derived data produced by patterns from inputs:
- Pattern outputs
- Aggregations and transformations
- Cached computations

These **can be reconstructed** by re-running the pattern with the same inputs.
Persisting them is an optimization (avoid recomputation), not a requirement.
The system could potentially discard and recompute these during compaction.

#### Stream Cells

Event endpoints for occurrences:
- User interactions (clicks, input)
- External events
- Signals between pieces

Events are ephemeral — only the most recent matters for triggering handlers.
The `{ $stream: true }` marker persists to preserve the stream's identity,
but event payloads do not persist.

### Cross-Cutting Observations

- **Implementation vs semantics**: "Value cell" spans three semantic roles
  (process, precious, computed). The implementation doesn't distinguish them.
- **Reconstructibility**: The key semantic question is "can this be rebuilt?"
  Process and computed cells: yes. Precious and stream identity: no.
- **Identity mechanism**: All categories use the same `NormalizedFullLink`
  infrastructure for addressing and reactivity.
- **Declared vs effective API**: The type system declares six cell variants
  (`Cell`, `ReadonlyCell`, `WriteonlyCell`, `ComparableCell`, `OpaqueCell`,
  `Stream`), each with a different subset of methods. In practice, pattern code
  uses only two: `Cell` and `Stream`. `Writable<T>` is a type alias for
  `Cell<T>` with no distinct semantics — the codebase uses them
  interchangeably. The restricted variants exist in the API declarations but
  have no pattern-level consumers.

### Shared Identity Base

Regardless of cell type, all cells share:

- `entityId` — stable identifier
- `schema` — optional type information
- `getAsLink()` — serialization to `SigilLink`

The `toJSON()` method exists but is only called via generic duck-typed
serialization patterns, not cell-specific code.

---

## Proposed Directions

### API Surface Reduction

The "Rarely Used Outside Foundation" methods could be made internal-only,
reducing the public API surface and clarifying which methods are intended for
general use.

Beyond method visibility, the type hierarchy itself is wider than what consumers
use. The API declares six cell variants with different capability subsets:

| Variant | get | set | push | send | key | equals |
|---------|-----|-----|------|------|-----|--------|
| `Cell` / `Writable` | yes | yes | yes | yes | yes | yes |
| `ReadonlyCell` | yes | — | — | — | yes | yes |
| `WriteonlyCell` | — | yes | yes | — | yes | — |
| `ComparableCell` | — | — | — | — | yes | yes |
| `OpaqueCell` | — | — | — | — | yes | — |
| `Stream` | — | — | — | yes | — | — |

Pattern code uses only `Cell`/`Writable` and `Stream`. The effective
pattern-level API is:

- **Cell** (= `Writable`): `.get()`, `.set()`, `.push()` (dominant);
  `.key()`, `.update()`, `.remove()`, `.equals()` (occasional)
- **Stream**: `.send()`

`Writable<T>` is a pure type alias for `Cell<T>` — same interface, same runtime
object. It carries no enforced semantics: the runtime does not distinguish the
two. However, the naming serves a practical purpose: LLM-based code generators
were observed to confuse read-only pattern parameters (which are also reactive)
with writable state when both used the `Cell<T>` name. Renaming the writable
variant improved LLM comprehension of intent. The codebase is currently
inconsistent about which name is used — some patterns write `Writable<T>`,
others write `Cell<T>` in identical roles — but the intent is for `Writable<T>`
to signal write access.

The restricted variants (`ReadonlyCell`, `WriteonlyCell`, `ComparableCell`,
`OpaqueCell`) are not yet used by pattern code. The plan is to introduce a
transformer step that automatically narrows declared types to match actual usage
— e.g. `OpaqueCell` for reference-only passing, `WriteonlyCell` for write-only
cells. This would enforce least-privilege without requiring pattern authors to
manually select the right variant.

### Unification via Timestamps

If timestamps are an essential component of event data, the distinction
collapses:

```
stream.send(5) at t=1  →  {value: 5, timestamp: 1}
stream.send(5) at t=2  →  {value: 5, timestamp: 2}
```

These are content-distinct. Standard change detection handles it correctly.

#### The Unified Model

Instead of two cell types with different methods:
- One cell type
- Schema describes the data shape, including timestamps if event-like
- Change detection compares the whole value
- No special `asStream` flag needed

Example schema for an event location:
```json
{
  "type": "object",
  "properties": {
    "x": { "type": "number" },
    "y": { "type": "number" },
    "timestamp": { "type": "number" }
  }
}
```

The "event-ness" emerges from the data shape. Event producers include timestamps
(they know when things happened). No magic flags or bifurcated types.

#### What Unification Eliminates

- `asStream: true` flag
- Separate Stream type with duplicated methods
- `isStream()` / `isCell()` brand checking
- Special change-detection logic

#### What Unification Requires

- Clear convention for timestamp fields
- Possibly: schema-level indication of "where does the timestamp come from"
- Migration path for existing stream usages

#### Concerns and Constraints

The original stream/cell distinction was intentional and reflects real semantic
differences beyond change detection:

- **Scheduler context**: Computed values (lifts) are idempotent — they receive
  the same context across re-invocations, and re-running them with the same
  inputs should produce the same result. Event handlers are not idempotent —
  each invocation receives a fresh context, and side effects should not be
  replayed. A unified cell type would still need to distinguish these execution
  modes.
- **Reactive trigger semantics**: A lift re-executes when *any* input changes.
  An event handler should execute only when the event fires, not when other
  inputs change. This "react to event only" behavior is distinct from "react to
  any dependency" and would need to be preserved or re-expressed.
- **Stored last-event utility**: Unified cells could store the last event value,
  but the use cases for reading "the last click event" are unclear compared to
  reading current state.
- **DX clarity**: "Stream" is an established concept with well-understood
  semantics. Replacing it with timestamp conventions may reduce clarity.

Any implementation of unification should adopt the existing behavioral
distinctions (scheduler context, trigger semantics) rather than trying to
eliminate them. The goal is to unify the *storage and type representation*, not
to pretend that state and events have identical execution semantics.

#### Current State: No Timestamps on Payloads (but Invocations Are Unique)

The current system does **not** add timestamps or unique IDs to event payloads:
- `stream.send(event)` passes the payload directly to the scheduler
- DOM events have a `timeStamp` property, but `sanitizeEvent()` does not
  include it in the allowlist of properties passed through

However, each handler **invocation** does receive a unique identity. When an
event handler is invoked, the runner generates a fresh UUID
(`crypto.randomUUID()` in `runner.ts:1219`) and includes it in the `cause`
object used to derive cell identities for handler results. This ensures that
each invocation produces distinct result cells, even for identical event
payloads. The uniqueness is in the invocation context, not in the event data
itself.

For unification, timestamps or IDs would need to be part of the **event payload
or cell value** (not just the invocation context) — either injected at the
`send()` layer, in `sanitizeEvent()`, or by event producers explicitly.

#### Event Replay Consideration

Event replay (for debugging, testing, or recovery) is a related concern. For
replay to be possible, events would likely need unique markings anyway —
timestamps, sequence numbers, or IDs. This aligns with the unification
proposal: if events must be uniquely identifiable for replay, that same
identity makes them content-distinct for change detection.

---

## Open Questions

### Implementation Unification
- What is the migration path from current stream/cell split to unified model?
- How do existing `asStream: true` schemas translate?
- Should timestamps be required for events, or can the system add them?
- What are the exact semantics of "last event" queries on unified cells?
- How does the unified model affect the handler registration mechanism?

### Semantic Categories
- How should precious vs computed be distinguished in the data model?
- Should there be explicit markers, or is it inferred from the dataflow graph?
- What are the garbage collection / compaction rules for each category?
- How do semantic categories affect sync and backup strategies?

---

**Previous:** [Identity and References](./3-identity-and-references.md) | **Next:** [Transactions](./5-transactions.md)