# Pull-Based Scheduler > **Status**: Implemented and enabled by default > **Location**: `packages/runner/src/scheduler.ts` This document describes how the pull-based scheduler works in the CommonTools runtime. ## Background: Why Pull-Based? The original scheduler was **push-based**: when data changed, all dependent actions ran immediately. This was wasteful because: 1. **Greedy execution**: All actions that *could* be affected ran, regardless of whether their outputs were observed 2. **No lazy evaluation**: Intermediate computations ran even if final consumers didn't need them 3. **Wasted work**: In sparse graphs with many computations but few observers, most work was unnecessary The **pull-based** approach inverts this: - When data changes, computations are marked *dirty* but don't run - Only *effects* (side-effectful actions like `sink()`) are scheduled - Effects *pull* their dependencies on demand - Computations only run if an effect actually needs them ## Core Concepts ### Actions An **action** is a function `(tx: Transaction) => any` that the scheduler manages. Actions are created by the runtime when patterns use reactive primitives: | Pattern Primitive | Creates Action Type | Description | |-------------------|---------------------|-------------| | `sink(cell, callback)` | Effect | Runs callback when cell changes | | `lift(fn, ...inputs)` | Computation | Transforms inputs, writes to output cell | | `derive(input, fn)` | Computation | Derives new value from input | | Event handlers | Effect (one-time) | Responds to user events | The scheduler doesn't know about patterns - it only sees actions with read/write dependencies. ### Effects vs Computations | Type | Description | Behavior | |------|-------------|----------| | **Effect** | Side-effectful action | Always runs when scheduled. These are "roots of demand" | | **Computation** | Pure transformation | Only runs when an effect needs its output | When you call `sink()`, the runtime registers the action with `isEffect: true`. Everything else is a computation. ### Dirty vs Pending - **Pending**: Actions scheduled to run this cycle - **Dirty**: Computations whose inputs changed but haven't re-run yet When a cell changes: - Effects → added to `pending` - Computations → marked `dirty`, and downstream effects are scheduled A computation stays dirty until an effect pulls it. If no effect ever needs it, it never runs. ### The mightWrite Set Each action tracks what it has *ever* written (cumulative across all runs). This is necessary because write behavior can vary: ```typescript // Sometimes writes A, sometimes B if (condition) cellA.set(x); else cellB.set(x); ``` The dependency graph uses `mightWrite` (cumulative) rather than `dependencies.writes` (last run only) to conservatively track what might need to run. ## Execution Flow ### 1. Storage Change Notification ``` Cell X changes (value: before → after) ↓ Storage manager notifies scheduler ↓ Find actions that read X (via triggers map) ↓ For each triggered action: Effect → add to pending, queue execution Computation → mark dirty, propagate to dependents, schedule affected effects ``` The `triggers` map indexes actions by the cells they read, so finding affected actions is O(1) per cell. ### 2. Dependency Collection Before an action runs for the first time, the scheduler discovers its dependencies: ```typescript // Scheduler calls the action's populateDependencies callback const tx = runtime.edit(); populateDependencies(tx); // Action reads cells it will access const deps = txToReactivityLog(tx); // Extract reads from transaction tx.abort(); // Don't commit - we only wanted to capture reads ``` This happens in `execute()` before building the work set. ### 3. Building the Work Set In pull mode, the work set is built by starting from effects and pulling their dirty dependencies: ```typescript workSet = new Set(); // Start with pending effects for (const effect of pending) { if (isEffect(effect)) workSet.add(effect); } // Recursively collect dirty computations each effect depends on for (const effect of workSet) { collectDirtyDependencies(effect, workSet); } ``` The `collectDirtyDependencies` function finds dirty computations that write to cells the action reads. ### 4. Topological Sort Actions are sorted so dependencies run before dependents: ```typescript function topologicalSort(actions, dependencies, mightWrite, actionParent) { // Build graph: action A → action B if A writes something B reads // Add edges: parent → child (for nested patterns) // Kahn's algorithm with cycle handling } ``` When cycles exist, the sort prefers: 1. Nodes whose parents are already visited 2. Nodes with lowest in-degree This ensures parents run before children in nested patterns. ### 5. Run Actions ```typescript for (const action of sortedOrder) { // Skip if unsubscribed during this tick if (!isStillScheduled(action)) continue; // Skip if throttled if (isThrottled(action)) continue; // Clear from pending/dirty pending.delete(action); dirty.delete(action); // Run and record time for auto-debounce const start = performance.now(); await action(tx); recordActionTime(action, performance.now() - start); // Resubscribe with new dependencies resubscribe(action, txToReactivityLog(tx)); } ``` ### 6. Settle Loop Dependencies can change at runtime. The classic example is `ifElse`: ```typescript const result = ifElse(condition, branchA, branchB); ``` When `condition` changes from `true` to `false`: 1. Initial collect finds: `ifElse` depends on `branchA` 2. After `ifElse` runs, it now depends on `branchB` 3. If `branchB` is dirty, we need to run it and re-run `ifElse` The settle loop handles this: ```typescript for (let iter = 0; iter < 10; iter++) { // Re-collect dependencies from all effects const moreWork = new Set(); for (const effect of effects) { collectDirtyDependencies(effect, moreWork); } // Remove already-run actions for (const action of alreadyRan) moreWork.delete(action); if (moreWork.size === 0) break; // Settled // Run newly discovered work for (const action of topologicalSort(moreWork)) { if (dirty.has(action)) await run(action); } } ``` Max 10 iterations prevents infinite loops from non-converging cycles. ## Cycle Handling ### Why Not Explicit Cycle Detection? The original plan included Tarjan's algorithm to find strongly connected components (cycles) and handle them specially. This was abandoned because: 1. **Dynamic dependencies**: The dependency graph changes at runtime. Static cycle detection would miss cycles that appear only under certain conditions. 2. **Complexity**: Separate fast/slow cycle paths, convergence tracking, and cycle state management added ~200 lines of complex code. 3. **The real problem is simpler**: Most cycles are nested patterns (parent creates child, both read/write shared state). Parent-child ordering handles these naturally. ### What We Do Instead 1. **Parent-child ordering in topological sort**: When breaking ties in cycles, prefer nodes without unvisited parents. This ensures parents run before children. 2. **Settle loop**: Re-collect dependencies after running. Handles conditional patterns (ifElse) correctly. 3. **Iteration limits**: Max 10 settle iterations, max 100 runs per action. Prevents infinite loops. 4. **Cycle-aware debounce**: Actions running 3+ times in cycles taking >100ms get adaptive debounce (2× cycle time). ### Example: Nested Lift Pattern ``` multiplyGenerator (parent) ↓ creates multiply (child) ↓ writes result ↓ parent reads result ←←←←←←←←←←←←←←← ``` Both are triggered when input changes. Topological sort sees a cycle. By preferring the parent: 1. `multiplyGenerator` runs first 2. It may unsubscribe old child and create new one 3. Only the appropriate child runs ## Event Handlers Event handlers (button clicks, form submissions) are synchronous but may depend on computed values. The challenge: handlers can't `await` their dependencies. ### The traverseCells Flag When registering a handler, the runtime provides a `populateDependencies` callback: ```typescript handler.populateDependencies = (tx, event) => { // Read with traverseCells to capture nested Cell dependencies inputsCell.asSchema(schema).get({ traverseCells: true }); }; ``` The `traverseCells: true` flag tells `validateAndTransform` to recursively read into nested `Cell` objects (from `asCell: true` in schemas), capturing all dependencies. ### Handler Execution Flow ``` Event arrives ↓ Scheduler calls populateDependencies(tx, event) ↓ Extract reads from transaction ↓ Check if any dependencies are dirty ↓ If dirty: Schedule dirty computations Re-queue event (will run after deps compute) Else: Run handler synchronously ``` ### Global FIFO Ordering Events run in global arrival order. If event A arrives before event B, A runs first regardless of which component they target. This preserves causality from the user's perspective. Events are serialized globally, but their *dependencies* can compute in parallel. ## Debounce and Throttle ### Debounce: "Wait then run" Delays execution until triggers stop arriving: ```typescript scheduler.setDebounce(action, 100); // Wait 100ms after last trigger ``` Each trigger resets the timer. Good for search-as-you-type. ### Auto-Debounce Actions averaging >50ms (after 3 runs) automatically get 100ms debounce. Opt out with `{ noDebounce: true }` in subscription options. ### Throttle: "Stale by T ms" Limits execution frequency: ```typescript scheduler.setThrottle(action, 1000); // Max once per second ``` Unlike debounce, throttled actions stay dirty and will run when: 1. The throttle period expires, AND 2. An effect pulls them Good for rate-limiting expensive operations. ### Cycle-Aware Debounce Actions running 3+ times in execute cycles taking >100ms get adaptive debounce: ``` debounce = 2 × cycle_time ``` This naturally slows down problematic cycles without manual intervention. ## Key Data Structures ```typescript class Scheduler { // Action classification private effects = new Set(); private computations = new Set(); // Scheduling state private pending = new Set(); private dirty = new Set(); // Dependency tracking private dependencies = new WeakMap(); private dependents = new WeakMap>(); // Reverse graph private mightWrite = new WeakMap(); // Cumulative writes // Triggers: cell → actions that read it private triggers = new Map>(); // Parent-child relationships (for nested patterns) private actionParent = new WeakMap(); private actionChildren = new WeakMap>(); // Performance tracking private actionStats = new WeakMap(); // Debounce/throttle private actionDebounce = new WeakMap(); private actionThrottle = new WeakMap(); } interface ReactivityLog { reads: Address[]; writes: Address[]; potentialWrites?: Address[]; // For diffAndUpdate pattern } interface ActionStats { runCount: number; totalTime: number; averageTime: number; lastRunTime: number; lastRunTimestamp: number; } ``` ## Constants ```typescript MAX_ITERATIONS_PER_RUN = 100 // Max runs per action per execute cycle AUTO_DEBOUNCE_THRESHOLD_MS = 50 // Avg time to trigger auto-debounce AUTO_DEBOUNCE_MIN_RUNS = 3 // Runs before auto-debounce kicks in AUTO_DEBOUNCE_DELAY_MS = 100 // Debounce delay for slow actions CYCLE_DEBOUNCE_THRESHOLD_MS = 100 // Cycle time to trigger adaptive debounce CYCLE_DEBOUNCE_MIN_RUNS = 3 // Runs in cycle to be considered cycling CYCLE_DEBOUNCE_MULTIPLIER = 2 // Debounce = multiplier × cycle time DEFAULT_RETRIES_FOR_EVENTS = 5 // Retry count for commit conflicts MAX_RETRIES_FOR_REACTIVE = 10 // Retry count for reactive actions ``` ## Debugging ### Enable Logging ```typescript // scheduler.ts line 38 const logger = getLogger("scheduler", { enabled: true, level: "debug", }); ``` Logs show: - Storage notifications and triggered actions - Work set construction and topological sort order - Settle loop iterations - Cycle detection and debounce decisions ### Diagnostic API ```typescript // Overall state scheduler.getStats() // { effects, computations, pending } scheduler.isPullModeEnabled() // Action queries scheduler.isEffect(action) scheduler.isComputation(action) scheduler.isDirty(action) scheduler.getDependents(action) scheduler.getActionStats(action) // { runCount, avgTime, lastRunTime, ... } // Debounce/throttle scheduler.getDebounce(action) scheduler.getThrottle(action) scheduler.setDebounce(action, ms) scheduler.setThrottle(action, ms) // Filter stats (pull vs push efficiency) scheduler.getFilterStats() // { filtered, executed } ``` ### Common Issues **Computation not running:** - Verify an effect depends on it (check `getDependents`) - Check if it's dirty: `isDirty(action)` - If throttled, wait for period to expire **Action running too many times:** - Check for commit conflicts causing retries - Look for rapidly-changing dependencies creating cycles - Add debounce: `setDebounce(action, 100)` **Seeing stale values:** - Check throttle settings - Ensure the reading action is marked as an effect - Verify dependencies are correctly declared **Max iterations hit:** - Action has self-referential dependency (writes to what it reads) - Add debounce to slow down the cycle - Consider restructuring the pattern to break the cycle ## API Reference ### Subscribe ```typescript scheduler.subscribe( action: Action, populateDependencies: (tx: Transaction) => void, options?: { isEffect?: boolean, // Mark as effect (always runs) debounce?: number, // Delay in ms before running noDebounce?: boolean, // Opt out of auto-debounce throttle?: number, // Min ms between runs } ): Cancel ``` ### Mode Control ```typescript scheduler.enablePullMode() scheduler.disablePullMode() scheduler.isPullModeEnabled() ``` ### Waiting ```typescript scheduler.idle(): Promise // Resolves when no pending work ``` ## Tests The scheduler has comprehensive test coverage in `packages/runner/test/scheduler.test.ts` covering: - Effect vs computation classification - Dirty propagation and pull mechanics - Topological ordering - Debounce and throttle behavior - Cycle handling and iteration limits - Event handler dependencies