# @commonfabric/cf-harness

`cf-harness` is an in-house agent harness package for Common Fabric. It is being
built as a general Common Fabric agent runtime, with Loom as the first target
use case.

The package is intentionally early and experimental. It already has a real
execution core, a bounded prompt/tool loop, persistence, resumability, a thin
operator CLI, explicit Agent Skills preload, and the first pass of CFC-aware
deny/recovery shaping.

## Why This Exists

Common Fabric needs an agent harness that can become CFC-aware without
retrofitting CFC semantics awkwardly onto a third-party runtime.

The current design direction is:

- `runner` owns authoritative CFC meaning
- `cf-harness` transports and respects those semantics
- lower layers such as the gVisor-backed sandbox enforce conservative mediation
- the harness itself stays mechanistic rather than asking models to make policy
  decisions

## Current Scope

What works today:

- shell-centric execution against the local `runsc-cfc` sandbox path
- sandbox containers default to Docker `--network bridge` so local Loom/Fabric
  helper services can be reached through Docker Desktop's `host.docker.internal`
  host alias during early integration work; set
  `CF_HARNESS_DOCKER_NETWORK_MODE=host` when a runtime should explicitly use
  host networking
- default sandbox image aligned with the public CFC kitchen-sink image published
  from the sibling `gvisor` repo:
  - `us-docker.pkg.dev/commontools-core/common-fabric/sandbox-kitchensink:latest`
  - override per run with `--sandbox-image` or `CF_HARNESS_SANDBOX_IMAGE`
- built-in tools:
  - `bash`
  - `bash-no-sandbox` (provisional host shell for named subagent profiles only)
  - `read_file`
  - `view_image`
  - `web_fetch` (explicit parent allowlist or `web_fetch` subagent profile only)
  - `read_skill_resource`
  - `run_skill_script`
  - `edit_file`
  - `write_file`
  - `delegate_task`
- targeted exact-string edits plus whole-file replace/create and append writes
- initial and in-run image attachments for model vision-capable flows
- bounded public HTTP(S) fetches through `web_fetch`, with redirect validation,
  local/private target blocking, extracted text/links, and raw bounded response
  retention in tool-output artifacts; `web_fetch` is intentionally not part of
  the default parent tool surface
- bounded OpenAI-compatible prompt/tool loop
- interactive chat NDJSON stdio transport with opt-in SQLite session, turn, and
  event persistence
- single-child subagent delegation with fresh child prompt context, explicit
  default/browser/web_fetch/web_search child profiles, retained child run
  references, and a sanitized summary/state return channel
- optional schema-validated subagent structured returns, with raw child return
  artifacts retained in the child run and open-ended strings linkified before
  the parent sees them
- persisted run state, transcript, run reports, Loom run manifests, capability
  snapshots, and tool outputs, plus explicit skill registry and activation
  artifacts
- run-report gateway attempt diagnostics with chat-completion request size,
  timing, HTTP status, selected response headers/request IDs, and non-OK
  response body excerpts
- transcript-based resumability
- package-local operator CLI
- explicit Agent Skills preload via `--skills-root` and repeatable `--skill`
- runtime-generated supporting-resource indexes in `skill-registry.json`
- text-first supporting-resource reads through `read_skill_resource`, recorded
  in `skill-resource-reads.json`
- exact-allowlisted skill script execution through `run_skill_script`, recorded
  in `skill-script-executions.json`

The sandbox `bash` tool has a provisional direct-`curl` guard while sandbox
networking is enabled: explicit `curl` invocations may target loopback HTTP(S)
hosts such as `localhost`, `127.0.0.1`, and Docker Desktop's
`host.docker.internal` host alias, but obvious external `curl` targets are
denied before sandbox execution. This is an integration unblock, not a complete
network confinement model.

- CFC mode plumbing with:
  - `disabled`
  - `observe`
  - `enforce-explicit`
  - `enforce-strict`
- default CFC mode aligned with the runner's permissive-if-absent
  `enforce-explicit` rollout behavior
- spec-aligned `PromptSlotBound` prompt-slot evidence
- Loom run manifest intake through `--run-manifest`
- first-pass policy events and deny/recovery behavior
- configurable gateway auth mode:
  - `bearer`
  - `none`

What is not done yet:

- real runner-driven CFC feedback integration
- richer opaque-handle/pass-through behavior outside schema-validated subagent
  returns
- first-class browser operation policy on top of the provisional browser
  subagent profile
- dynamic/model-driven Agent Skills activation
- parallel child orchestration
- app UI event provenance
- streaming model responses
- richer mid-turn resumability

## Package Layout

- [src/config.ts](src/config.ts)
  - harness config, CFC mode resolution, gateway auth mode
- [src/engine.ts](src/engine.ts)
  - core execution engine, run state, tool execution
- [src/prompt-loop.ts](src/prompt-loop.ts)
  - bounded prompt/tool loop
- [src/cli.ts](src/cli.ts)
  - package-local operator CLI
- [src/interactive-chat-stdio.ts](src/interactive-chat-stdio.ts)
  - NDJSON stdio transport for the interactive chat protocol
- [src/sqlite-session-store.ts](src/sqlite-session-store.ts)
  - SQLite-backed interactive chat session, turn, and event persistence
- [src/artifacts.ts](src/artifacts.ts)
  - persisted run state, run manifest, transcript, run report, capability
    snapshot, and tool output storage
- [src/skills/](src/skills/)
  - Agent Skills registry scanning, validation, and explicit preload context
- [src/contracts/](src/contracts/)
  - prompt-slot, run-manifest, observation, policy, run-report, subagent, skill,
    transcript, and tool-result contracts
- [integration/](integration/)
  - environment-gated real `runsc-cfc` integration tests
- [docs/SKILLS_SUPPORT_SPEC.md](docs/SKILLS_SUPPORT_SPEC.md)
  - staged Agent Skills support design

## Commands

From [packages/cf-harness](.):

- `deno task help`
- `deno task run -- ...`
- `deno task test`
- `deno task test:integration`

## CLI Example

Standard bearer-auth mode:

```bash
cd packages/cf-harness
CF_HARNESS_API_KEY=... deno task run -- \
  --workspace ../.. \
  --prompt "Summarize the cf-harness package structure." \
  --print-transcript
```

No-auth gateway mode:

```bash
cd packages/cf-harness
deno task run -- \
  --workspace ../.. \
  --gateway-auth-mode none \
  --prompt "Summarize the cf-harness package structure." \
  --print-transcript
```

Local open-weight model via any OpenAI-compatible server (llama.cpp shown; LM
Studio, vLLM, and Ollama's `/v1` endpoint work the same way):

```bash
# Serve the model locally (downloads on first run, ~63GB):
llama-server -hf ggml-org/gpt-oss-120b-GGUF --ctx-size 0 --jinja --port 8080

cd packages/cf-harness
deno task run -- \
  --workspace ../.. \
  --gateway-base-url http://localhost:8080/ \
  --gateway-auth-mode none \
  --model gpt-oss-120b \
  --prompt "Summarize the cf-harness package structure." \
  --print-transcript
```

The gateway can also be selected via environment, which lets callers (loom,
pattern-factory) switch to a local model without threading new flags:

```bash
export CF_HARNESS_GATEWAY_BASE_URL=http://localhost:8080/
export CF_HARNESS_GATEWAY_AUTH_MODE=none
export CF_HARNESS_MODEL=gpt-oss-120b
```

CLI flags take precedence over these variables; `CF_HARNESS_MODEL` is ignored on
`--resume-run` (the resumed run keeps its recorded model unless `--model` is
passed explicitly).

On hosts without the `runsc-cfc` Docker runtime (or where the installed CFC
policy does not label the workspace mount, which makes in-sandbox file reads
fail with SIGSYS), run with the plain `runc` runtime and observe-mode CFC:

```bash
export CF_HARNESS_SANDBOX_DOCKER_RUNTIME=runc
export CF_HARNESS_CFC_ENFORCEMENT_MODE=observe
```

Tool outputs are then exposed raw with policy warnings recorded in the run
report instead of being denied for missing trusted mediation metadata.

Interactive chat stdio transport:

```bash
cd packages/cf-harness
deno run -A src/interactive-chat-stdio.ts \
  --chat-session-db /tmp/cf-harness-chat.sqlite
```

The stdio transport reads one interactive chat request envelope per line from
stdin and writes response/event envelopes as newline-delimited JSON. Pass
`--chat-session-db` or set `CF_HARNESS_CHAT_SESSION_DB` to persist sessions,
turn records, and replayable events across process restarts. Pass
`--chat-max-in-memory-events` or set `CF_HARNESS_CHAT_MAX_IN_MEMORY_EVENTS` to
bound the transport's in-memory event cache while keeping durable replay
available through SQLite.

Initial prompt image attachments:

```bash
cd packages/cf-harness
deno task run -- \
  --workspace /path/to/workspace \
  --gateway-auth-mode none \
  --image captures/example.png \
  --prompt "Describe the attached capture image and summarize useful next steps."
```

`--image` is repeatable and accepts `png`, `jpeg`, `gif`, and `webp` files
inside the workspace. Relative image paths are resolved from `--workspace`. The
transcript retains only image metadata (`hostPath`, media type, byte count,
digest); base64 pixels are materialized only for the gateway request.

Explicit skill preload:

```bash
deno task run -- \
  --workspace /path/to/common-fabric-2 \
  --cwd pattern-factory \
  --gateway-auth-mode none \
  --skills-root labs/skills \
  --skill pattern-dev \
  --skill pattern-implement \
  --prompt "Build this pattern."
```

Sandbox image override:

```bash
deno task run -- \
  --workspace /path/to/common-fabric-2 \
  --cwd pattern-factory \
  --gateway-auth-mode none \
  --sandbox-image registry.example/cf-harness-sandbox:deno2 \
  --prompt "Run deno task cf --help and report whether it works."
```

Use this for Deno 2 / Common Fabric CLI validation while keeping the mounted
workspace as the source of truth for Labs, Pattern Factory, and Loom code. Run
reports include the selected sandbox image in the capability snapshot.

Loom-backed batch runs may also pass a retained manifest:

```bash
deno task run -- \
  --workspace /path/to/workspace \
  --gateway-auth-mode none \
  --run-manifest /path/to/loom-run-manifest.json \
  --prompt "Handle this Loom wish."
```

Batch runs can require the agent to produce a schema-validated JSON sidecar
before the CLI exits successfully. `--result-json-path` remains the harness
metadata output; `--structured-result-path` is the agent-authored JSON file to
validate:

```bash
deno task run -- \
  --workspace /path/to/workspace \
  --gateway-auth-mode none \
  --output-mode batch \
  --result-json-path /tmp/cf-harness-result.json \
  --structured-result-path capture.results.json \
  --structured-result-schema-file /path/to/result.schema.json \
  --prompt "Write capture.results.json with the requested structured result."
```

The structured result path must stay inside the workspace. The schema may be
provided inline with `--structured-result-schema` or read from
`--structured-result-schema-file`. After the run, cf-harness reads the sidecar,
validates it with the same JSON Schema validation primitives used by subagent
`returnSchema`, records `structured_result` in the batch metadata, and exits
nonzero when the file is missing, invalid JSON, or schema-invalid.

When constraining the parent tool surface to `delegate_task`, authorize the
child profile separately so the delegation policy transition is explicit:

```bash
deno task run -- \
  --workspace /path/to/workspace \
  --gateway-auth-mode none \
  --allow-tool delegate_task \
  --allow-subagent-profile default \
  --prompt "Delegate a focused inspection and summarize the result."
```

The provisional browser profile is the only CLI-supported path to
`bash-no-sandbox`. It gives the child a host shell so it can invoke
`agent-browser`, while the parent still receives only the normal sanitized
subagent result. Browser/page output is treated as untrusted child-local data;
with a `returnSchema`, parent-visible free-form strings are replaced by opaque
links while raw observations stay in child artifacts. The browser child can read
workspace files but does not receive `edit_file` or `write_file`, so it should
return findings through the structured return channel rather than by writing
browser observations into the workspace.

When the parent run has a skill registry, the browser profile activates the
`agent-browser` skill in the child run, exposes `read_skill_resource`, and
allows these exact skill scripts through `run_skill_script`:
`agent-browser:scripts/form-automation.sh`,
`agent-browser:scripts/capture-workflow.sh`. Those browser-profile scripts run
through host execution because they need the host `agent-browser` CLI. They
still use the normal skill-script safeguards: activated skill, run-start
registry snapshot, exact script allowlist, digest/size match, and provenance
artifacts.

The host shell is policy-restricted to `agent-browser` attached through the
exact Loom Browser Access CDP endpoint supplied to the child task,
`agent-browser` discovery (`which agent-browser`, `command -v agent-browser`),
`pwd`, `ls`, and bounded workspace-local `find` commands. Page commands should
use the leased endpoint, for example
`agent-browser --cdp http://host.docker.internal:9362 snapshot -i`. Bare
`agent-browser open` / `snapshot` launches are denied so the child cannot race
the host's live browser profile. `agent-browser` is fail-closed to a small
positive allowlist: `open` for HTTP(S) URLs, `snapshot`, `get title/url/text`,
read-only `console` / `errors` inspection without mutation flags, bounded
`wait`, and ref-based `fill`, `type`, `select`, `check`, `click`, and `press`.

Host-target skill scripts run with a cleared subprocess environment plus a
controlled `PATH` and explicit `CF_HARNESS_*` / `SKILL_*` variables. They do not
inherit ambient provider tokens, developer secrets, app credentials, or other
parent process environment. Credential-bearing workflows such as
`agent-browser:scripts/authenticated-session.sh` are intentionally not in the
default browser-profile allowlist; adding them should go through an explicit
credential grant and origin-binding design.

For browser-profile runs, prefer a host artifact root outside the workspace. Raw
child artifacts are retained for operator analysis, but they are not meant to
become ordinary workspace inputs for the parent model. If an artifact root is
physically placed under the workspace, `read_file`, `view_image`, `write_file`,
and `edit_file`, plus browser-profile `ls`/`find`, treat that artifact tree as
reserved from model-facing file and discovery tools.

```bash
ROOT=/tmp/cf-harness-browser-demo
mkdir -p "$ROOT/workspace" "$ROOT/artifacts"

deno task run -- \
  --workspace "$ROOT/workspace" \
  --artifact-root "$ROOT/artifacts" \
  --gateway-auth-mode none \
  --allow-tool delegate_task \
  --allow-subagent-profile browser \
  --prompt "Delegate browser inspection of the local app and summarize the result."
```

The `web_fetch` profile is the preferred first-pass path for web page
inspection. It gives the child only the `web_fetch` tool: no shell, no browser,
no workspace reads, and no workspace writes. This keeps external web content in
the child run and returns only the normal sanitized subagent summary/state to
the parent. Use this profile when a task needs to inspect public HTTP(S) pages
but does not need authenticated browser state or general web search.

```bash
deno task run -- \
  --workspace /path/to/workspace \
  --gateway-auth-mode none \
  --allow-tool delegate_task \
  --allow-subagent-profile web_fetch \
  --prompt "Delegate inspection of https://example.com and summarize the result."
```

The `web_search` profile is the provider-native search profile. It runs the
child on the configured Gemini search model, requests the gateway's
`google_search` native model tool, and gives the child no built-in file, shell,
browser, or fetch tools. The intended use is the same CFC boundary as browser
and web_fetch subagents: the parent delegates a focused search task, raw search
observations stay in child artifacts, and the parent receives only the sanitized
subagent return channel.

Programmatic `delegate_task` calls may include `returnSchema`, a JSON Schema
object or boolean. In that mode the child is required to return a single JSON
value. The harness validates it, stores the raw child return under the child
artifact root, and exposes `subagent.structuredReturn.value` to the parent with
free-form strings and objects with unmodeled keys replaced by opaque `@link`
objects such as `opaque:<child-run-id>#/json/pointer`:

```json
{
  "goal": "Assess the briefing and return only the decision facts.",
  "returnSchema": {
    "type": "object",
    "properties": {
      "approved": { "type": "boolean" },
      "status": { "type": "string", "enum": ["approved", "not_approved"] },
      "summary": { "type": "string" }
    },
    "required": ["approved", "status", "summary"],
    "additionalProperties": false
  }
}
```

Current caveat:

- the default gateway target is still the stage endpoint at
  [https://llm.stage.commontools.dev/](https://llm.stage.commontools.dev/)
- gateway auth defaults remain an ergonomics question:
  - standalone `cf-harness` still defaults to `bearer`
  - Loom's `cf-harness` adapter defaults to `none`
- confirm the intended gateway/auth mode for the environment you are testing
  against
- skills support is explicit preload only for now:
  - `--skill` requires `--skills-root`
  - skill preload is not supported with `--resume-run`
  - dynamic `load_skill` activation is still planned

## Testing

Unit/package tests:

```bash
cd packages/cf-harness
deno task test
```

Environment-gated integration tests:

```bash
cd packages/cf-harness
deno task test:integration
```

The integration suite requires a working local Docker + `runsc-cfc` environment.
By default it also uses the published kitchen-sink image above, unless you
override `CF_HARNESS_INTEGRATION_IMAGE`.

To opt into a local Labs CLI smoke inside the sandbox, use a Deno 2-compatible
image and enable the CF CLI case:

```bash
cd packages/cf-harness
CF_HARNESS_INTEGRATION_IMAGE=registry.example/cf-harness-sandbox:deno2 \
CF_HARNESS_INTEGRATION_CF_CLI=1 \
deno task test:integration
```

That case mounts the current Labs checkout as `/workspace` and runs
`deno task cf --help` inside the `runsc-cfc` sandbox. It is skipped by default
because the published kitchen-sink image may not have the required Deno version
or cache state.

To also exercise a real host Fabric FUSE mount bind-mounted into the sandbox at
`/fabric`, start `cf fuse mount` separately and pass the mountpoint:

```bash
cd packages/cf-harness
CF_HARNESS_INTEGRATION_FABRIC_MOUNT=/tmp/cf deno task test:integration
```

That opt-in case verifies that cf-harness can navigate `/fabric` through
`runsc-cfc` and read the FUSE `.status` file. Without
`CF_HARNESS_INTEGRATION_FABRIC_MOUNT`, the Fabric mount case is skipped.

To exercise label flow through a live Fabric FUSE projection, enable the
additional CFC flow tests and provide concrete read/write projection paths under
`/fabric`:

```bash
# In another terminal, mount FUSE with Docker traversal enabled.
cf fuse mount /tmp/cf --allow-other --cfc-mode=observe --cfc-writeback-xattrs

cd packages/cf-harness
CF_HARNESS_RUNSC_CFC_RESULT_DIR="$HOME/.local/share/runsc-cfc/cfc-results" \
CF_HARNESS_RUNSC_CFC_INVOCATION_CONTEXT_DIR="$HOME/.local/share/runsc-cfc/cfc-invocations" \
CF_HARNESS_INTEGRATION_FABRIC_MOUNT=/tmp/cf \
CF_HARNESS_INTEGRATION_FABRIC_CFC_FLOW=1 \
CF_HARNESS_INTEGRATION_FABRIC_CFC_READ_PATH=/fabric/home/pieces/example/result/secret \
CF_HARNESS_INTEGRATION_FABRIC_CFC_WRITE_PATH=/fabric/home/pieces/example/result/output \
CF_HARNESS_INTEGRATION_FABRIC_CFC_LABEL_SUBJECT=did:key:fabric \
deno task test:integration
```

When those env vars point at a real labeled FUSE fixture, the extra tests probe
FUSE-to-sandbox taint, command completion after a FUSE read, FUSE write
attempts, and joins between explicit `cfcInputLabels` and a prior FUSE read. The
result sidecar env var is required for all CFC flow assertions, and the
invocation context sidecar env var is required for the cases that seed
`cfcInputLabels`. The installed Docker `runsc-cfc` runtime must also be
configured with the same `--cfc-invocation-context-dir`, otherwise those
invocation-label cases are skipped even if cf-harness writes sidecars.

The default Fabric CFC flow gate exercises the immediate result sidecar after a
FUSE read. The stricter host-bind readback probe is opt-in with
`CF_HARNESS_INTEGRATION_FABRIC_CFC_DURABLE_HOST_LABEL=1` because durable
`FUSE -> sandbox -> host -> sandbox` label persistence is still a live-stack
validation target. FUSE write assertions are also probes of the live stack:
durable cell-label writeback depends on the runner/runtime emitting FUSE
prepare/finalize metadata, not arbitrary direct writes to
`trusted.cfc.contentLabel`.

On Linux, Docker/runsc runs default to the host UID/GID. On macOS, the default
omits `--user` because Docker Desktop bind mounts may expose host files as
`root:root`, which prevents non-root container users from writing mounted Loom
workspaces. An explicit `containerUser` still overrides the platform default.

CFC sandbox result mediation requires the installed `runsc-cfc` runtime to use
the same host result directory that `cf-harness` reads. Configure runsc with
`--cfc-result-dir=/path/to/results`, then set
`CF_HARNESS_RUNSC_CFC_RESULT_DIR=/path/to/results` or pass `cfcResultDir` in the
explicit sandbox config.

CFC invocation context transport is similarly coordinated through a host sidecar
directory. Configure runsc with
`--cfc-invocation-context-dir=/path/to/invocations`, then set
`CF_HARNESS_RUNSC_CFC_INVOCATION_CONTEXT_DIR=/path/to/invocations` or pass
`cfcInvocationContextDir` in the explicit sandbox config. `cf-harness` writes
`<containerID>.json` after `docker create` and before `docker start`; the
payload contains audit/provenance context plus optional trusted `cfcInputLabels`
for supported startup inputs (`command`, `argv`, `args`, `env`, `cwd`, and
`stdin`). `stdin` labels are modeled as labels on the stdin source and taint
only after the sandbox reads or maps fd 0, not as automatic startup task taint.
When a trusted prompt-slot binding is present, `cf-harness` also derives
confidentiality-only prompt influence labels for model-authored invocation
inputs such as shell commands, structured file-tool arguments, and stdin
payloads. These labels are taint evidence, not integrity or authorization
claims. When CFC-mediated bash output is released to the model, `cf-harness`
records the observed output labels in run state and merges those confidentiality
labels into later model-authored invocation inputs. Opaque and denied outputs
are not added to this model-context accumulator.

The persisted model-context accumulator is sensitive retained run metadata. It
does not store raw stdout/stderr bytes, but its labels and observation refs can
still disclose which confidential sources influenced model-visible context.
Handle it under the same access and retention boundary as transcripts, tool
outputs, run state, and CFC policy traces.

On Docker Desktop for macOS, use the host path for `cf-harness` and the
`/host_mnt/...` projection for Docker's runtime args. The gVisor
`docker-desktop-cfc-setup` helper defaults to:

```bash
export CF_HARNESS_RUNSC_CFC_RESULT_DIR="$HOME/.local/share/runsc-cfc/cfc-results"
export CF_HARNESS_RUNSC_CFC_INVOCATION_CONTEXT_DIR="$HOME/.local/share/runsc-cfc/cfc-invocations"
```

## Related Docs

- [IMPLEMENTATION_PLAN.md](docs/IMPLEMENTATION_PLAN.md)
- [LOOM_MIGRATION_NOTES.md](docs/LOOM_MIGRATION_NOTES.md)
- [runner README](../runner/README.md)
- `specs/cfc/18-runtime-implementation-profiles.md` in the sibling `specs` repo