# Code coverage in CI

This repository measures code coverage in two different ways, because it runs two different kinds of code. Keeping the two apart is the key to reading the numbers correctly and to deciding which test job should collect which kind of coverage.

## Two kinds of code, two coverage mechanisms

### Runtime and framework code is measured by Deno's V8 coverage

The packages that make up the Common Fabric runtime (api, runner, identity, memory, and the rest) are ordinary TypeScript modules. Deno loads and runs them directly, so Deno's built-in V8 coverage can record which of their lines ran.

A CI job turns this on by setting the `DENO_COVERAGE_DIR` environment variable. After the tests finish, `tasks/write-coverage-lcov.ts` converts the raw V8 profile into an LCOV file, and the job uploads it as a `coverage-profile-*` artifact. Most test jobs set `DENO_COVERAGE_DIR`, including both pattern integration jobs.

### Authored pattern code is measured by transformer instrumentation

Patterns (the user programs under `packages/patterns`) are not loaded the way an ordinary module is. Each pattern is compiled through the Common Fabric transformer pipeline and then run inside a sandbox. Deno's V8 coverage never sees the authored pattern statements execute, so it cannot report which lines of a pattern ran.

To measure that, the `cf test` command can turn on a separate mechanism. When the `CF_PATTERN_COVERAGE_DIR` environment variable is set (or the `--pattern-coverage-dir` flag is passed), the `cf test` runner builds a `PatternCoverageCollector`. The transformer then injects a coverage "hit" call in front of each authored statement, and the collector writes one `*.pattern-coverage.lcov` file per test. The line numbers in that file point back at the authored pattern source.

Two properties of this mechanism are worth keeping in mind.

- The counters are statement based.
  A single statement that spans several lines marks its whole source range as run the moment the statement is reached. The number answers "did this statement run", not "was every line independently exercised".
- Coverage records that a line ran. It never records that a test checked the result of running that line. A test that drives a pattern through a flow without asserting anything still marks those lines covered.

`CF_PATTERN_COVERAGE_DIR` is read in exactly one place: the `cf test` command in `packages/cli/commands/test.ts`. A job that runs patterns through a plain `deno test` invocation, or by talking to a running Toolshed server, does not go through `cf test`. Setting the variable on such a job has no effect at all.

## How the two feed the coverage gate

`tasks/perf-check.ts` downloads every `coverage-profile-*` artifact, joins all the LCOV files together, and hands them to `tasks/coverage-metrics.ts`. That code walks the tracked source files under `packages`, `tasks`, and `scripts`, and for each file counts how many of its lines no test covered. The counts roll up into `coverage-debt: uncovered lines` metrics, for example `coverage-debt: packages/patterns uncovered lines`, and the performance check gates a pull request on them.

Authored pattern files under `packages/patterns` are tracked source files, so their uncovered lines count toward `coverage-debt: packages/patterns`. The only job that currently feeds covered-line data for those authored files is `pattern-unit-test`, because it is the only job that runs patterns through `cf test` with `CF_PATTERN_COVERAGE_DIR` set.

One detail of the gate's accounting matters when reasoning about pattern coverage. For a tracked file that has any LCOV record, the gate counts only the lines named in that record that were never hit. For a tracked file with no LCOV record at all, every tracked line counts as uncovered.
Pattern instrumentation emits a record line only for statements it could instrument, which is a subset of the file's lines. So the first time any test produces a pattern-coverage record for a previously unrecorded pattern file, that file's uncovered-line count drops both because real lines became covered and because the lines the instrumentation never names leave the denominator. The second effect is an accounting artifact, not new test strength.

## A combined report for IDEs

The same `coverage-profile-*` artifacts feed a second consumer. On `main`, the `attest-binaries` job downloads all of them, runs `tasks/combine-coverage-lcov.ts` to merge them into one LCOV file, and uploads that file to the build-artifacts bucket next to the release tarball. The point is to give someone working in an IDE a single file that shows coverage for the whole repository, instead of one fragment per CI job.

Two things happen during the merge. The source paths in each fragment are absolute paths rooted at whichever runner produced them, so they are rewritten to repository-relative paths that an IDE can map onto a local checkout. Records for the same source file are then combined into one, with the per-line hit counts added together, so a file exercised by several jobs is reported once with its combined coverage.

The merged file carries line coverage only. LCOV identifies a function by its name, and `deno coverage --lcov` can emit several functions with the same name in one file (a free function and a method, for example), so function and branch records cannot be merged back together reliably from the fragments alone. Line coverage is what an IDE uses to colour the gutter, which is what this file is for.

To download the report for a given commit:

```
gsutil cp gs://commontools-build-artifacts/workspace-artifacts/labs-.lcov .
```

## Which job collects which coverage

| Job | Runtime (V8) coverage | Authored-pattern coverage |
| --- | --- | --- |
| `pattern-unit-test` | yes | yes (`cf test` with `CF_PATTERN_COVERAGE_DIR`) |
| `pattern-integration-test` | yes | no |
| `pattern-reload-integration-test` | yes | no |

The pattern unit job runs each `packages/patterns/**/*.test.tsx` file through `cf test` in-process. The two integration jobs run browser-driven `deno test` files against a running Toolshed server.

## Why the two integration jobs do not set `CF_PATTERN_COVERAGE_DIR`

Adding `CF_PATTERN_COVERAGE_DIR` to `pattern-integration-test` or `pattern-reload-integration-test` was considered and deliberately not done, for four reasons.

1. It would do nothing as written. Both jobs run their tests with `deno test`, not `cf test`, and `CF_PATTERN_COVERAGE_DIR` is read only by `cf test`. The variable would sit in the job's environment unread, which is worse than absent because it suggests coverage is being collected when none is.
2. Making it work would mean crossing a process boundary. These tests run the pattern inside a sandbox in a headless browser that talks to a separate Toolshed server. The "hit" calls happen in that sandbox, with no in-process collector to receive them and write LCOV. Collecting them would require new plumbing to carry hit data back across the browser and server boundaries.
3. It would weaken the pressure the gate puts on test quality. The coverage-debt gate currently rewards focused pattern unit tests, which run in-process, assert on results, and are fast. If broad end-to-end flows counted toward pattern coverage, the gate could be satisfied by incidental execution in an integration test that asserts little, so coverage debt would fall without the matching rise in verification strength.
4. It would distort the gate's numbers through the accounting artifact described above.
   Crediting integration runs would flip many pattern files from the full-file denominator to the narrower instrumented-statement denominator, so reported debt would drop for reasons unrelated to new testing.

The net effect on metric quality is that crediting integration runs would trade a smaller, bounded source of false negatives for a larger source of false positives and a less faithful, more gameable gate.

## Known limitations and possible future work

- False negatives exist today. A pattern line that only an integration test exercises is counted as uncovered, because no integration job collects pattern coverage. This understates true coverage for the patterns that are reachable only through end-to-end flows. The trade is deliberate: the alternative above inflates the numbers more than this understates them.
- The record-versus-no-record denominator difference is a pre-existing property of the gate, not specific to integration jobs. If pattern coverage is ever expanded, normalizing the denominator (always scoring a pattern file against the same line set whether or not it has a record) would make the numbers easier to trust.

## Related documentation

- [TESTING.md](TESTING.md): how to run the test suites whose execution this coverage is measured from.
- [CI_PERFORMANCE.md](CI_PERFORMANCE.md): the coverage-debt baseline and ratchet markers (`NEW_PERF_BASELINE` and `NEW_COVERAGE_BASELINE`) that gate a pull request on the metrics described here.
- [../common/workflows/pattern-testing.md](../common/workflows/pattern-testing.md): writing the pattern unit tests that the `pattern-unit-test` job runs through `cf test`, which is the only source of authored-pattern coverage.
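
## Appendix: a sketch of the gate's line accounting

The record-versus-no-record rule from "How the two feed the coverage gate" is easier to see with numbers. The following is a minimal sketch, not the code in `tasks/coverage-metrics.ts`: the names `parseLcov` and `uncoveredLines`, the example file path, and the idea of passing the tracked line count as a plain number are all assumptions made for illustration. It reads only the `SF:`, `DA:`, and `end_of_record` lines of an LCOV file.

```typescript
// Hypothetical illustration; these helpers are not part of the repository.

/** Per-file line hits taken from LCOV: line number -> summed hit count. */
type LcovRecord = Map<number, number>;

/** Parse SF/DA/end_of_record lines; repeated records for a file are summed. */
function parseLcov(text: string): Map<string, LcovRecord> {
  const records = new Map<string, LcovRecord>();
  let current: LcovRecord | undefined;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const file = line.slice(3);
      current = records.get(file) ?? new Map<number, number>();
      records.set(file, current);
    } else if (line.startsWith("DA:") && current) {
      // DA:<line>,<hits>[,<checksum>] (only the first two fields are used)
      const [lineNo, hits] = line.slice(3).split(",").map(Number);
      current.set(lineNo, (current.get(lineNo) ?? 0) + hits);
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return records;
}

/** Uncovered-line count for one tracked file under the rule described above. */
function uncoveredLines(trackedLineCount: number, record?: LcovRecord): number {
  // No record at all: every tracked line counts as uncovered.
  if (!record) return trackedLineCount;
  // Record present: only lines the record names and that were never hit count.
  let uncovered = 0;
  for (const hits of record.values()) {
    if (hits === 0) uncovered++;
  }
  return uncovered;
}

// A made-up 20-line pattern file whose record names only three lines.
const exampleFile = "packages/patterns/example.tsx";
const lcov = [
  `SF:${exampleFile}`,
  "DA:3,1",
  "DA:4,0",
  "DA:9,2",
  "end_of_record",
].join("\n");

const before = uncoveredLines(20, undefined); // 20: no record yet
const after = uncoveredLines(20, parseLcov(lcov).get(exampleFile)); // 1: only line 4
console.log({ before, after });
```

In this made-up case the count falls from 20 to 1, but only two lines (3 and 9) actually ran. The other 17 lines of the drop come from lines the record never names leaving the denominator, which is the accounting artifact described earlier.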