--- name: agent-browser description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. allowed-tools: Bash(agent-browser:*) --- Start with the shared manual testing guidance in: - `docs/common/ai/manual-testing-guide.md` Read that guide first. It is the canonical reference. # Browser Automation with agent-browser ## cf-harness browser profile When this skill is activated inside a `cf-harness` browser-profile subagent, the profile intentionally narrows the generic `agent-browser` capability surface described below. Use the leased CDP endpoint provided in the task, for example `agent-browser --cdp http://host.docker.internal:9362 snapshot -i`, and do not open or attach to any other browser endpoint. The cf-harness browser profile allows only a small set of page commands: `open` for HTTP(S) URLs, `snapshot`, `get title/url/text`, bounded `wait`, and ref-based `fill`, `type`, `select`, `check`, `click`, and `press`. Broader commands in this skill, including storage/cookie/session/HAR/network/file capture, profile/session setup, and auth workflows, may be unavailable in that profile. The default allowlisted skill scripts are limited to `scripts/form-automation.sh` and `scripts/capture-workflow.sh`; credentialed workflows such as `scripts/authenticated-session.sh` require a separate, explicit credential grant and origin-binding design. Under the cf-harness profile, the manual-testing-guide's screenshots, `--session` workflows, state save/load, console reading, and the Import-CLI-Key identity flow are unavailable. Substitute `snapshot` + `get text` for screenshots; skip multi-identity checks and record them as not-runnable in the report. ## Core Workflow Every browser automation follows this pattern: 1. Navigate: `agent-browser open ` 2. Snapshot: `agent-browser snapshot -i` 3. Interact: use refs to click, fill, select, or inspect 4. Re-snapshot after navigation or DOM change ```bash agent-browser open https://example.com/form agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i ``` ## Essential Commands ```bash # Navigation agent-browser open agent-browser close # Snapshot agent-browser snapshot -i # Interactive elements with refs agent-browser snapshot -s "#selector" # Interaction agent-browser click @e1 agent-browser fill @e2 "text" agent-browser type @e2 "text" agent-browser select @e1 "option" agent-browser check @e1 agent-browser press Enter agent-browser scroll down 500 # Get information agent-browser get text @e1 agent-browser get url agent-browser get title # Wait agent-browser wait @e1 agent-browser wait --load networkidle agent-browser wait --url "**/page" agent-browser wait 2000 # Capture agent-browser screenshot agent-browser screenshot --full agent-browser pdf output.pdf ``` ## Common Patterns ### Form submission ```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ``` ### Authentication with state persistence ```bash agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ``` ### Common Fabric identity checks (host/interactive runs only) This recipe needs `--session`, `upload`, and `console`, which the cf-harness profile does not allow — run it only in host or interactive sessions. For Common Fabric tests that touch `PerUser`, `PerSession`, favorites, drafts, or home-space state, import the same CLI key used by `deno task cf` into the browser session via `Import CLI Key`. ```bash agent-browser --session cf-shared open http://localhost:8000// agent-browser --session cf-shared snapshot -i # Click Login, then Import CLI Key. agent-browser --session cf-shared upload @ "$CF_IDENTITY" agent-browser --session cf-shared click @ agent-browser --session cf-shared console ``` The browser console should include `[Identity] User DID: ...`; compare it with: ```bash deno run -A packages/cli/mod.ts id did "$CF_IDENTITY" ``` Use distinct `--session` names when comparing identities. A different identity should still see unscoped/`PerSpace` data in the same space, but `PerUser` and `PerSession` fields resolve to separate instances and may look empty/default. See `docs/development/SHARED_IDENTITY.md` for the full workflow. ### Data extraction ```bash agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 agent-browser get text body > page.txt agent-browser snapshot -i --json agent-browser get text @e1 --json ``` ### Parallel sessions ```bash agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i agent-browser session list ``` ### Visual browser debugging ```bash agent-browser --headed open https://example.com agent-browser snapshot -i agent-browser highlight @e1 agent-browser record start demo.webm ``` ### Local files ```bash agent-browser open file:///path/to/document.pdf agent-browser open file:///path/to/page.html agent-browser screenshot output.png ``` ### Mobile-style workflows If your environment includes device emulation or a mobile browser harness, use the same open -> snapshot -> interact -> re-snapshot rhythm there. Treat those flows as provider-specific extensions rather than core `agent-browser` CLI commands unless your local install documents them explicitly. ## Ref Lifecycle Refs (`@e1`, `@e2`, and so on) are invalidated when the page changes. Always re-snapshot after: - clicking links or buttons that navigate - form submissions - dynamic content loading such as dropdowns or modals ```bash agent-browser click @e5 agent-browser snapshot -i agent-browser click @e1 ``` ## When Things Break - **CDP endpoint unreachable**: verify with `agent-browser get url`. If connection errors persist, record the failure in your notes/report and stop rather than retrying blindly. - **A `wait` that never resolves**: bound every wait with a `wait ` fallback. Prefer `wait "" --state hidden` or `wait --fn ""` for spinners and loading text. - **Element missing from snapshot**: `agent-browser scroll down 500` and re-snapshot; then `agent-browser snapshot -s ""`; then fall back to `find` semantic locators. - **Screenshot unavailable (restricted profiles)**: substitute `agent-browser snapshot -i` + `agent-browser get text` and record evidence textually. Verify each command against `agent-browser --help` (or `agent-browser --help`) before writing it. ## Semantic Locators When refs are unavailable or unreliable, use semantic locators: ```bash agent-browser find text "Sign In" click agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click ``` > **Note:** `fill` requires a native `` or `