Agent Check

Progressive, provider-neutral testing plan runner for AI agents.

Agent Check exists for the testing gap that appears when AI agents are rapidly creating or changing applications.

Most traditional testing tools are built for repeatable retesting. They work well after a human or automation engineer has already encoded stable selectors, routes, assertions, and flows. That is valuable, but it is often too rigid for approval-style testing of freshly generated code, where the app may have just been created and exact details like selectors, accessibility labels, terminal states, or window controls may still be unknown.

Pure natural-language computer-use tools sit at the other end of the spectrum. They can explore from vague instructions, but they usually need an LLM call and fresh visual/context input at every step. That becomes slow and expensive very quickly, especially when many agents or many features are being tested in parallel.

Agent Check is a hybrid of those two approaches. An agent can write a testing plan at the level of detail it currently knows:

- high-level semantic, task, intent, or visual candidates when the app is new
- structural candidates when accessible roles, labels, text, or terminal regions are known
- exact candidates when stable ids, selectors, commands, or automation ids are known

At runtime, the runner uses providers to execute deterministic parts directly and uses LLM inference only where semantic interpretation is needed. When a high-level candidate is resolved into a lower-level target, the run can produce refinements so future plans become cheaper, faster, and more deterministic.
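
The resolution behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Agent Check's actual implementation: the type names, the `resolveTarget` function, and the "exact, then structural, then semantic" ordering are all assumptions made for the example.

```typescript
// Hypothetical types: these names are illustrative, not Agent Check's real API.
type CandidateLevel = "exact" | "structural" | "semantic";

interface Candidate {
  level: CandidateLevel;
  description: string; // a selector, a role/label pair, or an instruction
}

interface Resolution {
  target: string;         // the concrete target that was acted on
  usedLlm: boolean;       // true only when semantic interpretation was needed
  refinement?: Candidate; // lower-level candidate a future plan could reuse
}

// Deterministic lookup supplied by a provider; returns null when nothing matches.
type DeterministicLookup = (candidate: Candidate) => string | null;
// LLM-backed lookup, consulted only for semantic candidates.
type SemanticLookup = (instruction: string) => string | null;

// Assumed ordering: cheapest and most deterministic first.
const ORDER: CandidateLevel[] = ["exact", "structural", "semantic"];

function resolveTarget(
  candidates: Candidate[],
  lookup: DeterministicLookup,
  askLlm: SemanticLookup,
): Resolution | null {
  const sorted = candidates
    .slice()
    .sort((a, b) => ORDER.indexOf(a.level) - ORDER.indexOf(b.level));
  for (const candidate of sorted) {
    if (candidate.level === "semantic") {
      const target = askLlm(candidate.description);
      if (target !== null) {
        // Record the resolved target so a later plan can skip the LLM call.
        return {
          target,
          usedLlm: true,
          refinement: { level: "exact", description: target },
        };
      }
    } else {
      const target = lookup(candidate);
      if (target !== null) return { target, usedLlm: false };
    }
  }
  return null; // no candidate could be resolved
}
```

In this sketch the LLM callback is only reached when no exact or structural candidate matches, and a successful semantic resolution returns a refinement that could be written back into the plan.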

The goal is to balance AI flexibility with testing-framework determinism: agents can test immediately after generating code without needing every exact detail up front, and they can progressively harden those tests as the application stabilizes.

Agent Check accepts a YAML Testing Plan, validates it, executes it through runtime providers, collects evidence, classifies failures, and writes run artifacts. It does not author plans. Plans can be written manually, by an agent, or by any external system.

What It Runs

A plan describes user-visible behavior:

- what app surface to test, such as web, tui, desktop, or electron
- what the user is trying to do
- ordered flow steps
- exact, structural, semantic, visual, and provider-hint candidates
- assertions and failure policy

Concrete engines stay in providers. A testing plan should not mention Playwright, Appium, OpenAI, Selenium, or other implementation engines.
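
One way to enforce this rule before a run is a small lint pass over the raw plan text. The sketch below is an assumption for illustration, not a feature of the Agent Check CLI; `findEngineMentions` is a hypothetical helper, and the name list is taken from the sentence above.

```typescript
// Engine names from the sentence above; extend the list as needed.
const ENGINE_NAMES = ["playwright", "appium", "openai", "selenium"];

// Returns the engine names found in the raw plan text (case-insensitive).
// This is a plain substring check, so it also flags prose that happens to
// contain one of these words.
function findEngineMentions(planText: string): string[] {
  const lower = planText.toLowerCase();
  return ENGINE_NAMES.filter((name) => lower.indexOf(name) !== -1);
}

// A plan step that names an engine is reported; a neutral one is not.
const flagged = findEngineMentions("goal: Click Save using Playwright");
const clean = findEngineMentions("goal: Fill the project name field");
```

Here `flagged` contains `"playwright"` and `clean` is empty.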

Install

From npm:

npm install -g @codebolt/agent-check

From this repository:

npm install
npm run build
node dist/cli/index.js doctor

Quick Start

Validate a plan:

agent-check validate examples/web-headed-exact-candidates.plan.yaml

List available providers:

agent-check providers --config agent-check.config.yaml

Run a headed Chrome web plan without the LLM:

agent-check run examples/web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm --run-id web-exact

Run a headed Chrome web plan with LLM semantic resolution:

agent-check run examples/web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7 --run-id web-semantic

Artifacts are written to:

.agent-check/runs/<runId>/

Important files in every run:

- result.json: final pass/fail status, failed step, failure class, artifacts
- trace.json: provider execution trace
- trace.jsonl: lightweight per-step status log
- llm-trace.json: LLM candidate resolution, assertion judgement, and failure classification events
- screenshots/snapshots returned by providers
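
A script that consumes these artifacts might reduce result.json to a one-line summary. The sketch below assumes a result shape; the field names `status`, `failedStep`, `failureClass`, and `artifacts` are guesses based on the description above, and the real file may use different names.

```typescript
// Hypothetical shape: the real result.json field names may differ.
interface RunResult {
  status: "passed" | "failed";
  failedStep?: string;
  failureClass?: string;
  artifacts: string[];
}

// Builds a short human-readable summary for one run.
function summarizeRun(runId: string, result: RunResult): string {
  if (result.status === "passed") {
    return `${runId}: passed, artifacts: ${result.artifacts.length}`;
  }
  const step = result.failedStep ?? "unknown step";
  const failureClass = result.failureClass ?? "unclassified";
  return `${runId}: failed at ${step} [${failureClass}]`;
}

const summary = summarizeRun("web-exact", {
  status: "failed",
  failedStep: "fill_project_name",
  failureClass: "environment",
  artifacts: ["trace.json"],
});
```

With the sample input, `summary` is `"web-exact: failed at fill_project_name [environment]"`.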

Configuration

Runtime choices live outside the plan in agent-check.config.yaml.

artifactStore: .agent-check

execution:
  maxRecoveryAttemptsPerStep: 10

llm:
  enabled: true
  model: codex-cli/gpt-5.5

providers:
  mock:
    enabled: true
  webPlaywright:
    enabled: true
  tuiProcess:
    enabled: true
  electronPlaywright:
    enabled: true
  windowsDesktop:
    enabled: true

Model precedence, from highest to lowest:

--model flag > AI_MODEL from .env or the environment > llm.model in agent-check.config.yaml
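
The precedence rule can be expressed as a small function. This is a sketch only: `resolveModel` and the `ModelSources` shape are hypothetical, and treating blank strings as unset is an assumption, not documented behaviour.

```typescript
interface ModelSources {
  cliModel?: string;    // value passed with --model
  envModel?: string;    // AI_MODEL from .env or the environment
  configModel?: string; // llm.model in agent-check.config.yaml
}

// Returns the first source that is set, in precedence order.
function resolveModel(sources: ModelSources): string | undefined {
  // Assumption: a blank value counts as unset, so an empty AI_MODEL
  // does not mask the config file.
  const pick = (value?: string): string | undefined =>
    value !== undefined && value.trim() !== "" ? value.trim() : undefined;
  return (
    pick(sources.cliModel) ??
    pick(sources.envModel) ??
    pick(sources.configModel)
  );
}

// AI_MODEL overrides the config file when --model is not given.
const chosen = resolveModel({
  envModel: "zhipu/glm-4.7",
  configModel: "codex-cli/gpt-5.5",
});
```

In this example `chosen` is `"zhipu/glm-4.7"`, because no `--model` value was supplied and the environment outranks the config file.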

The CLI loads .env automatically. Do not commit secrets. For LLM backends, model prefixes, environment variables, and llm-trace.json, see docs/LLM.md.

Built-In Providers

| Provider id | Surface | Notes |
| --- | --- | --- |
| `mock` | `mock` | Deterministic provider for runner tests and examples. |
| `web-playwright` | `web` | Browser provider for websites. Supports headed/headless mode and `browserChannel: chrome`. |
| `tui-process` | `tui` | Spawns a command, sends text/keys, reads terminal output. |
| `electron-playwright` | `electron` | Launches Electron through Playwright and acts on renderer controls. |
| `windows-desktop` | `desktop` | Windows UI Automation provider for basic native controls. |

Mobile candidate types exist in the schema, but a mobile provider is not implemented yet.

For provider details and custom-provider guidance, see docs/PROVIDERS.md.

Included Examples

Web, headed Chrome:

agent-check run examples/web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples/web-headed-structural-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples/web-headed-provider-hint-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples/web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7

Mock:

agent-check run examples/mock-pass.plan.yaml --no-llm
agent-check run examples/mock-fail.plan.yaml --no-llm

TUI:

agent-check run examples/tui-local.plan.yaml --config agent-check.config.yaml --no-llm

Documentation

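Detailed guides live in the docs directory, including docs/PROVIDERS.md for providers and docs/LLM.md for LLM backends.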

Writing Plans

The short version:

specVersion: agent-check/v1
kind: TestingPlan

metadata:
  id: web-smoke
  title: User can create a project

target:
  appRef: local-web-fixture
  surface: web
  baseUrl: file:///D:/agentictest/examples/web-fixture.html
  headless: false
  browserChannel: chrome

intent:
  summary: Verify a user can create a project.
  acceptance:
    - The form opens.
    - The project name can be entered.
    - The created project message appears.

flow:
  - id: fill_project_name
    goal: Fill the project name field
    operation:
      type: input
      value: Example Project
      candidates:
        - semantic:
            instruction: Find the project name field.
    success:
      any:
        - semantic:
            intent: The project name was entered.
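
For readers generating plans programmatically, a rough pre-flight check of the top-level fields shown above might look like this. It is a sketch under stated assumptions, not the validator behind `agent-check validate`: `checkPlanShape` is hypothetical, and which fields are actually required is a guess based on the example plan and the surfaces in the provider table.

```typescript
// Loose view of a parsed plan; only the fields from the example above.
interface PlanLike {
  specVersion?: unknown;
  kind?: unknown;
  metadata?: { id?: unknown; title?: unknown };
  target?: { surface?: unknown };
  flow?: unknown;
}

// Surfaces listed in the built-in provider table.
const SURFACES = ["mock", "web", "tui", "electron", "desktop"];

// Returns a list of problems; an empty list means the basic shape looks right.
function checkPlanShape(plan: PlanLike): string[] {
  const problems: string[] = [];
  if (plan.specVersion !== "agent-check/v1") {
    problems.push("specVersion must be agent-check/v1");
  }
  if (plan.kind !== "TestingPlan") {
    problems.push("kind must be TestingPlan");
  }
  if (typeof plan.metadata?.id !== "string") {
    problems.push("metadata.id must be a string");
  }
  const surface = plan.target?.surface;
  if (typeof surface !== "string" || SURFACES.indexOf(surface) === -1) {
    problems.push("target.surface must be one of: " + SURFACES.join(", "));
  }
  if (!Array.isArray(plan.flow) || plan.flow.length === 0) {
    problems.push("flow must list at least one step");
  }
  return problems;
}

// The example plan's top-level fields pass this check.
const planProblems = checkPlanShape({
  specVersion: "agent-check/v1",
  kind: "TestingPlan",
  metadata: { id: "web-smoke", title: "User can create a project" },
  target: { surface: "web" },
  flow: [{ id: "fill_project_name" }],
});
```

The function takes an already-parsed object, so any YAML parser can be used in front of it.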

Development

npm install
npm run build
npm test
npm run check

Package preview:

npm pack --dry-run --json

Notes

- Use --no-llm for deterministic exact/structural/providerHint examples.
- Use llm-trace.json to verify that semantic plans actually called the LLM. See docs/LLM.md.
- Provider errors, missing app sessions, browser launch failures, and LLM account failures should be classified as environment.
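
The classification rule in the last note can be pictured as a simple lookup. The labels below are hypothetical and exist only for this sketch; the source names only the environment class, so every other failure is marked `"unclassified"` here.

```typescript
// Hypothetical labels for illustration; Agent Check's real identifiers may differ.
type FailureKind =
  | "provider-error"
  | "missing-app-session"
  | "browser-launch-failure"
  | "llm-account-failure"
  | "assertion-failed";

type FailureClass = "environment" | "unclassified";

// The four kinds the note above assigns to the environment class.
const ENVIRONMENT_KINDS: FailureKind[] = [
  "provider-error",
  "missing-app-session",
  "browser-launch-failure",
  "llm-account-failure",
];

function classifyFailure(kind: FailureKind): FailureClass {
  return ENVIRONMENT_KINDS.indexOf(kind) !== -1 ? "environment" : "unclassified";
}

// A browser that fails to launch says nothing about the app under test.
const launchClass = classifyFailure("browser-launch-failure");
```

Keeping environment failures separate lets a caller retry or fix the setup instead of reporting a defect in the application.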
