codeboltai/agent-check

Agent Check

Progressive, provider-neutral testing plan runner for AI agents.

[Figure: Agent Check testing spectrum]

Agent Check exists for the testing gap that appears when AI agents are rapidly creating or changing applications.

Most traditional testing tools are built for repeatable retesting. They work well after a human or automation engineer has already encoded stable selectors, routes, assertions, and flows. That is valuable, but it is often too rigid for approval-style testing of freshly generated code, where the app may have just been created and exact details like selectors, accessibility labels, terminal states, or window controls may still be unknown.

Pure natural-language computer-use tools sit at the other end of the spectrum. They can explore from vague instructions, but they usually need an LLM call and fresh visual/context input at every step. That becomes slow and expensive, especially when many agents or many features are being tested in parallel.

Agent Check is a hybrid of those two models. An agent can write a testing plan at whatever level of detail it currently knows:

  • high-level semantic, task, intent, or visual candidates when the app is new
  • structural candidates when accessible roles, labels, text, or terminal regions are known
  • exact candidates when stable ids, selectors, commands, or automation ids are known

At runtime, the runner uses providers to execute deterministic parts directly and uses LLM inference only where semantic interpretation is needed. When a high-level candidate is resolved into a lower-level target, the run can produce refinements so future plans become cheaper, faster, and more deterministic.
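The resolution order described above can be illustrated with a minimal sketch. This is not Agent Check's implementation: the names `LEVELS`, `Resolution`, `resolve`, `find_direct`, and `ask_llm` are hypothetical, and the shape of a refinement is an assumption. It only shows the idea of trying deterministic candidates first, falling back to an LLM for a semantic candidate, and recording the concrete target so a later plan can skip the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical candidate levels, ordered from most deterministic to least.
# The names mirror the README's vocabulary, not the real plan schema.
LEVELS = ["exact", "structural", "semantic"]


@dataclass
class Resolution:
    level: str                  # level of the candidate that matched
    target: str                 # concrete target a provider could act on
    refinement: Optional[dict]  # suggested lower-level candidate, if any


def resolve(
    candidates: dict,
    find_direct: Callable[[str, str], Optional[str]],
    ask_llm: Callable[[str], Optional[str]],
) -> Optional[Resolution]:
    """Try candidates from cheapest to most expensive and stop at the first hit."""
    for level in LEVELS:
        value = candidates.get(level)
        if value is None:
            continue
        if level == "semantic":
            # Only semantic candidates need LLM inference.
            target = ask_llm(value)
            if target is not None:
                # Record the resolved target so a future plan can use it directly.
                return Resolution(level, target, {"exact": target})
        else:
            target = find_direct(level, value)
            if target is not None:
                return Resolution(level, target, None)
    return None
```

In this sketch, a plan that already carries an exact candidate never reaches `ask_llm`, which is the cost saving the hardening loop aims for.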

The goal is to balance AI flexibility with testing-framework determinism: agents can test immediately after generating code without needing every exact detail up front, and they can progressively harden those tests as the application stabilizes.

[Figure: Agent Check progressive hardening loop]

Agent Check accepts a YAML Testing Plan, validates it, executes it through runtime providers, collects evidence, classifies failures, and writes run artifacts. It does not author plans. Plans can be written manually, by an agent, or by any external system.

What It Runs

A plan describes user-visible behavior:

  • what app surface to test, such as web, tui, desktop, or electron
  • what the user is trying to do
  • ordered flow steps
  • exact, structural, semantic, visual, and provider-hint candidates
  • assertions and failure policy

Concrete engines stay in providers. A testing plan should not mention Playwright, Appium, OpenAI, Selenium, or other implementation engines.
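As a rough way to catch engine names before a plan is committed, a plain text scan can help. The sketch below is an assumption-laden helper, not part of Agent Check: `find_engine_mentions` is a hypothetical name, the list of names comes only from the sentence above, and whether `agent-check validate` enforces this rule itself is not stated here.

```python
import re

# Engine names the README says a plan should not mention.
ENGINE_NAMES = ("Playwright", "Appium", "OpenAI", "Selenium")


def find_engine_mentions(plan_text: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for each engine name found in the plan."""
    pattern = re.compile("|".join(re.escape(name) for name in ENGINE_NAMES), re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

An empty result means none of the four names appear. The scan matches case-insensitively and does not understand YAML, so it will also flag names inside comments.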

Install

From npm:

npm install -g @codebolt/agent-check

From this repository:

npm install
npm run build
node dist\cli\index.js doctor

Quick Start

Validate a plan:

agent-check validate examples\web-headed-exact-candidates.plan.yaml

List available providers:

agent-check providers --config agent-check.config.yaml

Run a headed Chrome web plan without an LLM:

agent-check run examples\web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm --run-id web-exact

Run a headed Chrome web plan with LLM semantic resolution:

agent-check run examples\web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7 --run-id web-semantic

Artifacts are written to:

.agent-check/runs/<runId>/

Important files in every run:

  • result.json: final pass/fail status, failed step, failure class, artifacts
  • trace.json: provider execution trace
  • trace.jsonl: lightweight per-step status log
  • llm-trace.json: LLM candidate resolution, assertion judgement, and failure classification events
  • screenshots/snapshots returned by providers
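A script that post-processes runs might first confirm that the files listed above are present. The following is a small sketch under stated assumptions: the helper names are hypothetical, the default store `.agent-check` matches the path shown above, and `trace.jsonl` is assumed to hold one JSON object per line, as its extension suggests. The fields inside each file are not documented here, so the sketch does not interpret them.

```python
import json
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("result.json", "trace.json", "trace.jsonl", "llm-trace.json")


def run_dir(run_id: str, artifact_store: str = ".agent-check") -> Path:
    """Build the run directory path: <artifact_store>/runs/<run_id>."""
    return Path(artifact_store) / "runs" / run_id


def missing_artifacts(directory: Path) -> list[str]:
    """Return the expected file names that are absent from the run directory."""
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


def read_jsonl(path: Path) -> list:
    """Parse a JSON Lines file, skipping blank lines (format is an assumption)."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For example, `missing_artifacts(run_dir("web-exact"))` would list any of the four files that the `web-exact` run did not produce.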

Configuration

Runtime choices live outside the plan in agent-check.config.yaml.

artifactStore: .agent-check

execution:
  maxRecoveryAttemptsPerStep: 10

llm:
  enabled: true
  model: codex-cli/gpt-5.5

providers:
  mock:
    enabled: true
  webPlaywright:
    enabled: true
  tuiProcess:
    enabled: true
  electronPlaywright:
    enabled: true
  windowsDesktop:
    enabled: true

Model precedence is:

--model > AI_MODEL from .env/environment > agent-check.config.yaml

The CLI loads .env automatically. Do not commit secrets. For LLM backends, model prefixes, environment variables, and llm-trace.json, see docs/LLM.md.
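The precedence order can be expressed as a short function. This is a sketch of the documented order only, not the CLI's code: `resolve_model` is a hypothetical name, the config is assumed to be already parsed into a mapping with the `llm.model` key shown above, and treating an empty string as "not set" is an assumption.

```python
from typing import Mapping, Optional


def resolve_model(
    cli_model: Optional[str],
    env: Mapping[str, str],
    config: Mapping,
) -> Optional[str]:
    """Pick the model: --model first, then AI_MODEL, then the config file."""
    if cli_model:
        return cli_model
    env_model = env.get("AI_MODEL")
    if env_model:
        return env_model
    # Fall back to llm.model from agent-check.config.yaml, if present.
    return (config.get("llm") or {}).get("model")
```

Passing `os.environ` as `env` would mirror the environment lookup, and passing a plain dictionary keeps the function easy to test.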

Built-In Providers

| Provider id | Surface | Notes |
| --- | --- | --- |
| mock | mock | Deterministic provider for runner tests and examples. |
| web-playwright | web | Browser provider for websites. Supports headed/headless mode and browserChannel: chrome. |
| tui-process | tui | Spawns a command, sends text/keys, reads terminal output. |
| electron-playwright | electron | Launches Electron through Playwright and acts on renderer controls. |
| windows-desktop | desktop | Windows UI Automation provider for basic native controls. |

Mobile candidate types exist in the schema, but a mobile provider is not implemented yet.

Included Examples

Web, headed Chrome:

agent-check run examples\web-headed-exact-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-structural-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-provider-hint-candidates.plan.yaml --config agent-check.config.yaml --no-llm
agent-check run examples\web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7

Mock:

agent-check run examples\mock-pass.plan.yaml --no-llm
agent-check run examples\mock-fail.plan.yaml --no-llm

TUI:

agent-check run examples\tui-local.plan.yaml --config agent-check.config.yaml --no-llm
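When several example plans are run from a script, it can be convenient to assemble the argument list in one place. The helper below is a hypothetical sketch that uses only the flags shown in this README (`--config`, `--no-llm`, `--model`, `--run-id`); it makes no claim about other flags or about the CLI's exit codes.

```python
from typing import Optional


def build_run_command(
    plan: str,
    config: Optional[str] = None,
    model: Optional[str] = None,
    no_llm: bool = False,
    run_id: Optional[str] = None,
) -> list[str]:
    """Assemble an `agent-check run` argument list from the flags shown above."""
    command = ["agent-check", "run", plan]
    if config:
        command += ["--config", config]
    if no_llm:
        command.append("--no-llm")
    if model:
        command += ["--model", model]
    if run_id:
        command += ["--run-id", run_id]
    return command
```

The resulting list can be handed to `subprocess.run` without shell quoting, which avoids issues with the backslash-separated paths used in the examples.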

Writing Plans

The short version:

specVersion: agent-check/v1
kind: TestingPlan

metadata:
  id: web-smoke
  title: User can create a project

target:
  appRef: local-web-fixture
  surface: web
  baseUrl: file:///D:/agentictest/examples/web-fixture.html
  headless: false
  browserChannel: chrome

intent:
  summary: Verify a user can create a project.
  acceptance:
    - The form opens.
    - The project name can be entered.
    - The created project message appears.

flow:
  - id: fill_project_name
    goal: Fill the project name field
    operation:
      type: input
      value: Example Project
      candidates:
        - semantic:
            instruction: Find the project name field.
    success:
      any:
        - semantic:
            intent: The project name was entered.
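To track how far a plan has been hardened, one option is to list the steps that still rely on semantic entries and therefore cannot run with `--no-llm`. The sketch below assumes the plan has already been parsed into a dictionary (for example with a YAML library) and that steps follow the shape shown above, with `operation.candidates` and `success.any` as lists of single-key mappings. `steps_needing_llm` is a hypothetical helper; other combinators or candidate types that the real schema may define are ignored.

```python
def steps_needing_llm(plan: dict) -> list[str]:
    """Return ids of flow steps that contain a semantic candidate or assertion."""
    needing = []
    for step in plan.get("flow") or []:
        candidates = (step.get("operation") or {}).get("candidates") or []
        assertions = (step.get("success") or {}).get("any") or []
        # Each entry is expected to be a mapping keyed by its candidate type.
        if any("semantic" in entry for entry in candidates + assertions):
            needing.append(step.get("id", "<no id>"))
    return needing
```

Applied to the plan above, this would return `["fill_project_name"]`, since both its candidate and its success check are semantic.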

Development

npm install
npm run build
npm test
npm run check

Package preview:

npm pack --dry-run --json

Notes

  • Use --no-llm for deterministic exact/structural/providerHint examples.
  • Use llm-trace.json to verify that semantic plans actually called the LLM. See docs/LLM.md.
  • Provider errors, missing app sessions, browser launch failures, and LLM account failures should be classified as environment failures.
