Architecture

Agent Check is a standalone TypeScript CLI and library for running provider-neutral testing plans. It receives a plan, validates it, executes it through runtime providers, records evidence, classifies failures, and optionally suggests refinements.

It is not a test authoring system. A plan can come from a human, another agent, or any external workflow.

Design Goals

Keep the testing plan durable and provider-neutral.
Put concrete automation engines behind provider plugins.
Use LLM inference only as runtime assistance, not as the source of truth.
Produce inspectable artifacts for every run.
Make failures explainable as app_bug, plan_unclear, or environment.
Let the CLI and TypeScript library share the same core implementation.

Package Layout

src/
  cli/        command-line entrypoint
  config/     runtime config parsing
  core/       runner, provider registry, artifacts, result model
  llm/        provider-neutral LLM interface and Vercel AI SDK adapter
  providers/  built-in runtime providers
  schema/     testing plan schema and YAML parsing
examples/     sample plans and fixtures
docs/         architecture, schema, LLM, and provider guides
skills/       agent skill for writing testing plans

Important files:

File	Responsibility
`src/cli/index.ts`	Implements `validate`, `run`, `refine`, `providers`, and `doctor`.
`src/schema/testingPlan.ts`	Defines the Zod schema for testing plans.
`src/config/runtimeConfig.ts`	Loads runtime config from `agent-check.config.yaml`.
`src/core/runner.ts`	Orchestrates execution, verification, classification, and artifacts.
`src/core/provider.ts`	Defines the provider plugin contract.
`src/core/providerRegistry.ts`	Chooses compatible providers for steps.
`src/core/artifacts.ts`	Writes run artifacts under `.agent-check/runs/<runId>/`.
`src/llm/llmClient.ts`	Defines structured LLM tasks and result contracts.
`src/llm/vercelAiLlmClient.ts`	Uses the Vercel AI SDK for model-provider abstraction.
`src/providers/index.ts`	Registers built-in providers.

Runtime Flow

CLI/library call
  -> load runtime config
  -> parse and validate testing plan
  -> interpolate variables
  -> create artifact writer
  -> create provider registry
  -> create optional LLM client
  -> run flow steps in order
  -> write result.json, trace.json, trace.jsonl, llm-trace.json

Each flow step follows this lifecycle:

observe app state
  -> rank candidates
  -> resolve high-level candidates with LLM when needed
  -> ask providers canHandle()
  -> execute operation
  -> observe post-step state
  -> verify success
  -> collect artifacts and refinements
  -> continue or stop on failure

The runner stops at the first failed step. That keeps the failure focused and prevents later steps from producing misleading noise.
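
The stop-on-first-failure rule can be sketched as a small loop. This is a minimal, synchronous illustration rather than the actual runner: `FlowStep`, `StepOutcome`, `FlowResult`, and `runFlow` are hypothetical names, and each step's `run` callback stands in for the full observe, resolve, execute, and verify lifecycle.

```typescript
// Hypothetical, simplified shapes; the real runner and step types are not shown here.
type StepOutcome = { ok: true } | { ok: false; reason: string };

interface FlowStep {
  id: string;
  // Stand-in for observe -> resolve -> execute -> verify against a provider.
  run: () => StepOutcome;
}

interface FlowResult {
  status: "passed" | "failed";
  failedStepId?: string;
  executedStepIds: string[];
}

function runFlow(steps: readonly FlowStep[]): FlowResult {
  const executedStepIds: string[] = [];
  for (const step of steps) {
    executedStepIds.push(step.id);
    const outcome = step.run();
    if (!outcome.ok) {
      // Stop here so later steps cannot add misleading noise.
      return { status: "failed", failedStepId: step.id, executedStepIds };
    }
  }
  return { status: "passed", executedStepIds };
}

const flowResult = runFlow([
  { id: "open-form", run: () => ({ ok: true }) },
  { id: "submit", run: () => ({ ok: false, reason: "button not found" }) },
  { id: "check-toast", run: () => ({ ok: true }) },
]);

console.log(flowResult.status, flowResult.failedStepId, flowResult.executedStepIds);
```

In this sketch the third step never runs, so the result names exactly one failed step.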

Testing Plan Boundary

The plan describes user-visible behavior:

app surface, such as web, tui, desktop, electron, or mock
intent and acceptance criteria
ordered flow steps
variables
candidate resolution levels
success conditions
failure and refinement policy

The plan must not name implementation engines such as Playwright, Appium, Selenium, OpenAI, or a specific terminal automation library.

Target fields may include provider-owned launch configuration, such as baseUrl, headless, browserChannel, or launchCommand. Those values describe how to reach the app during this run; they are not candidate semantics.
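
One way to enforce the engine-name rule is to scan every string value in a parsed plan. The sketch below is an assumption about how such a check could work, not the schema's actual implementation: `findBannedEngineNames` and `collectStrings` are hypothetical helpers, the plan fields are made up for illustration, and simple substring matching is used for brevity.

```typescript
// Engine names come from the rule above; the matching strategy is an assumption.
const BANNED_ENGINE_NAMES = ["playwright", "appium", "selenium", "openai"];

// Recursively collect every string value in a parsed plan object.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) collectStrings(record[key], out);
  }
  return out;
}

// Returns the banned names that appear in any string value (case-insensitive).
function findBannedEngineNames(plan: unknown): string[] {
  const haystack = collectStrings(plan).join("\n").toLowerCase();
  return BANNED_ENGINE_NAMES.filter((name) => haystack.indexOf(name) !== -1);
}

// Hypothetical plan fragments, used only to exercise the check.
const cleanPlan = {
  target: { surface: "web", baseUrl: "http://localhost:3000", headless: true },
  flow: [{ id: "create", intent: "Create a project named Demo" }],
};
const leakyPlan = {
  target: cleanPlan.target,
  flow: [{ id: "create", intent: "Use the Playwright locator for the button" }],
};

console.log(findBannedEngineNames(cleanPlan));
console.log(findBannedEngineNames(leakyPlan));
```

Launch configuration such as `baseUrl` or `headless` passes this check because it names no engine. A real check would likely need word-boundary matching to avoid false positives.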

Runtime Config Boundary

agent-check.config.yaml controls local runtime policy:

artifactStore: .agent-check

execution:
  maxRecoveryAttemptsPerStep: 10

llm:
  enabled: true
  model: codex-cli/gpt-5.5

providers:
  mock:
    enabled: true
  webPlaywright:
    enabled: true

This file can vary by machine, CI job, or workspace. The plan should remain portable across those environments.
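
Because the config varies per environment, it helps to think of it as a partial override on top of defaults. The following sketch assumes a shape that mirrors the YAML sample above. The default values, `RuntimeConfigInput`, and `resolveConfig` are illustrative assumptions, not the behavior of `src/config/runtimeConfig.ts`.

```typescript
// Shape mirrors the YAML sample above; the defaults below are assumptions.
interface RuntimeConfig {
  artifactStore: string;
  execution: { maxRecoveryAttemptsPerStep: number };
  llm: { enabled: boolean; model?: string };
  providers: Record<string, { enabled: boolean }>;
}

interface RuntimeConfigInput {
  artifactStore?: string;
  execution?: { maxRecoveryAttemptsPerStep?: number };
  llm?: { enabled?: boolean; model?: string };
  providers?: Record<string, { enabled: boolean }>;
}

const DEFAULTS: RuntimeConfig = {
  artifactStore: ".agent-check",
  execution: { maxRecoveryAttemptsPerStep: 10 },
  llm: { enabled: false },
  providers: { mock: { enabled: true } },
};

// Merge one environment's overrides onto the defaults, section by section.
function resolveConfig(input: RuntimeConfigInput): RuntimeConfig {
  return {
    artifactStore: input.artifactStore ?? DEFAULTS.artifactStore,
    execution: { ...DEFAULTS.execution, ...input.execution },
    llm: { ...DEFAULTS.llm, ...input.llm },
    providers: { ...DEFAULTS.providers, ...input.providers },
  };
}

// Two environments, same plan: only the runtime policy differs.
const ciConfig = resolveConfig({ execution: { maxRecoveryAttemptsPerStep: 2 } });
const devConfig = resolveConfig({
  llm: { enabled: true, model: "codex-cli/gpt-5.5" },
  providers: { webPlaywright: { enabled: true } },
});

console.log(ciConfig.execution.maxRecoveryAttemptsPerStep, ciConfig.llm.enabled);
console.log(devConfig.llm.model, Object.keys(devConfig.providers));
```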

Provider Layer

Providers translate abstract operations into concrete runtime actions.

export interface ProviderPlugin {
  id: string;
  name: string;
  capabilities(): readonly ProviderCapability[];
  observe(input: ProviderHandleInput): Promise<AppObservation>;
  canHandle(input: ProviderHandleInput): Promise<ProviderHandleAssessment>;
  execute(input: ProviderHandleInput): Promise<ProviderExecutionResult>;
  verify(input: ProviderHandleInput): Promise<ProviderVerificationResult>;
  dispose?(): Promise<void>;
}

Provider responsibilities:

create or attach to the app surface
observe visible state
determine if a candidate can be handled
execute operations
verify assertions that can be checked deterministically
return screenshots, snapshots, logs, or other evidence
clean up runtime resources

The runner owns sequencing and classification. Providers own surface-specific mechanics.
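
To make the split concrete, here is a toy provider against a trimmed version of the contract. The stand-in types (`HandleInput`, `HandleAssessment`, `ExecutionResult`), the `MiniProviderPlugin` interface, and `InMemoryFormProvider` are all hypothetical. The real `ProviderHandleInput` and result shapes are not shown in this document, so their fields here are assumptions.

```typescript
// Stand-in types; the real input and result shapes are assumptions here.
interface HandleInput {
  operation: "fill" | "click";
  stableId: string;
  value?: string;
}
interface HandleAssessment { canHandle: boolean; confidence: number }
interface ExecutionResult { ok: boolean; evidence: string[] }

// Trimmed contract: only the methods this sketch exercises.
interface MiniProviderPlugin {
  id: string;
  name: string;
  canHandle(input: HandleInput): Promise<HandleAssessment>;
  execute(input: HandleInput): Promise<ExecutionResult>;
  dispose?(): Promise<void>;
}

class InMemoryFormProvider implements MiniProviderPlugin {
  readonly id = "in-memory-form";
  readonly name = "In-memory form (illustrative)";
  // The "app surface": text fields keyed by stable id.
  private fields: Record<string, string> = { "project-name-input": "" };

  async canHandle(input: HandleInput): Promise<HandleAssessment> {
    const ok = input.operation === "fill" && input.stableId in this.fields;
    return { canHandle: ok, confidence: ok ? 1 : 0 };
  }

  async execute(input: HandleInput): Promise<ExecutionResult> {
    if (input.operation !== "fill" || !(input.stableId in this.fields)) {
      return { ok: false, evidence: [`unsupported: ${input.operation} ${input.stableId}`] };
    }
    this.fields[input.stableId] = input.value ?? "";
    return { ok: true, evidence: [`${input.stableId}=${this.fields[input.stableId]}`] };
  }

  async dispose(): Promise<void> {
    this.fields = {};
  }
}

// Runner-side sequencing: ask first, execute second, always clean up.
async function runOnce(provider: MiniProviderPlugin, input: HandleInput): Promise<ExecutionResult> {
  try {
    const assessment = await provider.canHandle(input);
    if (!assessment.canHandle) return { ok: false, evidence: ["skipped: provider declined"] };
    return await provider.execute(input);
  } finally {
    if (provider.dispose) await provider.dispose();
  }
}

const fillName: HandleInput = { operation: "fill", stableId: "project-name-input", value: "Demo" };
const fillDemo = runOnce(new InMemoryFormProvider(), fillName);
const clickDemo = runOnce(new InMemoryFormProvider(), { operation: "click", stableId: "save" });

fillDemo.then((r) => console.log(r.ok, r.evidence));
clickDemo.then((r) => console.log(r.ok, r.evidence));
```

The provider knows how to touch its surface, while `runOnce` decides the order of calls and when to dispose.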

Provider Registry

The provider registry filters providers by:

target surface
provider enabled state
provider canHandle() confidence
operation and candidate compatibility

The registry lets multiple providers exist for the same surface. For example, a future web-cdp provider could sit beside web-playwright, and the runner can choose based on canHandle() scoring.
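
The filtering criteria above can be sketched as a single selection function. This is a simplified, synchronous illustration under stated assumptions: `RegisteredProvider` and `chooseProvider` are hypothetical, `canHandle` is reduced to a numeric confidence in the range 0 to 1, and the highest score wins. The real registry's scoring rules are not specified here.

```typescript
// Hypothetical, synchronous stand-in for a registered provider.
interface RegisteredProvider {
  id: string;
  surface: string;
  enabled: boolean;
  // Simplified canHandle(): confidence in [0, 1]; 0 means "cannot handle".
  canHandle: (operation: string) => number;
}

function chooseProvider(
  providers: readonly RegisteredProvider[],
  surface: string,
  operation: string,
): RegisteredProvider | undefined {
  let best: RegisteredProvider | undefined;
  let bestScore = 0;
  for (const provider of providers) {
    // Filter by enabled state and target surface before asking canHandle().
    if (!provider.enabled || provider.surface !== surface) continue;
    const score = provider.canHandle(operation);
    if (score > bestScore) {
      best = provider;
      bestScore = score;
    }
  }
  return best;
}

const registered: RegisteredProvider[] = [
  { id: "web-playwright", surface: "web", enabled: true, canHandle: () => 0.8 },
  // Hypothetical future provider: scores higher but is disabled in this run.
  { id: "web-cdp", surface: "web", enabled: false, canHandle: () => 0.9 },
  { id: "tui-process", surface: "tui", enabled: true, canHandle: () => 1 },
];

const chosenForWeb = chooseProvider(registered, "web", "click");
const chosenForDesktop = chooseProvider(registered, "desktop", "click");

console.log(chosenForWeb ? chosenForWeb.id : "none");
console.log(chosenForDesktop ? chosenForDesktop.id : "none");
```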

Candidate Resolution

Candidates are ordered by runtime.resolutionOrder.

For the detailed candidate model, authoring guidance, and examples by level, see CANDIDATES.md.

Common order:

runtime:
  resolutionOrder:
    - exact
    - structural
    - semantic
    - task
    - intent
    - visual
    - providerHint

Exact and structural candidates can usually go straight to a provider.

High-level candidates such as semantic, task, intent, and visual are resolved through the LLM when it is enabled. The LLM returns a lower-level candidate that is still provider-neutral, for example:

exact:
  stableId: project-name-input

or:

structural:
  role: button
  name: Create Project

The provider then translates that candidate into actual browser, terminal, desktop, or Electron actions.
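
Ordering candidates by `runtime.resolutionOrder` amounts to a stable sort on each candidate's level. The sketch below assumes a hypothetical `Candidate` shape and an `orderCandidates` helper. Placing levels that are missing from the order at the end is also an assumption, since this document does not say how they are treated.

```typescript
type CandidateLevel =
  | "exact" | "structural" | "semantic" | "task" | "intent" | "visual" | "providerHint";

// Hypothetical candidate shape: a level plus its level-specific fields.
interface Candidate {
  level: CandidateLevel;
  fields: Record<string, string>;
}

function orderCandidates(
  candidates: readonly Candidate[],
  resolutionOrder: readonly CandidateLevel[],
): Candidate[] {
  // Levels missing from resolutionOrder sort last (an assumption).
  const rank = (level: CandidateLevel): number => {
    const index = resolutionOrder.indexOf(level);
    return index === -1 ? resolutionOrder.length : index;
  };
  // slice() copies the input; Array.prototype.sort is stable in ES2019+ engines,
  // so candidates of the same level keep their authored order.
  return candidates.slice().sort((a, b) => rank(a.level) - rank(b.level));
}

const authored: Candidate[] = [
  { level: "semantic", fields: { description: "the main create button" } },
  { level: "exact", fields: { stableId: "project-name-input" } },
  { level: "structural", fields: { role: "button", name: "Create Project" } },
];

const ordered = orderCandidates(authored, ["exact", "structural", "semantic"]);
console.log(ordered.map((c) => c.level).join(" > "));
```

With this order, candidates that can go straight to a provider are tried before any that need LLM resolution.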

LLM Layer

The LLM layer sits behind the LlmClient interface. The current implementation uses the Vercel AI SDK adapter, which allows model-provider selection by model string.

The runner calls the LLM for:

candidate resolution
semantic and visual assertion judgement
failure classification
refinement suggestions

LLM details live in LLM.md.
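
Since LLM output is runtime assistance rather than a source of truth, a caller would typically gate it before use. The following is a minimal sketch of such a gate for candidate resolution. The result shape, the `MIN_CONFIDENCE` threshold of 0.7, and `gateLlmCandidate` are assumptions for illustration. The actual contracts live in `src/llm/llmClient.ts` and are not reproduced here.

```typescript
type GateResult =
  | { accepted: true; level: "exact" | "structural"; confidence: number }
  | { accepted: false; reason: string };

// Assumed threshold; the real value, if any, is not given in this document.
const MIN_CONFIDENCE = 0.7;

// Accept an LLM-resolved candidate only if it is well-formed, provider-neutral,
// and confident enough. Everything else is rejected with a reason.
function gateLlmCandidate(raw: unknown): GateResult {
  if (raw === null || typeof raw !== "object") {
    return { accepted: false, reason: "not an object" };
  }
  const record = raw as Record<string, unknown>;

  let level: "exact" | "structural";
  if (record.level === "exact") level = "exact";
  else if (record.level === "structural") level = "structural";
  else return { accepted: false, reason: "unsupported level" };

  const confidence = record.confidence;
  if (typeof confidence !== "number" || isNaN(confidence)) {
    return { accepted: false, reason: "missing confidence" };
  }
  if (confidence < MIN_CONFIDENCE) {
    return { accepted: false, reason: "low confidence" };
  }
  return { accepted: true, level, confidence };
}

const goodGate = gateLlmCandidate({ level: "structural", confidence: 0.92 });
const lowGate = gateLlmCandidate({ level: "exact", confidence: 0.3 });
const badGate = gateLlmCandidate("click the button");

console.log(goodGate, lowGate, badGate);
```

A rejected result is not itself a step failure. It only means the runner cannot rely on that LLM answer.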

Artifacts

Runs write to:

.agent-check/runs/<runId>/

Standard files:

File	Contents
`result.json`	Final result, failed step, failure class, evidence paths, refinements.
`trace.json`	Structured provider and runner trace.
`trace.jsonl`	Lightweight chronological event log.
`llm-trace.json`	LLM calls, results, confidence, and errors.

Providers may add screenshots, DOM snapshots, terminal output, desktop window snapshots, or engine traces.
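
The chronological log in `trace.jsonl` uses one JSON object per line. The sketch below shows that encoding as a round trip. The `TraceEvent` fields and the helper names are hypothetical, and the run id is an arbitrary example. Only the directory layout comes from this section.

```typescript
// Hypothetical event shape; the real trace fields are not listed in this document.
interface TraceEvent {
  ts: string;
  stepId: string;
  kind: string;
  detail?: string;
}

// Build a path under the run directory described above.
function runFilePath(runId: string, file: string): string {
  return `.agent-check/runs/${runId}/${file}`;
}

// JSON Lines: one JSON object per line, each line newline-terminated.
function toJsonl(events: readonly TraceEvent[]): string {
  return events.map((event) => JSON.stringify(event) + "\n").join("");
}

function parseJsonl(text: string): TraceEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TraceEvent);
}

const events: TraceEvent[] = [
  { ts: "2024-01-01T00:00:00.000Z", stepId: "open-form", kind: "execute" },
  { ts: "2024-01-01T00:00:01.000Z", stepId: "open-form", kind: "verify", detail: "passed" },
];

const jsonl = toJsonl(events);
console.log(runFilePath("example-run", "trace.jsonl"));
console.log(jsonl);
console.log(parseJsonl(jsonl).length);
```

An append-only line format like this stays readable even if a run is interrupted partway through.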

Result Model

A run produces RunResult.

Key fields:

Field	Meaning
`status`	`passed` or `failed`.
`failedStepId`	First failed step, when any.
`failureClass`	`app_bug`, `plan_unclear`, or `environment`.
`evidence`	Artifact paths relevant to the run.
`trace`	Provider and runner trace entries.
`refinements`	Suggested plan improvements.

Failure classes are intentionally small:

app_bug: user-visible behavior failed.
plan_unclear: the plan did not identify the target or outcome clearly.
environment: runtime setup, provider, app launch, browser, terminal, desktop session, or LLM backend failed.
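
A closed set of failure classes maps naturally onto a string-literal union, which lets the compiler check that every class is handled. The sketch below is a trimmed, assumed version of `RunResult` (the `trace` field is omitted). The follow-up actions in `nextAction` are illustrative triage hints, not part of Agent Check.

```typescript
type FailureClass = "app_bug" | "plan_unclear" | "environment";

// Trimmed, assumed version of the result model described above.
interface RunResult {
  status: "passed" | "failed";
  failedStepId?: string;
  failureClass?: FailureClass;
  evidence: string[];
  refinements: string[];
}

// Illustrative triage routing; the exhaustive switch fails to compile
// if a new failure class is added without a matching case.
function nextAction(failureClass: FailureClass): string {
  switch (failureClass) {
    case "app_bug":
      return "report the app behavior";
    case "plan_unclear":
      return "clarify the plan";
    case "environment":
      return "fix the runtime setup and rerun";
    default: {
      const unreachable: never = failureClass;
      return unreachable;
    }
  }
}

function summarize(result: RunResult): string {
  if (result.status === "passed") return "passed";
  const step = result.failedStepId ?? "unknown step";
  if (!result.failureClass) return `failed at ${step} (unclassified)`;
  return `failed at ${step}: ${result.failureClass} -> ${nextAction(result.failureClass)}`;
}

const failedRun: RunResult = {
  status: "failed",
  failedStepId: "submit-form",
  failureClass: "plan_unclear",
  evidence: [".agent-check/runs/example-run/result.json"],
  refinements: ["add a structural role/name candidate"],
};

console.log(summarize(failedRun));
console.log(summarize({ status: "passed", evidence: [], refinements: [] }));
```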

Refinement Flow

Providers and the LLM can suggest refinements. Examples:

add an exact stable id candidate discovered during a run
replace vague wording with a clearer semantic instruction
add a structural role/name candidate
clarify an ambiguous assertion

The runner writes refinements into result.json. It does not mutate the plan during a run.

Plan mutation is explicit:

agent-check refine <runId> --plan path\to\plan.yaml --apply
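
Keeping mutation explicit suggests treating refinement application as a pure function from an old plan to a new one. The sketch below illustrates that idea only. The `Plan`, `PlanStep`, and `Refinement` shapes and `applyRefinements` are hypothetical, since the stored refinement format is not specified in this document.

```typescript
// Hypothetical shapes; the stored refinement format is an assumption.
interface PlanStep {
  id: string;
  candidates: Array<Record<string, unknown>>;
}
interface Plan {
  name: string;
  flow: PlanStep[];
}
interface Refinement {
  stepId: string;
  addCandidate: Record<string, unknown>;
}

// Returns a new plan; the input plan and its steps are left untouched.
function applyRefinements(plan: Plan, refinements: readonly Refinement[]): Plan {
  return {
    ...plan,
    flow: plan.flow.map((step) => {
      const additions = refinements
        .filter((refinement) => refinement.stepId === step.id)
        .map((refinement) => refinement.addCandidate);
      if (additions.length === 0) return step;
      return { ...step, candidates: [...step.candidates, ...additions] };
    }),
  };
}

const originalPlan: Plan = {
  name: "create-project",
  flow: [
    { id: "click-create", candidates: [{ semantic: "the main create button" }] },
    { id: "check-title", candidates: [{ exact: { stableId: "project-title" } }] },
  ],
};

const refinedPlan = applyRefinements(originalPlan, [
  { stepId: "click-create", addCandidate: { structural: { role: "button", name: "Create Project" } } },
]);

console.log(originalPlan.flow[0].candidates.length, refinedPlan.flow[0].candidates.length);
```

Because the original object is never changed, a caller can inspect the refined plan before deciding to write it back.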

CLI Commands

Command	Purpose
`validate <plan.yaml>`	Parse and validate a testing plan.
`run <plan.yaml>`	Execute a plan and write run artifacts.
`refine <runId>`	Inspect or apply stored refinement suggestions.
`providers`	List registered provider capabilities.
`doctor`	Inspect config, environment, and runtime readiness.

The CLI is a thin layer over the library. Library users can import the parser, runner, providers, and config helpers directly.

Built-In Providers

Current built-ins:

Provider id	Surface	Role
`mock`	`mock`	Deterministic fixture provider for tests and examples.
`web-playwright`	`web`	Browser provider for websites, including headed Chrome.
`tui-process`	`tui`	Process-backed terminal provider.
`electron-playwright`	`electron`	Electron renderer provider.
`windows-desktop`	`desktop`	Initial Windows UI Automation provider.

Provider details live in PROVIDERS.md.

Testing Strategy

The test suite covers:

schema validation
banned provider-specific engine names inside plans
mixed candidate levels
passing and failing mock runs
stop-on-failure behavior
provider launch/setup failures
failure classification
LLM candidate resolution
LLM assertion judgement
controlled handling of invalid or low-confidence LLM output

Run:

npm test

Extension Points

Good extension points:

add a new provider in src/providers
implement a custom provider through the library API
add new provider-neutral candidate fields to the schema
add new artifact kinds
add another Vercel AI SDK model prefix
add richer refinement application rules

Avoid:

putting concrete engine names into plans
making providers author plans
treating LLM output as unverified execution success
mutating plans automatically during a run
