Architecture

Agent Check is a standalone TypeScript CLI and library for running provider-neutral testing plans. It receives a plan, validates it, executes it through runtime providers, records evidence, classifies failures, and optionally suggests refinements.

It is not a test authoring system. A plan can come from a human, another agent, or any external workflow.

[Figure: Agent Check runtime architecture]

Design Goals

  • Keep the testing plan durable and provider-neutral.
  • Put concrete automation engines behind provider plugins.
  • Use LLM inference only as runtime assistance, not as the source of truth.
  • Produce inspectable artifacts for every run.
  • Make failures explainable as app_bug, plan_unclear, or environment.
  • Let the CLI and TypeScript library share the same core implementation.

Package Layout

src/
  cli/        command-line entrypoint
  config/     runtime config parsing
  core/       runner, provider registry, artifacts, result model
  llm/        provider-neutral LLM interface and Vercel AI SDK adapter
  providers/  built-in runtime providers
  schema/     testing plan schema and YAML parsing
examples/     sample plans and fixtures
docs/         architecture, schema, LLM, and provider guides
skills/       agent skill for writing testing plans

Important files:

| File | Responsibility |
| --- | --- |
| src/cli/index.ts | Implements validate, run, refine, providers, and doctor. |
| src/schema/testingPlan.ts | Defines the Zod schema for testing plans. |
| src/config/runtimeConfig.ts | Loads runtime config from agent-check.config.yaml. |
| src/core/runner.ts | Orchestrates execution, verification, classification, and artifacts. |
| src/core/provider.ts | Defines the provider plugin contract. |
| src/core/providerRegistry.ts | Chooses compatible providers for steps. |
| src/core/artifacts.ts | Writes run artifacts under .agent-check/runs/<runId>/. |
| src/llm/llmClient.ts | Defines structured LLM tasks and result contracts. |
| src/llm/vercelAiLlmClient.ts | Uses the Vercel AI SDK for model-provider abstraction. |
| src/providers/index.ts | Registers built-in providers. |

Runtime Flow

CLI/library call
  -> load runtime config
  -> parse and validate testing plan
  -> interpolate variables
  -> create artifact writer
  -> create provider registry
  -> create optional LLM client
  -> run flow steps in order
  -> write result.json, trace.json, trace.jsonl, llm-trace.json
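The "interpolate variables" stage can be pictured as a pure substitution pass over the parsed plan before any provider sees it. This is an illustrative sketch only: the `{{name}}` placeholder syntax, the `Json` type, and the `interpolate` function are hypothetical, and it assumes an undefined variable should be rejected rather than passed through.

```typescript
// Hypothetical placeholder syntax {{name}}; the real plan schema may use a
// different syntax. This only illustrates the "interpolate variables" stage.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const PLACEHOLDER = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

function interpolate(value: Json, vars: Record<string, string>): Json {
  if (typeof value === "string") {
    return value.replace(PLACEHOLDER, (_match: string, name: string): string => {
      // Fail early: an undefined variable is a plan problem, not a provider one.
      if (!Object.prototype.hasOwnProperty.call(vars, name)) {
        throw new Error(`Undefined plan variable: ${name}`);
      }
      return vars[name] as string;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolate(item, vars));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = interpolate(child, vars);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

const step: Json = { action: "type", text: "Create {{ projectName }}", timeoutMs: 5000 };
console.log(interpolate(step, { projectName: "Demo" }));
// { action: 'type', text: 'Create Demo', timeoutMs: 5000 }
```

Returning a new structure instead of editing in place keeps the parsed plan reusable, which fits the rule that runs never mutate the plan.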

Each flow step follows this lifecycle:

observe app state
  -> rank candidates
  -> resolve high-level candidates with LLM when needed
  -> ask providers canHandle()
  -> execute operation
  -> observe post-step state
  -> verify success
  -> collect artifacts and refinements
  -> continue or stop on failure

The runner stops at the first failed step. That keeps the failure focused and prevents later steps from producing misleading noise.
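A minimal sketch of that stop-on-first-failure loop, assuming a hypothetical `FlowStep` whose `run()` bundles the observe, resolve, canHandle, execute, and verify phases into one call. The real runner in src/core/runner.ts is richer; the point here is only that any failure, including a thrown error, ends the flow.

```typescript
// Hypothetical, simplified types; the real runner has richer inputs
// (observations, candidates, providers, artifacts, LLM client).
type StepStatus = "passed" | "failed";

interface StepOutcome {
  stepId: string;
  status: StepStatus;
  detail?: string;
}

interface FlowStep {
  id: string;
  // Stands in for observe -> resolve -> canHandle -> execute -> verify.
  run(): Promise<StepOutcome>;
}

async function runFlow(steps: readonly FlowStep[]): Promise<StepOutcome[]> {
  const outcomes: StepOutcome[] = [];
  for (const step of steps) {
    let outcome: StepOutcome;
    try {
      outcome = await step.run();
    } catch (error) {
      // A thrown error is recorded as a failed step rather than crashing the run.
      outcome = { stepId: step.id, status: "failed", detail: String(error) };
    }
    outcomes.push(outcome);
    if (outcome.status === "failed") {
      break; // stop at the first failure so later steps cannot add misleading noise
    }
  }
  return outcomes;
}

const demo: FlowStep[] = [
  { id: "open", run: async () => ({ stepId: "open", status: "passed" }) },
  { id: "submit", run: async () => { throw new Error("button not found"); } },
  { id: "confirm", run: async () => ({ stepId: "confirm", status: "passed" }) },
];
runFlow(demo).then((outcomes) => console.log(outcomes.map((o) => `${o.stepId}:${o.status}`)));
// [ 'open:passed', 'submit:failed' ]
```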

Testing Plan Boundary

The plan describes user-visible behavior:

  • app surface, such as web, tui, desktop, electron, or mock
  • intent and acceptance criteria
  • ordered flow steps
  • variables
  • candidate resolution levels
  • success conditions
  • failure and refinement policy

The plan must not name implementation engines such as Playwright, Appium, Selenium, OpenAI, or a specific terminal automation library.

Target fields may include provider-owned launch configuration, such as baseUrl, headless, browserChannel, or launchCommand. Those values describe how to reach the app during this run; they are not candidate semantics.
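One way to enforce this boundary is to scan every string value in a parsed plan for known engine names. This is a sketch under assumptions: the deny-list is taken from the engines named above, matching is a case-insensitive substring search, and `findEngineNames` is a hypothetical helper rather than Agent Check's actual validator.

```typescript
// Illustrative deny-list taken from the engines named above; the real list
// and matching rules may differ.
const BANNED_ENGINE_NAMES = ["playwright", "appium", "selenium", "openai"] as const;

interface EngineFinding {
  path: string;   // JSONPath-like location of the offending value
  engine: string; // which banned name matched
}

function findEngineNames(value: unknown, path = "$"): EngineFinding[] {
  if (typeof value === "string") {
    const lower = value.toLowerCase();
    return BANNED_ENGINE_NAMES.filter((engine) => lower.includes(engine)).map((engine) => ({
      path,
      engine,
    }));
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findEngineNames(item, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) => findEngineNames(child, `${path}.${key}`));
  }
  return []; // numbers, booleans, null
}

const plan = { target: { surface: "web" }, flow: [{ id: "s1", hint: "use a Playwright locator" }] };
console.log(findEngineNames(plan));
// [ { path: '$.flow[0].hint', engine: 'playwright' } ]
```

Substring matching can be too aggressive in practice (a launch command might legitimately mention an engine), so a real check would likely restrict itself to candidate and step fields.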

Runtime Config Boundary

agent-check.config.yaml controls local runtime policy:

artifactStore: .agent-check

execution:
  maxRecoveryAttemptsPerStep: 10

llm:
  enabled: true
  model: codex-cli/gpt-5.5

providers:
  mock:
    enabled: true
  webPlaywright:
    enabled: true

This file can vary by machine, CI job, or workspace. The plan should remain portable across those environments.
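To show how a loaded config might be consumed, here is a hypothetical TypeScript shape that mirrors the YAML fields above, merged over defaults section by section. The `RuntimeConfig` interface, the default values (including `llm.enabled: false`), and the helper names are assumptions for illustration; src/config/runtimeConfig.ts defines the real behavior.

```typescript
// Hypothetical shape mirroring the YAML example; defaults are assumptions.
interface RuntimeConfig {
  artifactStore: string;
  execution: { maxRecoveryAttemptsPerStep: number };
  llm: { enabled: boolean; model?: string };
  providers: Record<string, { enabled: boolean }>;
}

const DEFAULT_CONFIG: RuntimeConfig = {
  artifactStore: ".agent-check",
  execution: { maxRecoveryAttemptsPerStep: 10 },
  llm: { enabled: false },
  providers: {},
};

// Merge an already-parsed YAML object over the defaults, one section at a time,
// so a partial file only overrides what it mentions.
function resolveConfig(parsed: Partial<RuntimeConfig>): RuntimeConfig {
  return {
    artifactStore: parsed.artifactStore ?? DEFAULT_CONFIG.artifactStore,
    execution: { ...DEFAULT_CONFIG.execution, ...parsed.execution },
    llm: { ...DEFAULT_CONFIG.llm, ...parsed.llm },
    providers: { ...DEFAULT_CONFIG.providers, ...parsed.providers },
  };
}

function isProviderEnabled(config: RuntimeConfig, id: string): boolean {
  return config.providers[id]?.enabled === true;
}

const config = resolveConfig({
  llm: { enabled: true, model: "codex-cli/gpt-5.5" },
  providers: { mock: { enabled: true } },
});
console.log(config.artifactStore, config.execution.maxRecoveryAttemptsPerStep, isProviderEnabled(config, "webPlaywright"));
// .agent-check 10 false
```

Nothing in this object refers to the plan, which is what lets the same plan run unchanged under different configs.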

Provider Layer

Providers translate abstract operations into concrete runtime actions.

export interface ProviderPlugin {
  id: string;
  name: string;
  capabilities(): readonly ProviderCapability[];
  observe(input: ProviderHandleInput): Promise<AppObservation>;
  canHandle(input: ProviderHandleInput): Promise<ProviderHandleAssessment>;
  execute(input: ProviderHandleInput): Promise<ProviderExecutionResult>;
  verify(input: ProviderHandleInput): Promise<ProviderVerificationResult>;
  dispose?(): Promise<void>;
}
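As an illustration of the contract, the toy provider below implements every method against an in-memory set of element ids. The supporting types are simplified, hypothetical stand-ins (the real ones live in src/core/provider.ts), and `InMemoryProvider` is not the built-in mock provider; it only shows where surface-specific mechanics sit.

```typescript
// Simplified, hypothetical stand-ins for the types in src/core/provider.ts.
interface ProviderCapability { surface: string; operations: readonly string[] }
interface ProviderHandleInput { surface: string; operation: string; candidate: Record<string, string> }
interface AppObservation { visibleIds: string[] }
interface ProviderHandleAssessment { confidence: number; reason: string }
interface ProviderExecutionResult { ok: boolean; evidence: string[] }
interface ProviderVerificationResult { passed: boolean; message: string }

interface ProviderPlugin {
  id: string;
  name: string;
  capabilities(): readonly ProviderCapability[];
  observe(input: ProviderHandleInput): Promise<AppObservation>;
  canHandle(input: ProviderHandleInput): Promise<ProviderHandleAssessment>;
  execute(input: ProviderHandleInput): Promise<ProviderExecutionResult>;
  verify(input: ProviderHandleInput): Promise<ProviderVerificationResult>;
  dispose?(): Promise<void>;
}

// Toy provider whose "app" is a set of element ids that can be clicked.
class InMemoryProvider implements ProviderPlugin {
  readonly id = "in-memory-demo";
  readonly name = "In-memory demo provider";
  private readonly elements: ReadonlySet<string>;
  private clicked: string[] = [];

  constructor(elements: ReadonlySet<string>) {
    this.elements = elements;
  }

  capabilities(): readonly ProviderCapability[] {
    return [{ surface: "mock", operations: ["click"] }];
  }

  async observe(): Promise<AppObservation> {
    return { visibleIds: Array.from(this.elements) };
  }

  async canHandle(input: ProviderHandleInput): Promise<ProviderHandleAssessment> {
    if (input.surface !== "mock" || input.operation !== "click") {
      return { confidence: 0, reason: "unsupported surface or operation" };
    }
    const id = input.candidate.stableId;
    return id !== undefined && this.elements.has(id)
      ? { confidence: 1, reason: `exact stableId ${id} is visible` }
      : { confidence: 0, reason: "no visible element with that stableId" };
  }

  async execute(input: ProviderHandleInput): Promise<ProviderExecutionResult> {
    const id = input.candidate.stableId ?? "";
    const ok = this.elements.has(id);
    if (ok) this.clicked.push(id);
    return { ok, evidence: [`click ${id}: ${ok ? "done" : "missing"}`] };
  }

  // Deterministic check: was the element actually clicked during this run?
  async verify(input: ProviderHandleInput): Promise<ProviderVerificationResult> {
    const id = input.candidate.stableId ?? "";
    const passed = this.clicked.includes(id);
    return { passed, message: passed ? `${id} was clicked` : `${id} was not clicked` };
  }

  async dispose(): Promise<void> {
    this.clicked = []; // release per-run state
  }
}
```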

Provider responsibilities:

  • create or attach to the app surface
  • observe visible state
  • determine if a candidate can be handled
  • execute operations
  • verify assertions that can be checked deterministically
  • return screenshots, snapshots, logs, or other evidence
  • clean up runtime resources

The runner owns sequencing and classification. Providers own surface-specific mechanics.

Provider Registry

The provider registry filters providers by:

  • target surface
  • provider enabled state
  • provider canHandle() confidence
  • operation and candidate compatibility

The registry lets multiple providers exist for the same surface. For example, a future web-cdp provider could sit beside web-playwright, and the runner can choose based on canHandle() scoring.
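A sketch of that selection, assuming a hypothetical `RegisteredProvider` view that exposes its surfaces, supported operations, and a numeric `canHandle()` confidence. The cheap static filters run first, then the remaining providers are scored and the highest score above an assumed 0.5 threshold wins. The actual registry logic may differ.

```typescript
// Hypothetical, minimal view of a registered provider.
interface RegisteredProvider {
  id: string;
  surfaces: readonly string[];
  operations: readonly string[];
  canHandle(operation: string, candidate: object): Promise<number>; // confidence 0..1
}

async function chooseProvider(
  providers: readonly RegisteredProvider[],
  enabled: ReadonlySet<string>,
  surface: string,
  operation: string,
  candidate: object,
  minConfidence = 0.5, // assumed threshold, not a documented default
): Promise<RegisteredProvider | undefined> {
  // Cheap static filters first: surface, enabled state, declared operations.
  const eligible = providers.filter(
    (p) => p.surfaces.includes(surface) && enabled.has(p.id) && p.operations.includes(operation),
  );
  // Then ask each remaining provider how confident it is for this candidate.
  const scored = await Promise.all(
    eligible.map(async (p) => ({ provider: p, score: await p.canHandle(operation, candidate) })),
  );
  let best: { provider: RegisteredProvider; score: number } | undefined;
  for (const entry of scored) {
    // Strict ">" keeps the earlier-registered provider on ties.
    if (entry.score >= minConfidence && (best === undefined || entry.score > best.score)) {
      best = entry;
    }
  }
  return best?.provider;
}

const fixed = (id: string, score: number): RegisteredProvider => ({
  id,
  surfaces: ["web"],
  operations: ["click"],
  canHandle: async () => score,
});

chooseProvider(
  [fixed("web-playwright", 0.9), fixed("web-cdp", 0.95)],
  new Set(["web-playwright"]), // web-cdp is registered but not enabled
  "web",
  "click",
  { role: "button", name: "Create Project" },
).then((p) => console.log(p?.id)); // web-playwright
```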

Candidate Resolution

Candidates are ordered by runtime.resolutionOrder.

For the detailed candidate model, authoring guidance, and examples by level, see CANDIDATES.md.

Common order:

runtime:
  resolutionOrder:
    - exact
    - structural
    - semantic
    - task
    - intent
    - visual
    - providerHint
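Ordering candidates by that list amounts to a stable sort on each candidate's position in `resolutionOrder`. In this hypothetical sketch, levels missing from the configured order are tried last and candidates at the same level keep their plan order; the `Candidate` shape is an assumption, not the schema's definition.

```typescript
// Candidate levels from the resolutionOrder example above.
type CandidateLevel =
  | "exact" | "structural" | "semantic" | "task" | "intent" | "visual" | "providerHint";

// Hypothetical candidate shape; the real schema is richer.
interface Candidate {
  level: CandidateLevel;
  value: Record<string, string>;
}

function orderCandidates(
  candidates: readonly Candidate[],
  resolutionOrder: readonly CandidateLevel[],
): Candidate[] {
  const rank = (c: Candidate): number => {
    const i = resolutionOrder.indexOf(c.level);
    return i === -1 ? resolutionOrder.length : i; // unlisted levels go last
  };
  // Array.prototype.sort is stable (ES2019+), so same-level candidates keep plan order.
  return [...candidates].sort((a, b) => rank(a) - rank(b));
}

const sorted = orderCandidates(
  [
    { level: "semantic", value: { text: "the project name field" } },
    { level: "exact", value: { stableId: "project-name-input" } },
    { level: "structural", value: { role: "textbox", name: "Project name" } },
  ],
  ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"],
);
console.log(sorted.map((c) => c.level)); // [ 'exact', 'structural', 'semantic' ]
```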

Exact and structural candidates can usually go straight to a provider.

High-level candidates such as semantic, task, intent, and visual are resolved through the LLM when enabled. The LLM result is still provider-neutral, for example:

exact:
  stableId: project-name-input

or:

structural:
  role: button
  name: Create Project

The provider then translates that candidate into actual browser, terminal, desktop, or Electron actions.
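Because the LLM's answer is handed to a provider, it helps to validate it first. A sketch assuming the model returns a JSON object with a `level`, the matching candidate fields, and a self-reported `confidence`; the 0.7 threshold and the `parseResolution` name are hypothetical. Anything malformed, engine-specific, or low-confidence becomes a controlled failure instead of a guess.

```typescript
// Hypothetical shape of an LLM resolution: a concrete, provider-neutral candidate.
type ResolvedCandidate =
  | { level: "exact"; stableId: string }
  | { level: "structural"; role: string; name: string };

type Resolution =
  | { ok: true; candidate: ResolvedCandidate; confidence: number }
  | { ok: false; reason: string };

// Validate raw model output before anything reaches a provider.
function parseResolution(raw: unknown, minConfidence = 0.7): Resolution {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, reason: "output is not an object" };
  }
  const { level, stableId, role, name, confidence } = raw as Record<string, unknown>;
  if (typeof confidence !== "number" || !(confidence >= minConfidence)) {
    return { ok: false, reason: "confidence missing or below threshold" };
  }
  if (level === "exact" && typeof stableId === "string" && stableId.length > 0) {
    return { ok: true, confidence, candidate: { level: "exact", stableId } };
  }
  if (level === "structural" && typeof role === "string" && typeof name === "string") {
    return { ok: true, confidence, candidate: { level: "structural", role, name } };
  }
  // e.g. a CSS selector or any other engine-specific answer lands here
  return { ok: false, reason: "not an exact or structural candidate" };
}

console.log(parseResolution({ level: "structural", role: "button", name: "Create Project", confidence: 0.92 }).ok); // true
console.log(parseResolution({ level: "css", selector: "#create", confidence: 0.99 }).ok); // false
```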

LLM Layer

The LLM layer is behind LlmClient. The current implementation uses the Vercel AI SDK adapter, which allows model-provider selection by model string.

The runner calls the LLM for:

  • candidate resolution
  • semantic and visual assertion judgement
  • failure classification
  • refinement suggestions

LLM details live in LLM.md.
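The model string in agent-check.config.yaml (for example `codex-cli/gpt-5.5`) suggests a prefix that selects an adapter plus a model id passed to it. The sketch below assumes exactly that split on the first `/`; `parseModelString`, `resolveModel`, and the factory map are hypothetical and do not reproduce the Vercel AI SDK's own API.

```typescript
// Hypothetical split of a model string into an adapter prefix and a model id.
interface ModelRef {
  prefix: string;  // selects the adapter, e.g. "codex-cli"
  modelId: string; // passed to that adapter, e.g. "gpt-5.5"
}

function parseModelString(model: string): ModelRef {
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Expected "<prefix>/<model>", got "${model}"`);
  }
  // Split on the first slash only, so model ids may contain further slashes.
  return { prefix: model.slice(0, slash), modelId: model.slice(slash + 1) };
}

// Placeholder factory type; a real adapter would wrap an SDK model object.
type ModelFactory = (modelId: string) => { describe(): string };

function resolveModel(model: string, factories: ReadonlyMap<string, ModelFactory>): { describe(): string } {
  const { prefix, modelId } = parseModelString(model);
  const factory = factories.get(prefix);
  if (factory === undefined) {
    throw new Error(`No model adapter registered for prefix "${prefix}"`);
  }
  return factory(modelId);
}

console.log(parseModelString("codex-cli/gpt-5.5")); // { prefix: 'codex-cli', modelId: 'gpt-5.5' }
```

Under this reading, adding another model prefix means registering one more factory, and neither the runner nor any plan changes.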

Artifacts

Runs write to:

.agent-check/runs/<runId>/

Standard files:

| File | Contents |
| --- | --- |
| result.json | Final result, failed step, failure class, evidence paths, refinements. |
| trace.json | Structured provider and runner trace. |
| trace.jsonl | Lightweight chronological event log. |
| llm-trace.json | LLM calls, results, confidence, and errors. |

Providers may add screenshots, DOM snapshots, terminal output, desktop window snapshots, or engine traces.
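A minimal sketch of an artifact writer for that directory, using Node's `fs` module. The file names come from the table above; the `RunArtifacts` class, its method names, and the event fields written to trace.jsonl are assumptions. Appending trace.jsonl one line per event means a run that crashes midway still leaves a readable partial log.

```typescript
import { appendFileSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical writer for <artifactStore>/runs/<runId>/.
class RunArtifacts {
  readonly dir: string;

  constructor(artifactStore: string, runId: string) {
    this.dir = join(artifactStore, "runs", runId);
    mkdirSync(this.dir, { recursive: true });
  }

  // trace.jsonl: one JSON object per line, appended as events happen.
  event(type: string, data: Record<string, unknown> = {}): void {
    const line = JSON.stringify({ ...data, type, at: new Date().toISOString() });
    appendFileSync(join(this.dir, "trace.jsonl"), line + "\n");
  }

  // result.json, trace.json, llm-trace.json: whole documents, written once.
  writeJson(name: "result.json" | "trace.json" | "llm-trace.json", value: unknown): string {
    const path = join(this.dir, name);
    writeFileSync(path, JSON.stringify(value, null, 2) + "\n");
    return path; // returned so callers can record where it was written
  }
}
```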

Result Model

A run produces RunResult.

Key fields:

| Field | Meaning |
| --- | --- |
| status | passed or failed. |
| failedStepId | First failed step, when any. |
| failureClass | app_bug, plan_unclear, or environment. |
| evidence | Artifact paths relevant to the run. |
| trace | Provider and runner trace entries. |
| refinements | Suggested plan improvements. |

Failure classes are intentionally small:

  • app_bug: user-visible behavior failed.
  • plan_unclear: the plan did not identify the target or outcome clearly.
  • environment: runtime setup, provider, app launch, browser, terminal, desktop session, or LLM backend failed.
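The fields above map naturally onto a TypeScript type. This sketch assumes a hypothetical per-step `StepRecord` and a `buildRunResult` helper; it shows how `failedStepId` and `failureClass` come from the first failed step, with `trace` and `refinements` kept as opaque arrays because their real shapes are not specified here.

```typescript
// Hypothetical TypeScript rendering of the RunResult fields listed above.
type FailureClass = "app_bug" | "plan_unclear" | "environment";

interface RunResult {
  status: "passed" | "failed";
  failedStepId?: string;
  failureClass?: FailureClass;
  evidence: string[];     // artifact paths
  trace: unknown[];       // provider and runner trace entries
  refinements: unknown[]; // suggested plan improvements
}

// Hypothetical per-step record produced by the runner.
interface StepRecord {
  stepId: string;
  passed: boolean;
  failureClass?: FailureClass;
  evidence: string[];
}

function buildRunResult(
  steps: readonly StepRecord[],
  trace: unknown[] = [],
  refinements: unknown[] = [],
): RunResult {
  const evidence = steps.flatMap((s) => s.evidence);
  const failed = steps.find((s) => !s.passed); // the runner stops here, so it is the first failure
  if (failed === undefined) {
    return { status: "passed", evidence, trace, refinements };
  }
  return {
    status: "failed",
    failedStepId: failed.stepId,
    ...(failed.failureClass !== undefined ? { failureClass: failed.failureClass } : {}),
    evidence,
    trace,
    refinements,
  };
}

const result = buildRunResult([
  { stepId: "open", passed: true, evidence: ["open.png"] },
  { stepId: "submit", passed: false, failureClass: "app_bug", evidence: ["submit.png"] },
]);
console.log(result.status, result.failedStepId, result.failureClass); // failed submit app_bug
```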

Refinement Flow

Providers and the LLM can suggest refinements. Examples:

  • add an exact stable id candidate discovered during a run
  • replace vague wording with a clearer semantic instruction
  • add a structural role/name candidate
  • clarify an ambiguous assertion

The runner writes refinements into result.json. It does not mutate the plan during a run.

Plan mutation is explicit:

agent-check refine <runId> --plan path\to\plan.yaml --apply
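Applying refinements can be modeled as a pure function from the old plan to a new one, which leaves the original untouched until the caller decides to write it out. The `addCandidate` refinement shape, the `Plan` types, and `applyRefinements` are hypothetical; only the idea of adding a discovered candidate comes from the list above.

```typescript
// Hypothetical plan and refinement shapes, reduced to what this sketch needs.
interface Candidate { level: string; value: Record<string, string> }
interface PlanStep { id: string; candidates: Candidate[] }
interface Plan { steps: PlanStep[] }
interface AddCandidateRefinement { kind: "addCandidate"; stepId: string; candidate: Candidate }

// Returns a new plan; the input plan object is never modified.
function applyRefinements(plan: Plan, refinements: readonly AddCandidateRefinement[]): Plan {
  const steps = plan.steps.map((step) => ({ ...step, candidates: [...step.candidates] }));
  for (const r of refinements) {
    const step = steps.find((s) => s.id === r.stepId);
    if (step === undefined) {
      throw new Error(`Refinement targets unknown step "${r.stepId}"`);
    }
    // Skip exact duplicates (JSON comparison assumes consistent key order).
    const duplicate = step.candidates.some(
      (c) => c.level === r.candidate.level && JSON.stringify(c.value) === JSON.stringify(r.candidate.value),
    );
    if (!duplicate) {
      step.candidates.push(r.candidate);
    }
  }
  return { ...plan, steps };
}

const original: Plan = {
  steps: [{ id: "name", candidates: [{ level: "semantic", value: { text: "project name field" } }] }],
};
const refined = applyRefinements(original, [
  { kind: "addCandidate", stepId: "name", candidate: { level: "exact", value: { stableId: "project-name-input" } } },
]);
console.log(original.steps[0]?.candidates.length, refined.steps[0]?.candidates.length); // 1 2
```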

CLI Commands

| Command | Purpose |
| --- | --- |
| validate <plan.yaml> | Parse and validate a testing plan. |
| run <plan.yaml> | Execute a plan and write run artifacts. |
| refine <runId> | Inspect or apply stored refinement suggestions. |
| providers | List registered provider capabilities. |
| doctor | Inspect config, environment, and runtime readiness. |

The CLI is a thin layer over the library. Library users can import the parser, runner, providers, and config helpers directly.

Built-In Providers

Current built-ins:

| Provider id | Surface | Role |
| --- | --- | --- |
| mock | mock | Deterministic fixture provider for tests and examples. |
| web-playwright | web | Browser provider for websites, including headed Chrome. |
| tui-process | tui | Process-backed terminal provider. |
| electron-playwright | electron | Electron renderer provider. |
| windows-desktop | desktop | Initial Windows UI Automation provider. |

Provider details live in PROVIDERS.md.

Testing Strategy

The test suite covers:

  • schema validation
  • banned provider-specific engine names inside plans
  • mixed candidate levels
  • passing and failing mock runs
  • stop-on-failure behavior
  • provider launch/setup failures
  • failure classification
  • LLM candidate resolution
  • LLM assertion judgement
  • controlled handling of invalid or low-confidence LLM output

Run:

npm test

Extension Points

Good extension points:

  • add a new provider in src/providers
  • implement a custom provider through the library API
  • add new provider-neutral candidate fields to the schema
  • add new artifact kinds
  • add another Vercel AI SDK model prefix
  • add richer refinement application rules

Avoid:

  • putting concrete engine names into plans
  • making providers author plans
  • treating unverified LLM output as execution success
  • mutating plans automatically during a run