LLM Inference

Agent Check uses LLM inference as a runtime assistant for executing an existing testing plan. It does not use the LLM to author the plan.

The durable artifact remains the provider-neutral YAML plan. LLM calls happen during a run when the runner needs semantic interpretation, judgement, classification, or refinement suggestions.

Where It Fits

LLM inference is used for four runtime tasks:

| Task | When it happens | Output |
| --- | --- | --- |
| Candidate resolution | A candidate is `semantic`, `task`, `intent`, or `visual`. | A provider-neutral executable candidate, such as `exact` or `structural`. |
| Assertion judgement | Provider verification cannot deterministically decide a semantic or visual condition. | A pass/fail judgement with confidence and reason. |
| Failure classification | A step fails and needs a final class. | `app_bug`, `plan_unclear`, or `environment`. |
| Refinement suggestions | A run discovers better anchors or clearer plan wording. | Provider-neutral refinements stored in run artifacts. |

The provider still performs the actual app work. The LLM does not call Playwright, terminal APIs, desktop automation APIs, or app-specific code.

Model Configuration

Runtime config lives outside the plan, in `agent-check.config.yaml`:

llm:
  enabled: true
  model: codex-cli/gpt-5.5

Model precedence is:

--model > AI_MODEL from .env/environment > agent-check.config.yaml
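
The precedence rule can be sketched as a small pure function. This is a minimal illustration, not Agent Check's actual code: `ModelSources` and `resolveModel` are hypothetical names, and treating blank strings as "not set" is an assumption.

```typescript
// Hypothetical sketch of the documented precedence:
// --model > AI_MODEL > agent-check.config.yaml
type ModelSources = {
  cliModel?: string;    // value passed via --model, if any
  envModel?: string;    // AI_MODEL from .env or the process environment
  configModel?: string; // llm.model from agent-check.config.yaml
};

function resolveModel(sources: ModelSources): string | undefined {
  // Highest-priority source first.
  const ordered = [sources.cliModel, sources.envModel, sources.configModel];
  for (const value of ordered) {
    // Assumption: empty or whitespace-only values count as unset.
    const trimmed = value?.trim();
    if (trimmed) {
      return trimmed;
    }
  }
  return undefined; // no model configured anywhere
}
```

With all three sources set, the `--model` value is returned. Removing it falls back to `AI_MODEL`, and then to the config file.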

Disable LLM calls for deterministic runs:

agent-check run examples\web-headed-exact-candidates.plan.yaml --no-llm

Run with an explicit model:

agent-check run examples\web-headed-semantic-llm.plan.yaml --config agent-check.config.yaml --model zhipu/glm-4.7 --run-id web-semantic-zhipu

`.env`

The CLI imports `dotenv/config`, so a `.env` file in the working directory is loaded automatically.

Example:

AI_MODEL=zhipu/glm-4.7
AI_LLM_TIMEOUT_MS=60000
ZHIPU_API_KEY=...

Do not commit `.env` or API keys.

Supported Prefixes

Agent Check currently routes model strings by prefix:

| Prefix | Backend | Notes |
| --- | --- | --- |
| `codex-cli/` | Codex CLI community provider | Uses the locally configured Codex CLI account. |
| `zhipu/` | Zhipu AI / BigModel | Uses `ZHIPU_API_KEY`. |
| `zai/` | Z.AI endpoint | Uses `ZHIPU_API_KEY` and `https://api.z.ai/api/paas/v4`. |

Examples:

agent-check run examples\web-headed-semantic-llm.plan.yaml --model codex-cli/gpt-5.5
agent-check run examples\web-headed-semantic-llm.plan.yaml --model zhipu/glm-4.7
agent-check run examples\web-headed-semantic-llm.plan.yaml --model zai/glm-4.7

Unsupported prefixes fail as controlled LLM provider errors and are classified as `environment` when they block a run.
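
Prefix routing of this kind might look like the sketch below. It is an illustration under assumptions, not the real implementation: `routeModel`, `Backend`, and `LlmProviderError` are hypothetical names, and rejecting a prefix with no model id after the slash is an assumed behaviour.

```typescript
// Hypothetical sketch of routing a model string by its prefix.
type Backend = "codex-cli" | "zhipu" | "zai";

// A controlled error that carries the failure class the runner would record.
class LlmProviderError extends Error {
  readonly failureClass = "environment";
  constructor(message: string) {
    super(message);
    this.name = "LlmProviderError";
  }
}

function routeModel(model: string): { backend: Backend; modelId: string } {
  const slash = model.indexOf("/");
  const prefix = slash === -1 ? "" : model.slice(0, slash);
  const modelId = slash === -1 ? "" : model.slice(slash + 1);

  // Only the three documented prefixes are accepted.
  if ((prefix === "codex-cli" || prefix === "zhipu" || prefix === "zai") && modelId.length > 0) {
    return { backend: prefix, modelId };
  }
  throw new LlmProviderError(`Unsupported model prefix in "${model}"`);
}
```

Throwing a typed error, rather than returning a default backend, keeps an unsupported prefix from being silently routed to the wrong provider.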

Environment Variables

Common variables:

| Variable | Purpose |
| --- | --- |
| `AI_MODEL` | Default model when `--model` is not passed. |
| `AI_LLM_TIMEOUT_MS` | Timeout for each LLM request. |
| `ZHIPU_API_KEY` | API key for `zhipu/` and `zai/` models. |
| `ZHIPU_BASE_URL` | Optional override for the Zhipu provider base URL. |

Codex CLI provider variables:

| Variable | Purpose |
| --- | --- |
| `CODEX_CLI_PATH` | Optional path to the Codex CLI executable. |
| `CODEX_CLI_REASONING_EFFORT` | Reasoning effort passed to the provider. |
| `CODEX_CLI_APPROVAL_MODE` | Approval mode passed to the provider. |
| `CODEX_CLI_SANDBOX_MODE` | Sandbox mode passed to the provider. |
| `CODEX_CLI_VERBOSE` | Enables provider verbosity when supported. |

Candidate Resolution

When a step has a high-level candidate, the runner observes the app first and then asks the LLM to resolve the candidate into a provider-neutral executable candidate.

For the full candidate model and guidance on when to use each level, see CANDIDATES.md.

app observation
  + plan context
  + step goal
  + operation
  + candidate
  + provider capabilities
  -> resolved candidate

Example input candidate:

semantic:
  instruction: Find the project name field.

Possible resolved candidate:

exact:
  stableId: project-name-input

or:

structural:
  role: textbox
  name: Project name

The resolved candidate is then passed to the provider. The provider decides whether it can handle it and performs the operation.
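
One way to picture the hand-off is a guard that accepts an LLM result only if it is an executable candidate. The sketch below is hypothetical: the helper names are invented, each candidate is assumed to be an object with exactly one top-level level key, and `exact` and `structural` are assumed to be the only executable levels (the text above names them only as examples).

```typescript
// Hypothetical sketch: deciding whether a candidate needs LLM resolution,
// and checking that a resolved candidate is executable.
type Candidate = Record<string, unknown>;

const HIGH_LEVEL_LEVELS: readonly string[] = ["semantic", "task", "intent", "visual"];
// Assumption: only these two levels are treated as executable here.
const EXECUTABLE_LEVELS: readonly string[] = ["exact", "structural"];

function candidateLevel(candidate: Candidate): string {
  const keys = Object.keys(candidate);
  if (keys.length !== 1) {
    throw new Error(`Expected exactly one candidate level, found ${keys.length}`);
  }
  return String(keys[0]);
}

function needsLlmResolution(candidate: Candidate): boolean {
  return HIGH_LEVEL_LEVELS.indexOf(candidateLevel(candidate)) !== -1;
}

function isExecutable(candidate: Candidate): boolean {
  return EXECUTABLE_LEVELS.indexOf(candidateLevel(candidate)) !== -1;
}

// Reject an LLM answer that is still high-level instead of passing it on.
function acceptResolved(resolved: Candidate): Candidate {
  if (!isExecutable(resolved)) {
    throw new Error(`Resolved candidate is not executable: ${candidateLevel(resolved)}`);
  }
  return resolved;
}
```

A `semantic` candidate would go through the LLM first, while `exact` and `structural` candidates could be passed to the provider directly.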

Assertion Judgement

Semantic and visual assertions can be judged by the LLM when a provider cannot deterministically prove them.

success:
  any:
    - semantic:
        intent: The project was created successfully.

The runner provides the current observation and asks for a structured judgement:

passed
confidence
message
optional evidence notes

A low-confidence judgement is treated as a failed verification rather than a silent pass.
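
The low-confidence rule can be sketched as follows. The field names mirror the list above, but the type, the function name, the 0 to 1 confidence scale, and the `0.7` threshold are all assumptions made for illustration. The real threshold is not documented here.

```typescript
// Hypothetical shape of a structured judgement returned by the LLM.
type AssertionJudgement = {
  passed: boolean;
  confidence: number;  // assumed to be on a 0..1 scale
  message: string;
  evidence?: string[]; // optional evidence notes
};

// Assumed threshold, chosen only for this example.
const MIN_CONFIDENCE = 0.7;

function verificationPassed(
  judgement: AssertionJudgement,
  minConfidence: number = MIN_CONFIDENCE
): boolean {
  // A "pass" below the threshold is not trusted and counts as a failure.
  return judgement.passed && judgement.confidence >= minConfidence;
}
```

Under these assumptions, a judgement of `passed: true` with confidence `0.4` fails verification, which is the "no silent pass" behaviour described above.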

Failure Classification

The runner assigns exactly one failure class to each failed step:

| Class | Meaning |
| --- | --- |
| `app_bug` | The app did not meet a user-visible acceptance criterion. |
| `plan_unclear` | The plan is ambiguous or underspecified. |
| `environment` | Provider, app, browser, terminal, desktop session, or LLM backend failed. |

If the LLM backend itself fails, the runner records that as an environment problem. For example, an expired account, missing API key, unsupported model prefix, or request timeout should not be reported as an app bug.
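
A minimal sketch of that rule, assuming a hypothetical `ClassificationOutcome` type that separates "the LLM call failed" from "the LLM returned a class":

```typescript
// Hypothetical sketch: an LLM backend failure always maps to `environment`.
type FailureClass = "app_bug" | "plan_unclear" | "environment";

type ClassificationOutcome =
  | { kind: "backend_error"; reason: string }            // e.g. missing API key, timeout
  | { kind: "classified"; failureClass: FailureClass };  // the LLM returned a class

function finalFailureClass(outcome: ClassificationOutcome): FailureClass {
  if (outcome.kind === "backend_error") {
    // The LLM never judged the app, so the app cannot be blamed.
    return "environment";
  }
  return outcome.failureClass;
}
```

Keeping the backend error as its own variant means a timeout cannot be mistaken for an `app_bug`.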

Trace Files

Every run writes:

.agent-check/runs/<runId>/llm-trace.json

This file is the easiest way to confirm that the LLM was actually used.

Event types include:

| Event | Meaning |
| --- | --- |
| `candidate_resolution` | A high-level candidate was resolved or failed to resolve. |
| `assertion_judgement` | A semantic or visual assertion was judged. |
| `failure_classification` | A failed step was classified. |

Statuses include:

| Status | Meaning |
| --- | --- |
| `used` | The LLM returned a structured result. |
| `unavailable` | The LLM was needed but unavailable or failed. |
| `ignored` | The LLM result was not accepted, usually because confidence was too low. |

Example excerpt:

[
  {
    "event": "candidate_resolution",
    "status": "used",
    "stepId": "fill_project_name",
    "model": "zhipu/glm-4.7",
    "resolvedCandidate": {
      "exact": {
        "stableId": "project-name-input"
      }
    },
    "confidence": 1,
    "reason": "The observation contains a stable project-name input."
  }
]
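
To check a trace programmatically, a small summary over the parsed events is usually enough. The sketch below assumes the file is a JSON array shaped like the excerpt above. `TraceEvent`, `countByStatus`, and `llmWasUsed` are hypothetical names, and only the `event` and `status` fields are relied on.

```typescript
// Hypothetical sketch: summarising parsed llm-trace.json events.
type TraceStatus = "used" | "unavailable" | "ignored";

type TraceEvent = {
  event: string;       // e.g. "candidate_resolution"
  status: TraceStatus;
  stepId?: string;
  model?: string;
};

function countByStatus(events: TraceEvent[]): Record<TraceStatus, number> {
  const counts: Record<TraceStatus, number> = { used: 0, unavailable: 0, ignored: 0 };
  for (const traceEvent of events) {
    counts[traceEvent.status] += 1;
  }
  return counts;
}

// True when at least one LLM result was returned and accepted as structured output.
function llmWasUsed(events: TraceEvent[]): boolean {
  return events.some((traceEvent) => traceEvent.status === "used");
}
```

The events themselves would come from reading `.agent-check/runs/<runId>/llm-trace.json` and passing its contents through `JSON.parse`.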

Implementation Files

Relevant files:

| File | Role |
| --- | --- |
| `src/llm/llmClient.ts` | Provider-neutral LLM interface and structured result schemas. |
| `src/llm/vercelAiLlmClient.ts` | Vercel AI SDK backed implementation. |
| `src/core/runner.ts` | Decides when to call the LLM and records trace events. |
| `src/cli/index.ts` | Loads `.env`, reads model config, and wires the client into runs. |

The Vercel AI SDK is the abstraction layer. OpenAI, Codex CLI, Zhipu, Z.AI, or future providers are runtime model choices, not testing plan concepts.
