Candidates are the core mechanism that lets Agent Check sit between pure natural-language testing and fully deterministic test automation.
A candidate is one possible way to find or describe the target of an operation or assertion. A step can include multiple candidates at different confidence and detail levels. The runner tries the cheapest and most deterministic useful candidate first, then moves toward higher-level interpretation only when needed.
Freshly generated applications often do not have stable test ids, final labels, or settled UI structure yet. An AI agent may know the user intent before it knows the exact selector or automation id.
Traditional tests usually require low-level details up front:

    exact:
      selector: "[data-testid='create-project-button']"

Pure computer-use testing often starts high level and asks the LLM to interpret the screen repeatedly:

    semantic:
      instruction: Find the button that creates a project.

Agent Check lets a plan contain both:

    candidates:
      - exact:
          stableId: create-project-button
      - structural:
          role: button
          name: Create Project
      - semantic:
          instruction: Find the main action that creates a project.

If the exact candidate works, no LLM is needed. If it is missing or stale, the runner can fall back to structural or semantic resolution.
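The fallback behavior can be pictured as a loop over the candidate list. The following is a minimal Python sketch, not the runner's actual implementation: `resolve`, `fake_provider`, and the `DETERMINISTIC_LEVELS` set are hypothetical names introduced for illustration, and the returned target string is made up.

```python
from typing import Callable, Optional

# Plan fragment from above, written as Python data: one-key mappings of level -> fields.
candidates = [
    {"exact": {"stableId": "create-project-button"}},
    {"structural": {"role": "button", "name": "Create Project"}},
    {"semantic": {"instruction": "Find the main action that creates a project."}},
]

# Assumption for this sketch: these levels can be checked without an LLM call.
DETERMINISTIC_LEVELS = {"exact", "structural"}


def resolve(candidates: list, try_candidate: Callable[[str, dict], Optional[str]]):
    """Return (level, target, used_llm) for the first candidate that resolves, else None."""
    for candidate in candidates:
        level, fields = next(iter(candidate.items()))
        target = try_candidate(level, fields)
        if target is not None:
            return level, target, level not in DETERMINISTIC_LEVELS
    return None


def fake_provider(level: str, fields: dict) -> Optional[str]:
    """Stand-in provider: the stable id is stale, but the accessible name still matches."""
    if level == "structural" and fields.get("name") == "Create Project":
        return "button#new-project"  # made-up resolved target
    return None


result = resolve(candidates, fake_provider)
print(result)
```

In this run the exact candidate misses, the structural candidate matches, and the semantic candidate is never consulted, so no LLM call is made.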
| Level | Best When | Cost | Determinism |
|---|---|---|---|
| `exact` | You know a stable id, selector, command, key sequence, or automation id. | lowest | highest |
| `structural` | You know user-visible structure such as role, label, placeholder, text, region, or window title. | low | high |
| `semantic` | You know the target in natural language but not exact UI structure. | medium | medium |
| `task` | The operation may require a short user-level task, not one direct control. | medium-high | medium-low |
| `intent` | You know the desired outcome but not the interaction details. | high | lower |
| `visual` | The condition is screen-level or layout-like. | high | lower |
| `providerHint` | You want to offer an escape hatch for a provider-specific locator. | low | varies |
The important pattern is progressive specificity:

    intent / task / semantic / visual
      -> structural
        -> exact

Plans can start high level, then become cheaper as agents or humans add resolved candidates discovered during real runs.
For most plans:

    runtime:
      resolutionOrder:
        - exact
        - structural
        - semantic
        - task
        - intent
        - visual
        - providerHint

This order prefers low-cost deterministic execution first.

Use a different order when the app is extremely unstable or when exact anchors are known to be stale. For example, while approving a newly generated UI:

    runtime:
      style: adaptive
      resolutionOrder:
        - structural
        - semantic
        - exact
        - visual

That tells the runner to trust user-visible structure before brittle ids.
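One way to read `resolutionOrder` is as a sort key applied to a step's candidates before the runner tries them. The sketch below assumes exactly that; `order_candidates` is a hypothetical helper, and the choice to skip levels missing from the order (rather than append them) is an assumption this document does not confirm.

```python
DEFAULT_ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]
ADAPTIVE_ORDER = ["structural", "semantic", "exact", "visual"]


def level_of(candidate: dict) -> str:
    """Each candidate is a one-key mapping; the key is its level."""
    return next(iter(candidate))


def order_candidates(candidates: list, resolution_order: list) -> list:
    """Return candidates sorted by their level's position in resolution_order."""
    rank = {level: index for index, level in enumerate(resolution_order)}
    # Assumption: a level that the order does not list is skipped for this run.
    usable = [c for c in candidates if level_of(c) in rank]
    # sorted() is stable, so candidates of the same level keep their plan order.
    return sorted(usable, key=lambda c: rank[level_of(c)])


step_candidates = [
    {"exact": {"stableId": "create-project-button"}},
    {"structural": {"role": "button", "name": "Create Project"}},
    {"semantic": {"instruction": "Find the main action that creates a project."}},
]

default_levels = [level_of(c) for c in order_candidates(step_candidates, DEFAULT_ORDER)]
adaptive_levels = [level_of(c) for c in order_candidates(step_candidates, ADAPTIVE_ORDER)]
print(default_levels)
print(adaptive_levels)
```

With the adaptive order, the same three candidates are tried as structural, semantic, then exact, which matches the intent of trusting visible structure before ids.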
Use exact when the plan knows a stable machine-targetable anchor.

Web:

    exact:
      stableId: create-project-button

TUI:

    exact:
      tui:
        command: npm test

Desktop:

    exact:
      desktop:
        automationId: saveButton

Electron:

    exact:
      electron:
        accelerator: Ctrl+S

Exact candidates should be preferred when they are stable because they avoid LLM cost and reduce ambiguity.
Avoid making exact candidates too brittle. A generated CSS path like
div:nth-child(3) > button:nth-child(2) may be exact, but it is usually a poor
long-term candidate.
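A plan author or agent could screen selectors for this kind of brittleness before promoting them to exact candidates. The following is a rough heuristic sketch; `looks_brittle`, its regular expression, and the `max_depth` threshold are illustrative assumptions, not part of Agent Check.

```python
import re

# Pseudo-classes that select by DOM position rather than by identity.
POSITIONAL = re.compile(r":nth-(?:last-)?(?:child|of-type)\(|:(?:first|last)-child\b")


def looks_brittle(selector: str, max_depth: int = 3) -> bool:
    """Flag selectors that depend on element position or on a long child chain."""
    child_combinators = selector.count(">")  # rough depth estimate
    return bool(POSITIONAL.search(selector)) or child_combinators >= max_depth


generated_path = "div:nth-child(3) > button:nth-child(2)"
test_id = "[data-testid='create-project-button']"
print(looks_brittle(generated_path), looks_brittle(test_id))
```

The check is deliberately conservative: it only catches positional pseudo-classes and deep child chains, so a selector that passes is not guaranteed to be stable.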
Use structural when the target is identifiable through accessibility or visible UI structure.

Web:

    structural:
      role: button
      name: Create Project

Form field:

    structural:
      label: Project name

TUI:

    structural:
      tui:
        visibleText: "Ready"
        region: main

Desktop:

    structural:
      desktop:
        name: Settings
        windowTitle:
          contains: CodeBolt

Structural candidates are often the best default for agent-created apps because they survive many implementation changes while staying cheaper than LLM interpretation.
Use semantic when the agent knows what the target means but not how the app currently exposes it.

    semantic:
      instruction: Find the field where the user enters the project name.

At runtime, the runner observes the app state and asks the LLM to resolve this into a provider-neutral candidate, such as:

    structural:
      label: Project name

or:

    exact:
      stableId: project-name-input

Semantic candidates should be specific. Good semantic instructions name the user-visible role and context:

    semantic:
      instruction: Find the primary button in the project creation form that submits the project.

Avoid vague instructions:

    semantic:
      instruction: Click the right thing.

Use task when the operation may take more than a single click or input, but is still a bounded user action.

    task:
      instruction: Create a project named "{{ PROJECT_NAME }}" from the current page.

Task candidates are useful for approval testing of newly generated flows, but they are more expensive and less deterministic than exact or structural candidates. Prefer replacing them with step-level operations once the app shape is known.
Use intent for outcome-oriented goals where the path is not known.

    intent:
      instruction: The user can start creating a new project.

Intent candidates are best for early exploratory approval runs. They should not be the final long-term form of a test if the flow becomes stable.

Use visual when the assertion or target is primarily screen-level.

    visual:
      intent: The project dashboard is visible and not blank.

Visual candidates are useful for catching broken rendering, blank screens, obvious layout collapse, or missing screen transitions. They are not a replacement for exact user-visible assertions when specific behavior matters.
providerHint is an escape hatch. It can include a provider-specific hint while keeping the durable plan's primary candidates provider-neutral.

    candidates:
      - structural:
          role: button
          name: Create Project
      - providerHint:
          kind: css
          value: "[data-testid='create-project-button']"

Use provider hints sparingly. They are useful for local experiments, migration, or compatibility with an app that already exposes good low-level selectors.
Candidates may use common fields directly or nest fields under a surface name when the plan needs to distinguish surfaces.

    candidates:
      - exact:
          web:
            stableId: create-project-button
      - exact:
          tui:
            keys:
              - Enter
      - exact:
          desktop:
            automationId: saveButton
      - exact:
          electron:
            accelerator: Ctrl+S
      - exact:
          mobile:
            accessibilityId: save

Surface nesting is still provider-neutral. `web`, `tui`, `desktop`, `electron`, and `mobile` describe app surfaces, not automation engines.
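To make the difference between direct and surface-nested fields concrete, here is a small Python sketch of how a runner might pick the fields that apply to the surface under test. `fields_for_surface` is hypothetical, and both the merge of common fields with nested ones and the rule that a candidate nested for other surfaces is skipped are assumptions.

```python
from typing import Optional

SURFACES = {"web", "tui", "desktop", "electron", "mobile"}


def fields_for_surface(level_body: dict, surface: str) -> Optional[dict]:
    """Return the fields usable on `surface`, or None if the candidate targets other surfaces."""
    nested = {key: value for key, value in level_body.items() if key in SURFACES}
    common = {key: value for key, value in level_body.items() if key not in SURFACES}
    if not nested:
        return common  # direct fields apply to every surface
    if surface in nested:
        return {**common, **nested[surface]}  # assumption: nested fields extend common ones
    return None  # nested only for other surfaces, so skip this candidate


web_fields = fields_for_surface({"web": {"stableId": "create-project-button"}}, "web")
tui_on_web = fields_for_surface({"tui": {"keys": ["Enter"]}}, "web")
direct_fields = fields_for_surface({"stableId": "create-project-button"}, "desktop")
print(web_fields, tui_on_web, direct_fields)
```

Nothing in the sketch names an automation engine: the surface key only narrows which fields are considered, and the provider still decides how to act on them.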
Operation candidates identify what the runner should act on.

    operation:
      type: input
      value: "{{ PROJECT_NAME }}"
      candidates:
        - exact:
            stableId: project-name-input
        - structural:
            label: Project name
        - semantic:
            instruction: Find the project name input field.

The provider receives the selected or resolved candidate and performs the operation.
Success conditions also support different levels.

    success:
      any:
        - exact:
            visibleText: "{{ PROJECT_NAME }}"
        - semantic:
            intent: The project was created and is visible to the user.

The runner first asks providers to verify deterministic conditions. If a semantic or visual condition cannot be verified deterministically, the LLM can judge the observation.
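The two-stage verification described above can be sketched as a deterministic check that may decline to answer, followed by an LLM judgment only when it does. This is an illustrative Python sketch: `evaluate`, `provider_check`, and `fake_judge` are hypothetical, conditions are evaluated in plan order, and the screen text is invented for the example.

```python
from typing import Callable, Optional

# Assumption: a deterministic check returns True/False, or None when it cannot decide.
DeterministicCheck = Callable[[str, dict], Optional[bool]]
Judge = Callable[[str, dict], bool]


def evaluate(success: dict, check: DeterministicCheck, judge: Judge) -> bool:
    """Evaluate a one-key `any` / `all` success block."""
    mode, conditions = next(iter(success.items()))

    def holds(condition: dict) -> bool:
        level, fields = next(iter(condition.items()))
        verdict = check(level, fields)
        if verdict is None:  # not verifiable deterministically
            verdict = judge(level, fields)
        return verdict

    results = (holds(c) for c in conditions)  # lazy, so `any` can stop early
    return any(results) if mode == "any" else all(results)


screen_text = "Projects > Apollo"  # invented observation
judge_calls = []


def provider_check(level: str, fields: dict) -> Optional[bool]:
    if level == "exact" and "visibleText" in fields:
        return fields["visibleText"] in screen_text
    return None


def fake_judge(level: str, fields: dict) -> bool:
    judge_calls.append(level)  # record every LLM judgment
    return True


any_block = {"any": [
    {"exact": {"visibleText": "Apollo"}},
    {"semantic": {"intent": "The project was created and is visible to the user."}},
]}
all_block = {"all": [
    {"exact": {"visibleText": "Apollo"}},
    {"visual": {"intent": "The result screen is visible and not blank."}},
]}

any_ok = evaluate(any_block, provider_check, fake_judge)
calls_after_any = list(judge_calls)
all_ok = evaluate(all_block, provider_check, fake_judge)
print(any_ok, calls_after_any, all_ok, judge_calls)
```

In the `any` block the exact condition passes, so the judge is never called. In the `all` block the visual condition cannot be checked deterministically, so it is the only one that reaches the judge.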
Candidates should reduce ambiguity. When several controls could match, include context.

Less clear:

    structural:
      role: button
      name: Save

Better:

    structural:
      role: button
      name: Save
      region: Project settings form

Better semantic fallback:

    semantic:
      instruction: Find the Save button inside the project settings form, not the global toolbar.

If the runner cannot identify the intended target with enough confidence, the failure should be `plan_unclear`, not `app_bug`.
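A confidence rule of this kind might look like the sketch below. The scoring scheme, the `min_confidence` and `min_margin` thresholds, and the `pick_target` helper are all assumptions made for illustration; the document does not specify how the runner measures confidence.

```python
def pick_target(scores: dict, min_confidence: float = 0.8, min_margin: float = 0.2):
    """Return ("resolved", target) or ("plan_unclear", None) from per-control scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_confidence:
        return ("plan_unclear", None)  # nothing identified with enough confidence
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return ("plan_unclear", None)  # two controls are about equally plausible
    return ("resolved", ranked[0][0])


# role + name only: both Save buttons match equally well.
without_region = pick_target({"toolbar-save": 0.9, "settings-form-save": 0.9})
# role + name + region: the form's Save button clearly wins.
with_region = pick_target({"toolbar-save": 0.35, "settings-form-save": 0.95})
print(without_region, with_region)
```

The point of the sketch is the classification, not the numbers: a tie between plausible controls is reported as a problem with the plan, so it is not mistaken for a defect in the app.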
Refinement is how Agent Check turns flexible early tests into stable later tests.

During a run, a provider or LLM may discover that:

    semantic:
      instruction: Find the project name field.

resolved to:

    exact:
      stableId: project-name-input

The run can store a refinement suggestion in `result.json`. The original plan is not mutated during a run. Applying suggestions is explicit:

    agent-check refine <runId> --plan path\to\plan.yaml --apply

Good refinements preserve the higher-level fallback while adding the cheaper candidate first:

    candidates:
      - exact:
          stableId: project-name-input
      - semantic:
          instruction: Find the project name field.

That way, future runs are deterministic when possible but still adaptive when the app changes.
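The merge itself is simple to sketch. The Python below is an assumption about what applying a refinement could do, not the behavior of `agent-check refine`: `apply_refinement` is a hypothetical helper that inserts the resolved candidate ahead of more expensive levels, keeps existing fallbacks, and returns a new list so the original plan data is untouched.

```python
COST_ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]


def level_rank(candidate: dict) -> int:
    """Position of the candidate's level in COST_ORDER (cheapest first)."""
    return COST_ORDER.index(next(iter(candidate)))


def apply_refinement(candidates: list, resolved: dict) -> list:
    """Return a new candidate list with `resolved` placed before costlier levels."""
    if resolved in candidates:
        return list(candidates)  # already refined; applying twice changes nothing
    position = next(
        (i for i, c in enumerate(candidates) if level_rank(c) > level_rank(resolved)),
        len(candidates),
    )
    return candidates[:position] + [resolved] + candidates[position:]


plan_candidates = [{"semantic": {"instruction": "Find the project name field."}}]
suggestion = {"exact": {"stableId": "project-name-input"}}

refined = apply_refinement(plan_candidates, suggestion)
refined_twice = apply_refinement(refined, suggestion)
print(refined)
```

Here `refined` lists the exact candidate first and the semantic fallback second, while `plan_candidates` still contains only the semantic entry.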
When an AI agent writes a plan, it should:

- use `exact` when it just created or inspected stable ids
- use `structural` for accessible labels, roles, text, windows, terminal output, and menu-like structure
- use `semantic` when it knows the target but not the exact UI details
- use `task` or `intent` only when the interaction path is genuinely unknown
- keep `visual` for screen-level assertions or rendering checks
- include multiple candidates when possible
- place cheaper candidates before expensive candidates
- add refinements after successful runtime resolution
- avoid provider engine names in the plan
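Several of these guidelines are mechanical enough to check before a plan is run. The following Python sketch shows one possible lint pass; `lint_candidates` and its warning texts are hypothetical, and the `ENGINE_NAMES` tuple is only an illustrative sample of automation engines, not a list taken from Agent Check.

```python
ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]
ENGINE_NAMES = ("playwright", "selenium", "puppeteer", "appium")  # illustrative sample


def lint_candidates(candidates: list) -> list:
    """Return human-readable warnings for one step's candidate list."""
    warnings = []
    levels = [next(iter(c)) for c in candidates]
    ranks = [ORDER.index(level) for level in levels if level in ORDER]
    if ranks != sorted(ranks):
        warnings.append("candidates are not ordered from cheapest to most expensive")
    if len(candidates) < 2:
        warnings.append("only one candidate; consider adding a fallback")
    plan_text = repr(candidates).lower()
    for name in ENGINE_NAMES:
        if name in plan_text:
            warnings.append(f"provider engine name in plan: {name}")
    return warnings


good = [
    {"exact": {"stableId": "create-project-button"}},
    {"structural": {"role": "button", "name": "Create Project"}},
    {"semantic": {"instruction": "Find the main action that creates a project."}},
]
bad = [
    {"semantic": {"instruction": "Use the Playwright locator for the create button."}},
    {"exact": {"stableId": "create-project-button"}},
]

good_warnings = lint_candidates(good)
bad_warnings = lint_candidates(bad)
print(good_warnings)
print(bad_warnings)
```

A lint like this only covers the checkable guidelines; whether `task` or `intent` is justified for a given step still needs judgment.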
Low-cost deterministic web action:

    operation:
      type: interact
      candidates:
        - exact:
            stableId: create-project-button

Balanced web action:

    operation:
      type: interact
      candidates:
        - structural:
            role: button
            name: Create Project
        - semantic:
            instruction: Find the primary action that creates a project.

Early approval-test action:

    operation:
      type: task
      value: "{{ PROJECT_NAME }}"
      candidates:
        - task:
            instruction: Create a project with the provided project name.

Provider-neutral success:

    success:
      all:
        - exact:
            visibleText: "{{ PROJECT_NAME }}"
        - visual:
            intent: The result screen is visible and not blank.