Candidates

Candidates are the core mechanism that lets Agent Check sit between pure natural-language testing and fully deterministic test automation.

A candidate is one possible way to find or describe the target of an operation or assertion. A step can include multiple candidates at different confidence and detail levels. The runner tries the cheapest and most deterministic useful candidate first, then moves toward higher-level interpretation only when needed.

Why Candidates Exist

Freshly generated applications often do not have stable test ids, final labels, or settled UI structure yet. An AI agent may know the user intent before it knows the exact selector or automation id.

Traditional tests usually require low-level details up front:

exact:
  selector: "[data-testid='create-project-button']"

Pure computer-use testing often starts high level and asks the LLM to interpret the screen repeatedly:

semantic:
  instruction: Find the button that creates a project.

Agent Check lets a plan contain both:

candidates:
  - exact:
      stableId: create-project-button
  - structural:
      role: button
      name: Create Project
  - semantic:
      instruction: Find the main action that creates a project.

If the exact candidate works, no LLM is needed. If it is missing or stale, the runner can fall back to structural or semantic resolution.
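The fallback behaviour can be sketched in a few lines of Python. This is a minimal illustration, not the Agent Check runner: `resolve_first`, the resolver functions, and the simulated `stable_ids` and `buttons` tables are all hypothetical names introduced here. It assumes each candidate is a single-key mapping, as in the YAML above.

```python
from typing import Callable, Optional

# Hypothetical resolver signature: candidate body in, target handle (or None) out.
Resolver = Callable[[dict], Optional[str]]


def resolve_first(candidates, resolvers):
    """Try candidates in plan order; return (target, levels_tried)."""
    tried = []
    for candidate in candidates:
        # Each candidate is a single-key mapping such as {"exact": {...}}.
        level, body = next(iter(candidate.items()))
        tried.append(level)
        resolver = resolvers.get(level)
        target = resolver(body) if resolver else None
        if target is not None:
            return target, tried
    return None, tried


# Simulated app state: the stable id in the plan is stale,
# but the accessible role and name still match.
stable_ids = {"new-project-button": "element#1"}
buttons = {("button", "Create Project"): "element#1"}
llm_calls = []  # records every (simulated) LLM resolution


def resolve_exact(body):
    return stable_ids.get(body.get("stableId"))


def resolve_structural(body):
    return buttons.get((body.get("role"), body.get("name")))


def resolve_semantic(body):
    llm_calls.append(body["instruction"])  # stands in for an LLM call
    return "element#1"


candidates = [
    {"exact": {"stableId": "create-project-button"}},
    {"structural": {"role": "button", "name": "Create Project"}},
    {"semantic": {"instruction": "Find the main action that creates a project."}},
]
target, tried = resolve_first(
    candidates,
    {"exact": resolve_exact, "structural": resolve_structural, "semantic": resolve_semantic},
)
```

In this run the stale exact candidate fails, the structural candidate resolves the target, and the semantic resolver is never invoked, so no LLM call is made.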

Candidate Levels

| Level | Best When | Cost | Determinism |
| --- | --- | --- | --- |
| exact | You know a stable id, selector, command, key sequence, or automation id. | lowest | highest |
| structural | You know user-visible structure such as role, label, placeholder, text, region, or window title. | low | high |
| semantic | You know the target in natural language but not exact UI structure. | medium | medium |
| task | The operation may require a short user-level task, not one direct control. | medium-high | medium-low |
| intent | You know the desired outcome but not the interaction details. | high | lower |
| visual | The condition is screen-level or layout-like. | high | lower |
| providerHint | You want to offer an escape hatch for a provider-specific locator. | low | varies |

The important pattern is progressive specificity:

intent / task / semantic / visual
  -> structural
  -> exact

Plans can start high level, then become cheaper as agents or humans add resolved candidates discovered during real runs.

Recommended Ordering

For most plans:

runtime:
  resolutionOrder:
    - exact
    - structural
    - semantic
    - task
    - intent
    - visual
    - providerHint

This order tries low-cost, deterministic candidates first and reaches LLM-backed levels only when the cheaper ones fail.

Use a different order when the app is extremely unstable or when exact anchors are known to be stale. For example, while approving a newly generated UI:

runtime:
  style: adaptive
  resolutionOrder:
    - structural
    - semantic
    - exact
    - visual

That tells the runner to trust user-visible structure before brittle ids.
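One way to picture how a `resolutionOrder` reorders a step's candidates is a stable sort by level. The sketch below is illustrative only: `order_candidates` is a hypothetical helper, and placing levels that are missing from the order at the end (rather than skipping them) is an assumption, not documented runner behaviour.

```python
DEFAULT_ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]
ADAPTIVE_ORDER = ["structural", "semantic", "exact", "visual"]


def order_candidates(candidates, resolution_order):
    """Return candidates sorted by the position of their level in resolution_order."""
    rank = {level: i for i, level in enumerate(resolution_order)}
    # Assumption: levels absent from the order are tried last.
    # sorted() is stable, so candidates at the same level keep their plan order.
    return sorted(candidates, key=lambda c: rank.get(next(iter(c)), len(rank)))


candidates = [
    {"exact": {"stableId": "create-project-button"}},
    {"structural": {"role": "button", "name": "Create Project"}},
    {"semantic": {"instruction": "Find the main action that creates a project."}},
    {"providerHint": {"kind": "css", "value": "[data-testid='create-project-button']"}},
]

default_levels = [next(iter(c)) for c in order_candidates(candidates, DEFAULT_ORDER)]
adaptive_levels = [next(iter(c)) for c in order_candidates(candidates, ADAPTIVE_ORDER)]
```

With the default order the exact candidate is tried first; with the adaptive order the same plan is attempted structurally first and the exact id only after semantic resolution.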

Exact Candidates

Use exact when the plan knows a stable machine-targetable anchor.

Web:

exact:
  stableId: create-project-button

TUI:

exact:
  tui:
    command: npm test

Desktop:

exact:
  desktop:
    automationId: saveButton

Electron:

exact:
  electron:
    accelerator: Ctrl+S

Prefer exact candidates when they are stable: they avoid LLM cost and reduce ambiguity.

Avoid making exact candidates too brittle. A generated CSS path like div:nth-child(3) > button:nth-child(2) may be exact, but it is usually a poor long-term candidate.
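A plan author or agent could screen generated selectors with a rough heuristic before promoting them to exact candidates. The check below is a hypothetical sketch, not part of Agent Check: `looks_brittle` and its thresholds are assumptions, and it only flags positional pseudo-classes and long child chains.

```python
import re

# Positional pseudo-classes tie the selector to DOM order rather than meaning.
POSITIONAL = re.compile(r":nth-(?:last-)?(?:child|of-type)\(")


def looks_brittle(selector: str) -> bool:
    """Heuristic: flag selectors that depend on element position or deep nesting."""
    has_positional = bool(POSITIONAL.search(selector))
    deep_chain = selector.count(">") >= 3  # arbitrary threshold for illustration
    return has_positional or deep_chain


brittle = looks_brittle("div:nth-child(3) > button:nth-child(2)")
stable = looks_brittle("[data-testid='create-project-button']")
```

A heuristic like this will miss some brittle selectors and may flag acceptable ones; it is meant as a prompt for review, not a gate.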

Structural Candidates

Use structural when the target is identifiable through accessibility or visible UI structure.

Web:

structural:
  role: button
  name: Create Project

Form field:

structural:
  label: Project name

TUI:

structural:
  tui:
    visibleText: "Ready"
    region: main

Desktop:

structural:
  desktop:
    name: Settings
    windowTitle:
      contains: CodeBolt

Structural candidates are often the best default for agent-created apps because they survive many implementation changes while staying cheaper than LLM interpretation.

Semantic Candidates

Use semantic when the agent knows what the target means but not how the app currently exposes it.

semantic:
  instruction: Find the field where the user enters the project name.

At runtime, the runner observes the app state and asks the LLM to resolve this into a provider-neutral candidate, such as:

structural:
  label: Project name

or:

exact:
  stableId: project-name-input

Semantic candidates should be specific. Good semantic instructions name the user-visible role and context:

semantic:
  instruction: Find the primary button in the project creation form that submits the project.

Avoid vague instructions:

semantic:
  instruction: Click the right thing.

Task Candidates

Use task when the operation may take more than a single click or input, but is still a bounded user action.

task:
  instruction: Create a project named "{{ PROJECT_NAME }}" from the current page.

Task candidates are useful for approval testing of newly generated flows, but they are more expensive and less deterministic than exact or structural candidates. Prefer replacing them with step-level operations once the app shape is known.

Intent Candidates

Use intent for outcome-oriented goals where the path is not known.

intent:
  instruction: The user can start creating a new project.

Intent candidates are best for early exploratory approval runs. They should not be the final long-term form of a test if the flow becomes stable.

Visual Candidates

Use visual when the assertion or target is primarily screen-level.

visual:
  intent: The project dashboard is visible and not blank.

Visual candidates are useful for catching broken rendering, blank screens, obvious layout collapse, or missing screen transitions. They are not a replacement for exact user-visible assertions when specific behavior matters.

Provider Hints

providerHint is an escape hatch. It can include a provider-specific hint while keeping the durable plan's primary candidates provider-neutral.

candidates:
  - structural:
      role: button
      name: Create Project
  - providerHint:
      kind: css
      value: "[data-testid='create-project-button']"

Use provider hints sparingly. They are useful for local experiments, migration, or compatibility with an app that already exposes good low-level selectors.

Surface-Specific Candidate Fields

Candidates may use common fields directly or nest fields under a surface name when the plan needs to distinguish surfaces.

candidates:
  - exact:
      web:
        stableId: create-project-button
  - exact:
      tui:
        keys:
          - Enter
  - exact:
      desktop:
        automationId: saveButton
  - exact:
      electron:
        accelerator: Ctrl+S
  - exact:
      mobile:
        accessibilityId: save

Surface nesting is still provider-neutral. web, tui, desktop, electron, and mobile describe app surfaces, not automation engines.
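The two shapes (common fields directly, or fields nested under a surface name) can be normalized with a small helper. This is a sketch under stated assumptions: `fields_for_surface` is a hypothetical function, and treating a candidate that is nested only under other surfaces as "not applicable" is a guess at reasonable behaviour rather than documented semantics.

```python
SURFACES = {"web", "tui", "desktop", "electron", "mobile"}


def fields_for_surface(body: dict, surface: str):
    """Return the fields that apply to `surface`, or None if nothing applies."""
    nested = {k: v for k, v in body.items() if k in SURFACES}
    common = {k: v for k, v in body.items() if k not in SURFACES}
    if surface in nested:
        # Surface-specific fields extend (and may override) common ones.
        return {**common, **nested[surface]}
    # Assumption: a candidate nested only under other surfaces does not apply here.
    return common or None


web_fields = fields_for_surface({"stableId": "create-project-button"}, "web")
tui_fields = fields_for_surface({"tui": {"keys": ["Enter"]}}, "tui")
not_applicable = fields_for_surface({"tui": {"keys": ["Enter"]}}, "web")
```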

Candidates In Operations

Operation candidates identify what the runner should act on.

operation:
  type: input
  value: "{{ PROJECT_NAME }}"
  candidates:
    - exact:
        stableId: project-name-input
    - structural:
        label: Project name
    - semantic:
        instruction: Find the project name input field.

The provider receives the selected or resolved candidate and performs the operation.

Candidates In Success Conditions

Success conditions also accept candidates at different levels.

success:
  any:
    - exact:
        visibleText: "{{ PROJECT_NAME }}"
    - semantic:
        intent: The project was created and is visible to the user.

The runner first asks providers to verify deterministic conditions. If a semantic or visual condition cannot be verified deterministically, the LLM can judge the observation.
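The "provider first, LLM only when needed" flow can be sketched as follows. All names here (`check_condition`, `evaluate_success`, `provider_verify`, `llm_judge`) are hypothetical, and the convention that a provider returns `None` when it cannot verify a condition deterministically is an assumption made for the example.

```python
def check_condition(condition, provider_verify, llm_judge):
    """Ask the provider first; fall back to the LLM only if it cannot decide."""
    level, body = next(iter(condition.items()))
    verdict = provider_verify(level, body)  # True, False, or None (cannot verify)
    if verdict is None:
        verdict = llm_judge(level, body)
    return verdict


def evaluate_success(success, provider_verify, llm_judge):
    """Evaluate an {'any': [...]} or {'all': [...]} success block lazily."""
    mode, conditions = next(iter(success.items()))
    # A generator keeps evaluation lazy, so any()/all() can short-circuit.
    results = (check_condition(c, provider_verify, llm_judge) for c in conditions)
    if mode == "any":
        return any(results)
    if mode == "all":
        return all(results)
    raise ValueError(f"unsupported success mode: {mode}")


# Simulated observation and judges.
visible_text = {"Demo Project"}
llm_calls = []


def provider_verify(level, body):
    if "visibleText" in body:
        return body["visibleText"] in visible_text
    return None  # semantic/visual conditions are not checkable here


def llm_judge(level, body):
    llm_calls.append(level)  # stands in for an LLM judgement
    return True


any_ok = evaluate_success(
    {"any": [
        {"exact": {"visibleText": "Demo Project"}},
        {"semantic": {"intent": "The project was created and is visible to the user."}},
    ]},
    provider_verify, llm_judge,
)
calls_after_any = len(llm_calls)

all_ok = evaluate_success(
    {"all": [
        {"exact": {"visibleText": "Demo Project"}},
        {"visual": {"intent": "The result screen is visible and not blank."}},
    ]},
    provider_verify, llm_judge,
)
```

In the `any` case the exact condition passes and the semantic one is never judged; in the `all` case the visual condition still needs one LLM judgement.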

Ambiguity

Candidates should reduce ambiguity. When several controls could match, include context.

Less clear:

structural:
  role: button
  name: Save

Better:

structural:
  role: button
  name: Save
  region: Project settings form

Better semantic fallback:

semantic:
  instruction: Find the Save button inside the project settings form, not the global toolbar.

If the runner cannot identify the intended target with enough confidence, the failure should be classified as plan_unclear, not app_bug: the plan was underspecified, which says nothing about whether the app is broken.
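The effect of adding context can be illustrated with a toy matcher. This is a simplified sketch: the element dictionaries, `match_structural`, and `classify` are hypothetical, and only `plan_unclear` comes from the text above; the `resolved` and `no_match` labels are invented for the example.

```python
def match_structural(elements, body):
    """Return every element whose fields equal all fields of the candidate body."""
    return [e for e in elements if all(e.get(k) == v for k, v in body.items())]


def classify(matches):
    if len(matches) == 1:
        return "resolved"
    if len(matches) > 1:
        # Several plausible targets: the plan is ambiguous, the app is not at fault.
        return "plan_unclear"
    return "no_match"  # hypothetical label: fall through to the next candidate


elements = [
    {"role": "button", "name": "Save", "region": "Global toolbar"},
    {"role": "button", "name": "Save", "region": "Project settings form"},
]

ambiguous = classify(match_structural(elements, {"role": "button", "name": "Save"}))
scoped = classify(
    match_structural(
        elements,
        {"role": "button", "name": "Save", "region": "Project settings form"},
    )
)
```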

Refinement

Refinement is how Agent Check turns flexible early tests into stable later tests.

During a run, a provider or LLM may discover that:

semantic:
  instruction: Find the project name field.

resolved to:

exact:
  stableId: project-name-input

The run can store a refinement suggestion in result.json. The original plan is not mutated during a run. Applying suggestions is an explicit, separate step:

agent-check refine <runId> --plan path\to\plan.yaml --apply

Good refinements preserve the higher-level fallback while adding the cheaper candidate first:

candidates:
  - exact:
      stableId: project-name-input
  - semantic:
      instruction: Find the project name field.

That way, future runs are deterministic when possible but still adaptive when the app changes.
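A refinement of this kind can be modelled as a pure function over the candidate list. The sketch below is an assumption about how such a merge could work, not the behaviour of `agent-check refine`: `apply_refinement` is a hypothetical helper, and it uses the recommended resolution order as the cost ranking.

```python
import copy

LEVEL_ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]
RANK = {level: i for i, level in enumerate(LEVEL_ORDER)}


def apply_refinement(candidates, resolved):
    """Return a new list with `resolved` inserted before any costlier candidate.

    The input list is never mutated, mirroring the rule that a run does not
    change the original plan.
    """
    new_rank = RANK.get(next(iter(resolved)), len(RANK))
    # Drop an identical existing entry so applying twice is a no-op.
    refined = [copy.deepcopy(c) for c in candidates if c != resolved]
    position = next(
        (i for i, c in enumerate(refined)
         if RANK.get(next(iter(c)), len(RANK)) > new_rank),
        len(refined),
    )
    refined.insert(position, copy.deepcopy(resolved))
    return refined


original = [{"semantic": {"instruction": "Find the project name field."}}]
suggestion = {"exact": {"stableId": "project-name-input"}}

refined = apply_refinement(original, suggestion)
refined_twice = apply_refinement(refined, suggestion)
```

The semantic fallback is kept, the exact candidate lands in front of it, and re-applying the same suggestion does not duplicate it.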

Writing Guidance For Agents

When an AI agent writes a plan, it should:

  • use exact when it just created or inspected stable ids
  • use structural for accessible labels, roles, text, windows, terminal output, and menu-like structure
  • use semantic when it knows the target but not the exact UI details
  • use task or intent only when the interaction path is genuinely unknown
  • keep visual for screen-level assertions or rendering checks
  • include multiple candidates when possible
  • place cheaper candidates before expensive candidates
  • add refinements after successful runtime resolution
  • avoid provider engine names in the plan
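Two of the rules above (include multiple candidates, place cheaper ones first) are mechanical enough to check automatically. The following is a hypothetical lint sketch, not an Agent Check feature; `lint_candidates` and its messages are invented, and the recommended resolution order is used as a stand-in for cost.

```python
LEVEL_ORDER = ["exact", "structural", "semantic", "task", "intent", "visual", "providerHint"]


def lint_candidates(candidates):
    """Return a list of warnings for one step's candidate list."""
    warnings = []
    levels = [next(iter(c)) for c in candidates]
    if len(levels) < 2:
        warnings.append("single candidate: consider adding a fallback")
    # Unknown levels are ignored here rather than treated as errors.
    ranks = [LEVEL_ORDER.index(level) for level in levels if level in LEVEL_ORDER]
    if ranks != sorted(ranks):
        warnings.append("cheaper candidates should come before expensive ones")
    return warnings


good = lint_candidates([
    {"exact": {"stableId": "project-name-input"}},
    {"structural": {"label": "Project name"}},
])
misordered = lint_candidates([
    {"semantic": {"instruction": "Find the project name input field."}},
    {"exact": {"stableId": "project-name-input"}},
])
lonely = lint_candidates([{"task": {"instruction": "Create a project."}}])
```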

Quick Examples

Low-cost deterministic web action:

operation:
  type: interact
  candidates:
    - exact:
        stableId: create-project-button

Balanced web action:

operation:
  type: interact
  candidates:
    - structural:
        role: button
        name: Create Project
    - semantic:
        instruction: Find the primary action that creates a project.

Early approval-test action:

operation:
  type: task
  value: "{{ PROJECT_NAME }}"
  candidates:
    - task:
        instruction: Create a project with the provided project name.

Provider-neutral success:

success:
  all:
    - exact:
        visibleText: "{{ PROJECT_NAME }}"
    - visual:
        intent: The result screen is visible and not blank.