Replace jest integration tests with vally eval suites#2170

Open
JasonYeMSFT wants to merge 23 commits into microsoft:main from JasonYeMSFT:chuye/vally-experiment

Conversation

@JasonYeMSFT
Member

@JasonYeMSFT JasonYeMSFT commented May 5, 2026

Description

The existing Jest integration tests call into the agent runner manually and assert on the results. We have decided to migrate them to vally to take advantage of its parallelization and rich set of graders. Unfortunately, vally doesn't support several critical features that our tests use, such as:

  • early termination
  • follow-up prompts
  • system prompt modification
  • screenshots

In addition, it doesn't publish the data that our integration test dashboard needs to display the test results.

This PR creates a vally custom executor to support the missing features and publish all necessary data that the integration test dashboard depends on. The underlying agent runner used by both the existing integration tests and the vally custom executor is patched to achieve temporary backward compatibility.

Along with the infrastructure changes that enable the migration, this PR migrates the existing vally eval suites to the custom executor and runs them to confirm that the extended features and result data publishing work as expected. Some patches were made in the agent runner to achieve backward compatibility, so that the Jest integration tests and the vally eval suites can share one agent runner.
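To illustrate why a shared runner might need to know which harness is driving it, here is a minimal sketch. Only the `isTest` and `useJest` option names come from this PR (see the `useAgentRunner({ isTest, useJest, ... })` shape in the review overview); `createCleanupRegistry`, `HookRegistrar`, and the cleanup behavior are hypothetical and are not the actual agent runner code. The assumption is that under Jest, cleanup is handed to an `afterAll`-style hook, while under vally the executor calls it directly.

```typescript
// Hypothetical sketch, not the real tests/utils/agent-runner.ts.
interface AgentRunnerOptions {
  isTest: boolean; // named in the PR; not used by this sketch
  useJest: boolean; // named in the PR; selects the cleanup strategy here
}

type Cleanup = () => void;
// Stand-in for an afterAll-style hook so the sketch runs outside Jest.
type HookRegistrar = (fn: Cleanup) => void;

function createCleanupRegistry(
  options: AgentRunnerOptions,
  jestAfterAll?: HookRegistrar,
) {
  const cleanups: Cleanup[] = [];

  const runCleanups = (): void => {
    // splice(0) empties the list, so a second call is a no-op.
    for (const cleanup of cleanups.splice(0)) cleanup();
  };

  if (options.useJest) {
    if (!jestAfterAll) {
      throw new Error("useJest is true but no afterAll-style hook was supplied");
    }
    // Assumed Jest path: the test framework decides when cleanup runs.
    jestAfterAll(runCleanups);
  }

  return {
    registerCleanup: (cleanup: Cleanup): void => {
      cleanups.push(cleanup);
    },
    // Assumed vally path: the custom executor calls this itself.
    runCleanups,
  };
}
```

With `useJest: false`, nothing is registered with a test framework, so the caller is responsible for invoking `runCleanups()` when a run ends.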

Checklist

  • Tests pass locally (cd tests && npm test)
  • If modifying skill descriptions: verified routing correctness with integration tests (npm run test:skills:integration -- <skill>)
  • If modifying skill USE FOR / DO NOT USE FOR / PREFER OVER clauses: confirmed no routing regressions for competing skills

Related Issues

#1821
https://github.com/microsoft/evaluate/issues/276

@JasonYeMSFT JasonYeMSFT force-pushed the chuye/vally-experiment branch 2 times, most recently from 6385534 to 1bb984e Compare May 14, 2026 23:54
Comment thread tests/vally/vally-executor.ts Fixed
@JasonYeMSFT JasonYeMSFT changed the title poc: use vally to drive integration tests Replace jest integration tests with vally eval suites May 15, 2026
Comment thread .github/workflows/eval.yml
@JasonYeMSFT JasonYeMSFT requested review from Copilot and removed request for Copilot May 18, 2026 20:12
@JasonYeMSFT JasonYeMSFT marked this pull request as ready for review May 18, 2026 20:12
Copilot AI review requested due to automatic review settings May 18, 2026 20:12
Contributor

Copilot AI left a comment

Pull request overview

This PR migrates Copilot SDK-based integration testing from bespoke Jest integration tests toward Vally eval suites by introducing a custom Vally executor, adding stimulus validation tooling, and patching the shared agent runner to support both legacy Jest tests and the new Vally path.

Changes:

  • Added a custom Vally executor (integration-test-agent-runner) plus tag helpers to emulate missing Vally features (early termination, follow-ups, system prompt changes, screenshots) and to generate dashboard-compatible artifacts.
  • Updated the shared agent runner APIs and updated Jest integration tests/scripts to use the new useAgentRunner({ isTest, useJest, ... }) config shape.
  • Added a scripts CLI for validating eval stimuli metadata and updated CI workflow + eval specs to use the new executor.
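As a rough illustration of the tag-helper idea in the first bullet, the sketch below maps stimulus tags to the four emulated behaviors. The tag names (`early-terminate`, `follow-up`, `system-prompt`, `screenshot`), the string-valued tag map, and `parseBehaviors` itself are assumptions made for this example; the actual `tests/vally/tag-helpers.ts` may use different names and formats.

```typescript
// Hypothetical sketch, not the real tests/vally/tag-helpers.ts.
// Assumes tags arrive as a flat key/value map, e.g. { cost: "llm" }.
type Tags = Record<string, string>;

interface StimulusBehaviors {
  earlyTerminate: boolean;
  followUp: string | undefined;
  systemPrompt: string | undefined;
  screenshot: boolean;
}

// A flag counts as enabled only when its value is the string "true".
const isEnabled = (value: string | undefined): boolean =>
  value?.trim().toLowerCase() === "true";

// Blank or missing values are treated as "not set".
const nonEmpty = (value: string | undefined): string | undefined => {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
};

function parseBehaviors(tags: Tags): StimulusBehaviors {
  return {
    earlyTerminate: isEnabled(tags["early-terminate"]),
    followUp: nonEmpty(tags["follow-up"]),
    systemPrompt: nonEmpty(tags["system-prompt"]),
    screenshot: isEnabled(tags["screenshot"]),
  };
}
```

Unrelated tags such as `cost` are ignored by this parser, so behavior tags and metadata tags can share one map.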
| File | Description |
| --- | --- |
| tests/vally/vally-executor.ts | New custom Vally executor that runs the Copilot SDK agent runner and attempts to post-process Vally outputs for the dashboard. |
| tests/vally/tag-helpers.ts | New helpers to parse stimulus tags into early-termination/follow-up/system-prompt/screenshot behaviors. |
| tests/utils/evaluate.ts | Updates import to include .ts extension for ESM/TS extension compatibility. |
| tests/utils/agent-runner.ts | Extends agent runner config (model/systemPrompt/workspace) and adds a new useAgentRunner configuration shape for Jest vs Vally usage. |
| tests/tsconfig.json | Enables TS extension imports and configures TS for typecheck-only (noEmit). |
| tests/scripts/generate-test-reports.ts | Updates to new useAgentRunner({ ... }) signature; forces explicit process exit on success/failure. |
| tests/README.md | Documents ongoing migration to Vally and points to executor and eval docs. |
| tests/package-lock.json | Bumps nested @github/copilot dependency under @microsoft/vally from 1.0.44 → 1.0.45. |
| tests/microsoft-foundry/quota/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/deploy-model/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/deploy-model-optimal-region/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/customize-deployment/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/capacity/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/troubleshoot/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/trace/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/observe/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/invoke/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/deploy/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/create/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/entra-app-registration/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/entra-agent-id/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-validate/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-upgrade/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-storage/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-resource-visualizer/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-resource-lookup/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-reliability/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-rbac/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-quotas/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-prepare/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-messaging/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kusto/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kubernetes/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kubernetes/azure-kubernetes-automatic-readiness/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-hosted-copilot-sdk/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-enterprise-infra-planner/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-diagnostics/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-deploy/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-deploy/avm/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-cost/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-compute/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-compliance/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-cloud-migrate/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-aigateway/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-ai/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/appinsights-instrumentation/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/airunway-aks-setup/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/_template/integration.test.ts | Updates template to use the new useAgentRunner config object. |
| scripts/src/vally/validate-stimulus.ts | New validator that enforces required tags/metadata conventions across evals/**/eval.yaml. |
| scripts/src/vally/cli.ts | New npm run vally ... CLI entrypoint for Vally-related repo tooling. |
| scripts/package.json | Adds vally script to run the new CLI via tsx. |
| evals/microsoft-foundry/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/entra-app-registration/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-upgrade/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-storage/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-prepare/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-kusto/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-hosted-copilot-sdk/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-enterprise-infra-planner/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-diagnostics/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-deploy/eval.yaml | Switches executor, adds missing cost: llm tags, and adds a new stimulus migrated from Jest. |
| evals/azure-compliance/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-aigateway/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-ai/eval.yaml | Switches executor and adds early-terminate tags; removes completed graders from the shown stimuli. |
| .github/workflows/eval.yml | Adds stimulus validation step; adjusts fork gating to step-level for eval execution; formatting updates. |
| .github/skills/vally-eval/SKILL.md | Adds a repo-local skill describing how to author/validate/run Vally eval suites and use the custom executor. |
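The stimulus validation listed for `scripts/src/vally/validate-stimulus.ts` could look roughly like the following. This is a sketch under assumptions: the `Stimulus` shape, the `validateStimuli` name, and the choice of `cost` as the only required tag are illustrative (the summary above only confirms that `cost: llm` tags were added where missing), and the input is taken as already-parsed objects rather than read from `eval.yaml`.

```typescript
// Hypothetical sketch, not the real scripts/src/vally/validate-stimulus.ts.
interface Stimulus {
  name: string;
  tags?: Record<string, string>;
}

// Returns one message per missing or blank required tag; an empty array means valid.
function validateStimuli(
  stimuli: Stimulus[],
  requiredTags: string[] = ["cost"], // assumed default for illustration
): string[] {
  const errors: string[] = [];
  for (const stimulus of stimuli) {
    for (const tag of requiredTags) {
      const value = stimulus.tags?.[tag];
      if (value === undefined || value.trim() === "") {
        errors.push(`${stimulus.name}: missing required tag "${tag}"`);
      }
    }
  }
  return errors;
}

// Usage: a CI step could fail the job when any message is returned.
const problems = validateStimuli([
  { name: "deploy-basic", tags: { cost: "llm" } },
  { name: "deploy-no-tags" },
]);
console.log(problems);
```

Returning messages instead of throwing on the first failure lets a CI step report every offending stimulus in one run.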

Copilot's findings

Files not reviewed (1)
  • tests/package-lock.json: Language not supported

  • Files reviewed: 67/68 changed files
  • Comments generated: 9

Comment thread scripts/src/vally/validate-stimulus.ts
Comment thread tests/vally/vally-executor.ts Outdated
Comment thread tests/README.md Outdated
Comment thread tests/vally/vally-executor.ts
Comment thread tests/utils/agent-runner.ts
Comment thread tests/vally/vally-executor.ts Outdated
Comment thread tests/vally/vally-executor.ts Outdated
Comment thread tests/vally/tag-helpers.ts
Comment thread .github/workflows/eval.yml
Comment thread .github/workflows/eval.yml
@JasonYeMSFT JasonYeMSFT force-pushed the chuye/vally-experiment branch from 4da8380 to c04f048 Compare May 18, 2026 20:27
Comment thread .github/skills/vally-eval/SKILL.md
@JasonYeMSFT JasonYeMSFT force-pushed the chuye/vally-experiment branch from 13c1abf to ecc0301 Compare May 22, 2026 16:58


3 participants