Replace jest integration tests with vally eval suites #2170
Open
JasonYeMSFT wants to merge 23 commits into
6385534 to 1bb984e
JasonYeMSFT (Contributor) commented May 18, 2026
Pull request overview
This PR migrates Copilot SDK-based integration testing from bespoke Jest integration tests toward Vally eval suites by introducing a custom Vally executor, adding stimulus validation tooling, and patching the shared agent runner to support both legacy Jest tests and the new Vally path.
Changes:
- Added a custom Vally executor (`integration-test-agent-runner`) plus tag helpers to emulate missing Vally features (early termination, follow-ups, system prompt changes, screenshots) and to generate dashboard-compatible artifacts.
- Updated the shared agent runner APIs, and updated the Jest integration tests and scripts to use the new `useAgentRunner({ isTest, useJest, ... })` config shape.
- Added a `scripts` CLI for validating eval stimuli metadata, and updated the CI workflow and eval specs to use the new executor.
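The stimulus-metadata validation mentioned in the last bullet can be sketched roughly as follows. This is a hypothetical illustration, not the code in `scripts/src/vally/validate-stimulus.ts`: it assumes each eval file has already been parsed from YAML into plain objects, that the field names `stimuli`, `name`, and `tags` match the schema, that tags are `"key: value"` strings (inferred from the `cost: llm` tags in the file table), and that `cost` is one of the required keys.

```typescript
// Hypothetical, simplified shape of a parsed eval.yaml; the real schema may differ.
interface Stimulus {
  name: string;
  tags?: string[];
}

interface EvalSuite {
  stimuli: Stimulus[];
}

// Assumption: each tag is a "key: value" string such as "cost: llm".
function tagKey(tag: string): string {
  const colon = tag.indexOf(":");
  return (colon === -1 ? tag : tag.slice(0, colon)).trim();
}

// Returns one error message per stimulus that is missing a required tag key.
// The default required key ("cost") is an assumption for illustration.
function validateStimuli(suite: EvalSuite, requiredKeys: readonly string[] = ["cost"]): string[] {
  const errors: string[] = [];
  for (const stimulus of suite.stimuli) {
    const keys = new Set((stimulus.tags ?? []).map(tagKey));
    for (const key of requiredKeys) {
      if (!keys.has(key)) {
        errors.push(`${stimulus.name}: missing required tag "${key}"`);
      }
    }
  }
  return errors;
}

// Example with hypothetical stimulus names: the second stimulus has no cost tag,
// so exactly one error is reported.
const problems = validateStimuli({
  stimuli: [
    { name: "deploy-webapp", tags: ["cost: llm"] },
    { name: "deploy-function" },
  ],
});
if (problems.length > 0) {
  console.error(problems.join("\n"));
}
```

Running a check like this in CI before any eval executes keeps the cheap metadata failures from consuming agent runs.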
Summary per file:
| File | Description |
|---|---|
| tests/vally/vally-executor.ts | New custom Vally executor that runs the Copilot SDK agent runner and attempts to post-process Vally outputs for the dashboard. |
| tests/vally/tag-helpers.ts | New helpers to parse stimulus tags into early-termination/follow-up/system-prompt/screenshot behaviors. |
| tests/utils/evaluate.ts | Updates import to include .ts extension for ESM/TS extension compatibility. |
| tests/utils/agent-runner.ts | Extends agent runner config (model/systemPrompt/workspace) and adds a new useAgentRunner configuration shape for Jest vs Vally usage. |
| tests/tsconfig.json | Enables TS extension imports and configures TS for typecheck-only (noEmit). |
| tests/scripts/generate-test-reports.ts | Updates to new useAgentRunner({ ... }) signature; forces explicit process exit on success/failure. |
| tests/README.md | Documents ongoing migration to Vally and points to executor and eval docs. |
| tests/package-lock.json | Bumps nested @github/copilot dependency under @microsoft/vally from 1.0.44 → 1.0.45. |
| tests/microsoft-foundry/quota/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/deploy-model/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/deploy-model-optimal-region/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/customize-deployment/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/models/deploy/capacity/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/troubleshoot/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/trace/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/observe/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/invoke/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/deploy/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/microsoft-foundry/foundry-agent/create/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/entra-app-registration/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/entra-agent-id/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-validate/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-upgrade/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-storage/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-resource-visualizer/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-resource-lookup/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-reliability/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-rbac/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-quotas/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-prepare/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-messaging/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kusto/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kubernetes/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-kubernetes/azure-kubernetes-automatic-readiness/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-hosted-copilot-sdk/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-enterprise-infra-planner/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-diagnostics/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-deploy/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-deploy/avm/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-cost/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-compute/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-compliance/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-cloud-migrate/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-aigateway/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/azure-ai/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/appinsights-instrumentation/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/airunway-aks-setup/integration.test.ts | Updates useAgentRunner call to new config object. |
| tests/_template/integration.test.ts | Updates template to use the new useAgentRunner config object. |
| scripts/src/vally/validate-stimulus.ts | New validator that enforces required tags/metadata conventions across `evals/**/eval.yaml`. |
| scripts/src/vally/cli.ts | New `npm run vally ...` CLI entrypoint for Vally-related repo tooling. |
| scripts/package.json | Adds a `vally` script that runs the new CLI via `tsx`. |
| evals/microsoft-foundry/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/entra-app-registration/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-upgrade/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-storage/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-prepare/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-kusto/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-hosted-copilot-sdk/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-enterprise-infra-planner/eval.yaml | Switches executor and adds missing per-stimulus cost: llm tags. |
| evals/azure-diagnostics/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-deploy/eval.yaml | Switches executor, adds missing cost: llm tags, and adds a new stimulus migrated from Jest. |
| evals/azure-compliance/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-aigateway/eval.yaml | Switches executor to integration-test-agent-runner. |
| evals/azure-ai/eval.yaml | Switches executor and adds early-terminate tags; removes completed graders from the shown stimuli. |
| .github/workflows/eval.yml | Adds stimulus validation step; adjusts fork gating to step-level for eval execution; formatting updates. |
| .github/skills/vally-eval/SKILL.md | Adds a repo-local skill describing how to author/validate/run Vally eval suites and use the custom executor. |
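To make the tag-helper idea from the table more concrete, here is a minimal sketch of turning stimulus tags into per-run behaviors. It is not the API of `tests/vally/tag-helpers.ts`: the `"key: value"` tag format, the `StimulusBehaviors` type, and the tag keys `follow-up`, `system-prompt`, and `screenshot` are hypothetical, and although `early-terminate` tags appear in the table, the meaning given to their value here is an assumption.

```typescript
// Hypothetical behavior model; field names and most tag keys are illustrative.
interface StimulusBehaviors {
  earlyTerminate?: string; // assumed: stop the run once this text appears in the agent output
  followUps: string[];     // assumed: extra prompts sent after the first turn, in order
  systemPrompt?: string;   // assumed: replacement system prompt for the session
  screenshot: boolean;     // assumed: capture a screenshot artifact
}

// Splits a "key: value" tag at the first colon; tags without a colon get an empty value.
function parseTag(tag: string): [string, string] {
  const colon = tag.indexOf(":");
  if (colon === -1) return [tag.trim(), ""];
  return [tag.slice(0, colon).trim(), tag.slice(colon + 1).trim()];
}

function parseBehaviors(tags: readonly string[]): StimulusBehaviors {
  const behaviors: StimulusBehaviors = { followUps: [], screenshot: false };
  for (const tag of tags) {
    const [key, value] = parseTag(tag);
    switch (key) {
      case "early-terminate":
        behaviors.earlyTerminate = value;
        break;
      case "follow-up":
        behaviors.followUps.push(value);
        break;
      case "system-prompt":
        behaviors.systemPrompt = value;
        break;
      case "screenshot":
        behaviors.screenshot = true;
        break;
      default:
        // Metadata-only tags such as "cost: llm" do not change runtime behavior.
        break;
    }
  }
  return behaviors;
}
```

Encoding these behaviors as tags keeps the eval YAML within Vally's existing schema, so the custom executor can add features without requiring changes to Vally itself.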
Copilot's findings
Files not reviewed (1)
- tests/package-lock.json: Language not supported
- Files reviewed: 67/68 changed files
- Comments generated: 9
JasonYeMSFT commented May 18, 2026
4da8380 to c04f048
JasonYeMSFT commented May 18, 2026
Support aggregating test results
Extract skill name from eval definition
Document migration and backward compatibility changes
13c1abf to ecc0301
Description
The existing Jest integration tests manually call into the agent runner and assert on the results. We have decided to migrate them to vally to take advantage of its parallelization and rich set of graders. Unfortunately, vally doesn't support several critical features that our tests use, such as early termination, follow-ups, system prompt changes, and screenshots.
In addition, it doesn't publish the data that our integration test dashboard consumes to display test results.
This PR creates a custom vally executor that supports the missing features and publishes all the data the integration test dashboard depends on. The underlying agent runner, shared by the existing Jest integration tests and the custom vally executor, is patched for temporary backward compatibility so that both can use it.
Along with the infrastructure changes that enable the migration, this PR migrates the existing vally eval suites to the custom executor and tests against them to verify that the extended features and result publishing work as expected.
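One way to picture a single agent runner serving both harnesses is sketched below. Only the `isTest` and `useJest` flag names come from this PR; the `createAgentRunner` factory, the `RegisterCleanup` hook, and the rule that Jest mode registers cleanup with the framework while the Vally executor calls `dispose()` itself are hypothetical assumptions, and the `run` body is a placeholder rather than a Copilot SDK call.

```typescript
// Sketch only: isTest and useJest are the flags named in the PR summary;
// their semantics below are assumptions.
interface AgentRunnerConfig {
  isTest: boolean;  // assumed: true when running under a test harness
  useJest: boolean; // assumed: true only for the legacy Jest integration tests
}

interface AgentRunner {
  run(prompt: string): Promise<string>;
  dispose(): Promise<void>;
}

// Injected so the sketch does not depend on Jest globals.
type RegisterCleanup = (cleanup: () => Promise<void>) => void;

// Hypothetical factory: under Jest, cleanup is registered with the framework;
// otherwise (e.g. a Vally executor) the caller must call dispose() itself.
function createAgentRunner(config: AgentRunnerConfig, registerCleanup?: RegisterCleanup): AgentRunner {
  if (config.useJest && !config.isTest) {
    // Assumed invariant: Jest mode only makes sense inside a test run.
    throw new Error("useJest requires isTest");
  }
  let disposed = false;
  const runner: AgentRunner = {
    async run(prompt: string): Promise<string> {
      if (disposed) throw new Error("runner already disposed");
      // Placeholder for the real agent session.
      return `response to: ${prompt}`;
    },
    async dispose(): Promise<void> {
      disposed = true;
    },
  };
  if (config.useJest) {
    if (!registerCleanup) throw new Error("Jest mode needs a cleanup registrar");
    registerCleanup(() => runner.dispose());
  }
  return runner;
}
```

In a Jest test file the registrar could be `(cleanup) => afterAll(cleanup)`, since Jest's `afterAll` accepts a function that returns a promise; a Vally executor would instead await `dispose()` after each stimulus. Keeping the lifecycle choice in one config object is what lets both paths share the runner during the migration.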
Checklist
- … (`cd tests && npm test`)
- … (`npm run test:skills:integration -- <skill>`)
- … `USE FOR`/`DO NOT USE FOR`/`PREFER OVER` clauses: confirmed no routing regressions for competing skills

Related Issues
#1821
https://github.com/microsoft/evaluate/issues/276