feat: migrate trigger tests to Vally eval configs (22 skills, 722 stimuli) by wbreza · Pull Request #2282 · microsoft/GitHub-Copilot-for-Azure

wbreza · 2026-05-15T22:27:43Z

Replaces #1899 (migrated from fork to org branch for CI secret access).

Summary

Migrates trigger routing tests to Vally eval configs for all 22 skills, building on the foundation established in #1912. Each skill gets a dedicated triggers.eval.yaml with positive and negative routing stimuli.

Skills Migrated

Skill	Stimuli	Key Graders
`appinsights-instrumentation`	17	skill-invocation, completed, output-not-matches
`azure-ai`	25	skill-invocation, completed, output-not-matches
`azure-aigateway`	25	skill-invocation, completed, output-not-matches
`azure-cloud-migrate`	24	skill-invocation, completed, output-not-matches
`azure-compliance`	36	skill-invocation, completed, output-not-matches
`azure-compute`	62	skill-invocation, completed, output-not-matches
`azure-cost`	60	skill-invocation, completed, output-not-matches
`azure-deploy`	23	skill-invocation, completed, output-not-matches
`azure-diagnostics`	40	skill-invocation, completed, output-not-matches
`azure-enterprise-infra-planner`	39	skill-invocation, completed, output-not-matches
`azure-hosted-copilot-sdk`	18	skill-invocation, completed, output-not-matches
`azure-kubernetes`	44	skill-invocation, completed, output-not-matches
`azure-messaging`	32	skill-invocation, completed, output-not-matches
`azure-prepare`	38	skill-invocation, completed, output-not-matches
`azure-quotas`	25	skill-invocation, completed, output-not-matches
`azure-rbac`	35	skill-invocation, completed, output-not-matches
`azure-resource-lookup`	46	skill-invocation, completed, output-not-matches
`azure-resource-visualizer`	37	skill-invocation, completed, output-not-matches
`azure-upgrade`	16	skill-invocation, completed, output-not-matches
`azure-validate`	23	skill-invocation, completed, output-not-matches
`entra-app-registration`	19	skill-invocation, completed, output-not-matches
`microsoft-foundry`	38	skill-invocation, completed, output-not-matches

Total: 722 stimuli across 22 skills
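
The total can be cross-checked against the per-skill counts in the table above. The snippet below is only a sanity check; the dictionary is transcribed by hand from the table, and the `summarize` helper is a hypothetical name, not something from the repository.

```python
# Per-skill stimulus counts, transcribed from the "Skills Migrated" table.
STIMULI_PER_SKILL = {
    "appinsights-instrumentation": 17,
    "azure-ai": 25,
    "azure-aigateway": 25,
    "azure-cloud-migrate": 24,
    "azure-compliance": 36,
    "azure-compute": 62,
    "azure-cost": 60,
    "azure-deploy": 23,
    "azure-diagnostics": 40,
    "azure-enterprise-infra-planner": 39,
    "azure-hosted-copilot-sdk": 18,
    "azure-kubernetes": 44,
    "azure-messaging": 32,
    "azure-prepare": 38,
    "azure-quotas": 25,
    "azure-rbac": 35,
    "azure-resource-lookup": 46,
    "azure-resource-visualizer": 37,
    "azure-upgrade": 16,
    "azure-validate": 23,
    "entra-app-registration": 19,
    "microsoft-foundry": 38,
}


def summarize(counts: dict) -> tuple:
    """Return (number of skills, total number of stimuli)."""
    return len(counts), sum(counts.values())


skills, total = summarize(STIMULI_PER_SKILL)
print(f"{total} stimuli across {skills} skills")  # 722 stimuli across 22 skills
```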

Changes

Adds 22 new evals/*/triggers.eval.yaml configs
Each config includes positive triggers (skill SHOULD be invoked) and negative triggers (skill should NOT be invoked)
Consistent grader pattern: skill-invocation + completed + output-not-matches (fatal error guard)
Updates CI workflow (eval.yml) for new eval paths
Cleans up legacy test configs
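
The grader pattern described above can be read as a three-part check per run. The following is a minimal sketch of that reading, not Vally's implementation: the `RunResult` type, the `passes` function, and the `FATAL_ERROR` regex are all hypothetical, since the PR does not show the actual grader internals or the pattern used by output-not-matches.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stand-in for the fatal-error guard; the real regex is not shown in the PR.
FATAL_ERROR = re.compile(r"fatal error", re.IGNORECASE)


@dataclass
class RunResult:
    """Hypothetical record of one run of one stimulus."""
    invoked_skills: set[str]
    completed: bool
    output: str


def passes(result: RunResult, skill: str, polarity: str) -> bool:
    """Combine the three graders for a single run.

    polarity "positive": the skill SHOULD be invoked.
    polarity "negative": the skill should NOT be invoked.
    """
    invoked = skill in result.invoked_skills
    routing_ok = invoked if polarity == "positive" else not invoked
    no_fatal_error = FATAL_ERROR.search(result.output) is None
    return routing_ok and result.completed and no_fatal_error


run = RunResult(invoked_skills={"azure-deploy"}, completed=True, output="Deployed.")
print(passes(run, "azure-deploy", "positive"))  # True
print(passes(run, "azure-deploy", "negative"))  # False
```

Requiring all three conditions means a run that routes correctly but crashes still fails, which matches the stated purpose of the fatal error guard.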

Eval Patterns

All specs follow the validated patterns from #1912:

Model: claude-sonnet-4.5
Executor: copilot-sdk
Duration format: human-friendly ("2m")
3 runs per stimulus for routing reliability
Valid grader types only (per published @microsoft/vally-cli@0.4.0)
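
For readers unfamiliar with the human-friendly duration format, "2m" corresponds to 120 seconds. The sketch below shows one way such a string could be parsed; `parse_duration` and the set of supported units are assumptions for illustration, and the formats Vally actually accepts may differ.

```python
import re

# Assumed units for illustration only; the PR only shows the "2m" form.
_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a string such as "2m" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _SECONDS_PER_UNIT[unit]


print(parse_duration("2m"))  # 120
```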

Dependencies

Requires ci(eval): migrate to Vally eval framework with v0.4.0 features #1912 (merged ✅)

The eval workflow was invoking npx @microsoft/vally-cli without any npm auth setup, so npm fell back to the public registry and the package (published to GitHub Packages) could not be resolved.

- Add .npmrc mapping @microsoft scope to npm.pkg.github.com
- Add scope: '@microsoft' to setup-node so NODE_AUTH_TOKEN is applied
- Add an npm install --no-save step (with NODE_AUTH_TOKEN) so the @microsoft/vally-cli devDependency is resolved via authenticated fetch
- Declare @microsoft/vally-cli in devDependencies (latest) so local dev and CI both resolve it through a single config path

This mirrors the working setup in wbreza/skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Workflow hardening:
- Drop pull_request trigger (keep workflow_dispatch only) to eliminate token exfiltration vector from untrusted PR code
- Add top-level permissions block (contents/packages: read) for defense-in-depth

Package hygiene:
- Remove @microsoft/vally-cli from devDependencies (CI installs it explicitly via GitHub Packages); lockfile regenerated in sync
- Remove unused root yaml dependency

Eval spec cleanup:
- Remove 13 broad output-not-contains "error"/"failed" graders from azure-hosted-copilot-sdk/eval.yaml (kept specific fatal-error regex)
- Add azure-prepare, azure-validate, azure-deploy to environment.skills
- Remove cost:free tag from all LLM-backed stimuli across 4 eval files (reserved now for non-LLM static evals)
- Align .vally.yaml suite descriptions with accurate tag semantics

Cleanup:
- Delete stale Waza task files in azure-hosted-copilot-sdk/tasks/
- Add evals/README.md with local vally-cli run instructions
- Gitignore local results/ output directory

Follow-up issue #1920 tracks wiring CI to a curated medium suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Update ai-bench references in evals/README.md to microsoft/evaluate (the actual upstream Vally repo name)
- Add https://aka.ms/vally as the canonical docs link
- Clarify that contributors don't need source-repo access to run evals locally: the @microsoft/vally-cli package from GitHub Packages is sufficient

Addresses JasonYeMSFT's review question on evals/README.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…/vally

- Remove .npmrc private registry config (GitHub Packages no longer needed)
- Update eval workflow to use @microsoft/vally from public npm
- Remove VALLY_NPM_TOKEN secret requirement from CI
- Update evals/README.md with public npm installation instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Delete tests/scripts/run-waza.js (replaced by direct vally CLI usage)
- Delete .waza.yaml (replaced by .vally.yaml)
- Remove waza and waza:live npm scripts from tests/package.json
- Update tests/README.md: replace Waza Eval Mode section with Vally
- Update tests/azure-prepare/eval/README.md: replace waza references with vally
- Update eval.yaml comment headers in azure-enterprise-infra-planner and azure-prepare

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Install @microsoft/vally-cli (not @microsoft/vally) for CLI executable
- Run npm install to sync package-lock.json with package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Convert trigger test prompt lists from Jest triggers.test.ts files and Waza trigger_tests.yaml files into Vally eval configs using skill-invocation grader with required/disallowed assertions.

22 skills covered, 722 total stimuli:
- 458 positive triggers (skill-invocation required)
- 264 negative triggers (skill-invocation disallowed)

Merged Waza YAML prompts for azure-deploy, azure-prepare, and azure-enterprise-infra-planner (deduplicated).

Tags: type=trigger, polarity=positive/negative, tier=full, cost=free, area=routing
Config: runs=3, timeout=120, model=claude-sonnet-4, threshold=0.8

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
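
To make the runs and threshold values in this commit concrete, here is a small sketch of how a per-stimulus pass rate could be compared against a threshold. It rests on an assumption: that the threshold applies to the fraction of passing runs for one stimulus. The commit does not say how Vally aggregates scores, and `pass_rate` and `meets_threshold` are hypothetical helpers.

```python
def pass_rate(run_outcomes: list) -> float:
    """Fraction of runs that passed (True counts as 1, False as 0)."""
    if not run_outcomes:
        raise ValueError("at least one run is required")
    return sum(run_outcomes) / len(run_outcomes)


def meets_threshold(run_outcomes: list, threshold: float = 0.8) -> bool:
    """Assumed rule: a stimulus passes when its pass rate reaches the threshold."""
    return pass_rate(run_outcomes) >= threshold


# The positive/negative split from the commit message adds up to the stated total.
assert 458 + 264 == 722

print(meets_threshold([True, True, True]))   # True
print(meets_threshold([True, True, False]))  # False
```

Under this assumed rule, runs=3 with threshold=0.8 leaves no room for a single failed run, because 2 of 3 is about 0.67.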

- Correct model identifiers, grader types, tag format, and scoring weights per verified patterns from PR #1912

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Change cost: free to cost: llm for copilot-sdk trigger stimuli
- Add scope note for Foundry sub-skills deferral
- Fix README: @microsoft/vally -> @microsoft/vally-cli, distinct mock/copilot-sdk commands
- Add scoping comment for nodejs_entry_point grader

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR migrates trigger routing test coverage for 22 skills from the Jest-based approach to Vally eval specs, by adding per-skill triggers.eval.yaml files containing positive/negative routing stimuli and a consistent grader pattern.

Changes:

Added 22 new evals/*/triggers.eval.yaml specs with positive (must invoke skill) and negative (must not invoke skill) prompts.
Standardized routing graders across the new specs (skill-invocation, completed, and a fatal-error guard via output-not-matches).
Removed Vally-related npm scripts from tests/package.json (leaving only Jest/lint/typecheck scripts).

File	Description
tests/package.json	Removes Vally `eval` / `eval:lint` scripts from the tests package scripts.
evals/appinsights-instrumentation/triggers.eval.yaml	Adds Vally trigger routing stimuli for `appinsights-instrumentation`.
evals/azure-ai/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-ai`.
evals/azure-aigateway/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-aigateway`.
evals/azure-cloud-migrate/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-cloud-migrate`.
evals/azure-compliance/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-compliance`.
evals/azure-compute/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-compute`.
evals/azure-cost/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-cost`.
evals/azure-deploy/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-deploy`.
evals/azure-diagnostics/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-diagnostics`.
evals/azure-enterprise-infra-planner/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-enterprise-infra-planner`.
evals/azure-hosted-copilot-sdk/triggers.eval.yaml	Adds/extends Vally trigger routing stimuli for `azure-hosted-copilot-sdk`.
evals/azure-kubernetes/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-kubernetes`.
evals/azure-messaging/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-messaging`.
evals/azure-prepare/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-prepare`.
evals/azure-quotas/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-quotas`.
evals/azure-rbac/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-rbac`.
evals/azure-resource-lookup/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-resource-lookup`.
evals/azure-resource-visualizer/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-resource-visualizer`.
evals/azure-upgrade/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-upgrade`.
evals/azure-validate/triggers.eval.yaml	Adds Vally trigger routing stimuli for `azure-validate`.
evals/entra-app-registration/triggers.eval.yaml	Adds Vally trigger routing stimuli for `entra-app-registration`.
evals/microsoft-foundry/triggers.eval.yaml	Adds Vally trigger routing stimuli for `microsoft-foundry`.

Copilot's findings

Files reviewed: 23/23 changed files
Comments generated: 1

Address PR #2282 review feedback. Update note to recommend npx @microsoft/vally-cli exclusively (matching .github/workflows/eval.yml) since the eval / eval:lint npm scripts were removed during Vally migration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-15T22:50:22Z

# 🔍 Token Analysis Report

@github-copilot-for-azure/scripts@1.0.0 tokens
node --import tsx src/tokens/cli.ts compare --base origin/main --head HEAD --markdown

📊 Token Change Report

Comparing origin/main → HEAD

Summary

Metric	Value
📈 Total Change	+3 tokens (0%)
Before	1,465 tokens
After	1,468 tokens
Files Changed	1

Changed Files

File	Before	After	Change
`tests/azure-prepare/eval/README.md`	1,465	1,468	+3 (0%)

@github-copilot-for-azure/scripts@1.0.0 tokens
node --import tsx src/tokens/cli.ts check --markdown

📊 Token Limit Check Report

Checked: 644 files
Exceeded: 94 files

⚠️ Files Exceeding Token Limits

File	Tokens	Limit	Over By
`.github/skills/analyze-skill-issues/SKILL.md`	2109	500	+1609
`.github/skills/analyze-test-run/SKILL.md`	2471	500	+1971
`.github/skills/file-test-bug/SKILL.md`	628	500	+128
`.github/skills/sensei/README.md`	3531	2000	+1531
`.github/skills/sensei/SKILL.md`	3026	500	+2526
`.github/skills/sensei/references/EXAMPLES.md`	3701	2000	+1701
`.github/skills/sensei/references/LOOP.md`	4181	2000	+2181
`.github/skills/sensei/references/SCORING.md`	4299	2000	+2299
`.github/skills/skill-authoring/SKILL.md`	839	500	+339
`plugin/skills/airunway-aks-setup/SKILL.md`	1025	500	+525
`plugin/skills/appinsights-instrumentation/SKILL.md`	937	500	+437
`plugin/skills/azure-ai/SKILL.md`	820	500	+320
`plugin/skills/azure-aigateway/SKILL.md`	1261	500	+761
`plugin/skills/azure-aigateway/references/policies.md`	2342	2000	+342
`plugin/skills/azure-cloud-migrate/SKILL.md`	1085	500	+585
`plugin/skills/azure-cloud-migrate/references/services/container-apps/cloudrun-deployment-guide.md`	2029	2000	+29
`plugin/skills/azure-cloud-migrate/references/services/container-apps/deployment-guide.md`	2458	2000	+458
`plugin/skills/azure-cloud-migrate/references/services/container-apps/fargate-deployment-guide.md`	2587	2000	+587
`plugin/skills/azure-cloud-migrate/references/services/container-apps/spring-deployment-guide.md`	3871	2000	+1871
`plugin/skills/azure-cloud-migrate/references/services/functions/lambda-to-functions.md`	2600	2000	+600
`plugin/skills/azure-cloud-migrate/references/services/functions/runtimes/javascript.md`	2181	2000	+181
`plugin/skills/azure-compliance/SKILL.md`	1188	500	+688
`plugin/skills/azure-compute/SKILL.md`	1370	500	+870
`plugin/skills/azure-compute/workflows/essential-machine-management/references/emm-enable-flow.md`	2344	2000	+344
`plugin/skills/azure-compute/workflows/vm-recommender/vm-recommender.md`	2631	2000	+631
`plugin/skills/azure-compute/workflows/vm-troubleshooter/vm-troubleshooter.md`	2509	2000	+509
`plugin/skills/azure-cost/SKILL.md`	1980	500	+1480
`plugin/skills/azure-deploy/SKILL.md`	1645	500	+1145
`plugin/skills/azure-deploy/references/pre-deploy-checklist.md`	4692	2000	+2692
`plugin/skills/azure-deploy/references/recipes/azd/errors.md`	4004	2000	+2004
`plugin/skills/azure-deploy/references/troubleshooting.md`	2038	2000	+38
`plugin/skills/azure-diagnostics/SKILL.md`	1423	500	+923
`plugin/skills/azure-enterprise-infra-planner/SKILL.md`	1002	500	+502
`plugin/skills/azure-enterprise-infra-planner/references/constraints/compute-apps.md`	2022	2000	+22
`plugin/skills/azure-hosted-copilot-sdk/SKILL.md`	1332	500	+832
`plugin/skills/azure-kubernetes/SKILL.md`	2606	500	+2106
`plugin/skills/azure-kubernetes/azure-kubernetes-automatic-readiness/SKILL.md`	3609	500	+3109
`plugin/skills/azure-kusto/SKILL.md`	2152	500	+1652
`plugin/skills/azure-messaging/SKILL.md`	821	500	+321
`plugin/skills/azure-prepare/SKILL.md`	3375	500	+2875
`plugin/skills/azure-prepare/references/aspire.md`	4617	2000	+2617
`plugin/skills/azure-prepare/references/plan-template.md`	2617	2000	+617
`plugin/skills/azure-prepare/references/recipes/azd/aspire.md`	2275	2000	+275
`plugin/skills/azure-prepare/references/recipes/azd/terraform.md`	3555	2000	+1555
`plugin/skills/azure-prepare/references/research.md`	2274	2000	+274
`plugin/skills/azure-prepare/references/resources-limits-quotas.md`	3322	2000	+1322
`plugin/skills/azure-prepare/references/security.md`	2147	2000	+147
`plugin/skills/azure-prepare/references/services/functions/bicep.md`	3127	2000	+1127
`plugin/skills/azure-prepare/references/services/functions/templates/recipes/composition.md`	2813	2000	+813
`plugin/skills/azure-prepare/references/services/functions/terraform.md`	3404	2000	+1404
`plugin/skills/azure-prepare/references/services/sql-database/bicep.md`	2037	2000	+37
`plugin/skills/azure-quotas/SKILL.md`	2821	500	+2321
`plugin/skills/azure-quotas/references/commands.md`	2644	2000	+644
`plugin/skills/azure-reliability/SKILL.md`	5659	500	+5159
`plugin/skills/azure-reliability/references/configure-multi-region.md`	4729	2000	+2729
`plugin/skills/azure-resource-lookup/SKILL.md`	1394	500	+894
`plugin/skills/azure-resource-visualizer/SKILL.md`	2122	500	+1622
`plugin/skills/azure-storage/SKILL.md`	1228	500	+728
`plugin/skills/azure-upgrade/SKILL.md`	1542	500	+1042
`plugin/skills/azure-upgrade/references/languages/java/INSTRUCTION.md`	2724	2000	+724
`plugin/skills/azure-upgrade/references/languages/java/package-specific/com.microsoft.azure.management.md`	2215	2000	+215
`plugin/skills/azure-upgrade/references/languages/java/templates/PLAN_TEMPLATE.md`	2411	2000	+411
`plugin/skills/azure-upgrade/references/languages/java/templates/PROGRESS_TEMPLATE.md`	2315	2000	+315
`plugin/skills/azure-upgrade/references/languages/java/templates/SUMMARY_TEMPLATE.md`	2190	2000	+190
`plugin/skills/azure-upgrade/references/services/functions/automation.md`	3463	2000	+1463
`plugin/skills/azure-upgrade/references/services/functions/consumption-to-flex.md`	2773	2000	+773
`plugin/skills/azure-validate/SKILL.md`	950	500	+450
`plugin/skills/entra-agent-id/SKILL.md`	4001	500	+3501
`plugin/skills/entra-app-registration/SKILL.md`	2070	500	+1570
`plugin/skills/entra-app-registration/references/api-permissions.md`	2545	2000	+545
`plugin/skills/entra-app-registration/references/cli-commands.md`	2211	2000	+211
`plugin/skills/entra-app-registration/references/console-app-example.md`	2752	2000	+752
`plugin/skills/entra-app-registration/references/oauth-flows.md`	2375	2000	+375
`plugin/skills/microsoft-foundry/SKILL.md`	3955	500	+3455
`plugin/skills/microsoft-foundry/foundry-agent/create/create-hosted.md`	4824	2000	+2824
`plugin/skills/microsoft-foundry/foundry-agent/deploy/deploy.md`	6203	2000	+4203
`plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md`	2494	2000	+494
`plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/references/generate-seed-dataset.md`	2088	2000	+88
`plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/references/trace-to-dataset.md`	4325	2000	+2325
`plugin/skills/microsoft-foundry/foundry-agent/faos-optimize/faos-optimize.md`	3436	2000	+1436
`plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md`	3190	2000	+1190
`plugin/skills/microsoft-foundry/foundry-agent/observe/references/continuous-eval.md`	3860	2000	+1860
`plugin/skills/microsoft-foundry/foundry-agent/observe/references/deploy-and-setup.md`	2072	2000	+72
`plugin/skills/microsoft-foundry/foundry-agent/trace/references/kql-templates.md`	2701	2000	+701
`plugin/skills/microsoft-foundry/models/deploy-model/SKILL.md`	1640	500	+1140
`plugin/skills/microsoft-foundry/models/deploy-model/capacity/SKILL.md`	1739	500	+1239
`plugin/skills/microsoft-foundry/models/deploy-model/customize/SKILL.md`	2235	500	+1735
`plugin/skills/microsoft-foundry/models/deploy-model/customize/references/customize-workflow.md`	3335	2000	+1335
`plugin/skills/microsoft-foundry/models/deploy-model/preset/SKILL.md`	1226	500	+726
`plugin/skills/microsoft-foundry/models/deploy-model/preset/references/preset-workflow.md`	5534	2000	+3534
`plugin/skills/microsoft-foundry/quota/quota.md`	2288	2000	+288
`plugin/skills/microsoft-foundry/quota/references/capacity-planning.md`	2080	2000	+80
`plugin/skills/microsoft-foundry/references/agent-metadata-contract.md`	2373	2000	+373
`plugin/skills/microsoft-foundry/references/sdk/foundry-sdk-py.md`	2162	2000	+162

Consider moving content to references/ subdirectories.
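
The "Over By" column in the table above is the token count minus the applicable limit. The sketch below reproduces that arithmetic under an assumption inferred from the table rows only: files named SKILL.md have a limit of 500 and every other listed file has a limit of 2000. The repository's real limit configuration is not shown here, and `token_limit` and `over_by` are hypothetical names.

```python
from pathlib import PurePosixPath


def token_limit(path: str) -> int:
    """Limit inferred from the report: 500 for SKILL.md, 2000 otherwise (assumption)."""
    return 500 if PurePosixPath(path).name == "SKILL.md" else 2000


def over_by(path: str, tokens: int) -> int:
    """Tokens above the limit, or 0 when the file is within its limit."""
    return max(0, tokens - token_limit(path))


print(over_by(".github/skills/file-test-bug/SKILL.md", 628))  # 128
print(over_by("plugin/skills/azure-deploy/references/troubleshooting.md", 2038))  # 38
```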

Automated token analysis. See skill authoring guidelines for best practices.

jongio

Verified all 22 eval configs structurally - skill paths, grader configs (required/disallowed), and positive/negative trigger balance all check out. Consistent config across all files. CI green. Clean migration.

wbreza and others added 9 commits May 15, 2026 15:26

fix: correct vally-cli package name in CI and sync package-lock.json

2fb1334

fix: apply validated Vally eval patterns

ebc0207

wbreza requested review from JasonYeMSFT, Copilot and jongio May 15, 2026 22:27

Copilot started reviewing on behalf of wbreza May 15, 2026 22:29

Copilot AI reviewed May 15, 2026

Comment thread tests/package.json

jongio reviewed May 16, 2026

wbreza requested a review from jongio May 18, 2026 23:32

github-actions Bot mentioned this pull request May 21, 2026

[repo-status] Weekly Repo Status — May 15–21, 2026 #2358

Open

wbreza wants to merge 10 commits into main from feature/trigger-test-migration

3 participants
