Run comparison integration tests without skills #2183
Open
JasonYeMSFT wants to merge 4 commits into
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Extends the integration test workflows to support “comparison” runs that execute without loading skills (via NO_SKILLS), scheduled for Sundays.
Changes:
- Add `NO_SKILLS` gating to skip skill assertions and skill-invocation test suites during comparison runs.
- Add a Sunday schedule entry and propagate a `no-skills` input/env var through GitHub Actions workflows.
- Update PR validation logic to exclude the Sunday comparison schedule from “all skills covered” checks.
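The `NO_SKILLS` gating can be sketched as follows. This is a minimal illustration, not the PR's code: the helpers `isNoSkillsRun` and `maybeCheckSkill` are hypothetical, and the sketch assumes the flag is compared against the exact string `"true"`. The environment is passed in as a plain map so the example does not depend on Node typings.

```typescript
// Environment variables as a plain map, so the helpers don't depend on Node typings.
type Env = Record<string, string | undefined>;

// Hypothetical helper: a comparison run is signalled by NO_SKILLS set to the string "true".
function isNoSkillsRun(env: Env): boolean {
  return env["NO_SKILLS"] === "true";
}

// Hypothetical wrapper for a skill assertion such as softCheckSkill.
// In a comparison run no skills are loaded, so asserting that a skill was
// invoked would always fail; the check is skipped instead.
// Returns true if the check ran, false if it was skipped.
function maybeCheckSkill(env: Env, check: () => void): boolean {
  if (isNoSkillsRun(env)) {
    return false;
  }
  check();
  return true;
}

// In a Jest test file, the same flag can select a skipped suite:
//   const describeSkillInvocation = isNoSkillsRun(process.env) ? describe.skip : describe;
//   describeSkillInvocation("skill invocation", () => { /* ... */ });
```

Skipping rather than failing keeps a comparison run green, so its results measure agent behavior and token usage instead of reporting expected skill-invocation failures.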
| File | Description |
|---|---|
| tests/utils/evaluate.ts | Skips `softCheckSkill` when `NO_SKILLS=true`. |
| tests/skills.json | Adds a Sunday schedule entry intended for comparison runs. |
| tests/azure-deploy/integration.test.ts | Skips the skill-invocation `describe` block when `NO_SKILLS=true`. |
| .github/workflows/test-azure-deploy.yml | Adds a `no-skills` input and sets the `NO_SKILLS` env var for Jest runs. |
| .github/workflows/test-all-integration.yml | Adds a Sunday cron and wires the `no-skills` resolution through to the env. |
| .github/workflows/pr.yml | Limits schedule coverage validation to weekday cron entries only. |
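One way the weekday-only coverage check could work is sketched below. The `ScheduleEntry` shape of `tests/skills.json` and the helpers `daysOfWeek`, `isWeekdayOnly`, and `uncoveredSkills` are hypothetical, as is the rule that a "weekday" entry names only Monday (1) through Friday (5) in its day-of-week field; the actual `pr.yml` logic may match cron entries differently.

```typescript
// Hypothetical shape of one schedule entry in tests/skills.json.
interface ScheduleEntry {
  cron: string;     // 5-field cron: minute hour day-of-month month day-of-week
  skills: string[]; // skills whose integration tests run on this schedule
}

// Expand the day-of-week field into days 0-6 (0 = Sunday; 7 is also accepted as Sunday).
// Only "*", single digits, ranges like "1-5", and comma lists are supported;
// names ("SUN") and steps ("*/2") are rejected rather than guessed.
function daysOfWeek(cron: string): Set<number> {
  const fields = cron.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}: "${cron}"`);
  }
  const days = new Set<number>();
  for (const part of fields[4].split(",")) {
    if (part === "*") {
      for (let d = 0; d <= 6; d++) days.add(d);
      continue;
    }
    const match = /^([0-7])(?:-([0-7]))?$/.exec(part);
    if (match === null) {
      throw new Error(`unsupported day-of-week value "${part}" in "${cron}"`);
    }
    const lo = Number(match[1]);
    const hi = match[2] === undefined ? lo : Number(match[2]);
    if (lo > hi) {
      throw new Error(`descending range "${part}" in "${cron}"`);
    }
    for (let d = lo; d <= hi; d++) days.add(d % 7); // 7 maps to 0 (Sunday)
  }
  return days;
}

// A weekday entry runs only on Monday (1) through Friday (5).
function isWeekdayOnly(cron: string): boolean {
  return Array.from(daysOfWeek(cron)).every((d) => d >= 1 && d <= 5);
}

// Skills not covered by any weekday entry; the Sunday comparison entry is ignored.
function uncoveredSkills(allSkills: string[], schedule: ScheduleEntry[]): string[] {
  const covered = new Set<string>();
  for (const entry of schedule) {
    if (!isWeekdayOnly(entry.cron)) continue;
    entry.skills.forEach((s) => covered.add(s));
  }
  return allSkills.filter((s) => !covered.has(s));
}
```

Under this rule, a skill that appears only in the Sunday entry is still reported as uncovered, so the comparison schedule cannot accidentally satisfy the coverage check.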
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 4
jongio (Collaborator) approved these changes on May 16, 2026 and left a comment:
Clean separation: the dedicated workflow keeps comparison runs from triggering weekday automation. The `NO_SKILLS` gating is straightforward and doesn't affect normal test runs. LGTM.
Description
Extend the integration test workflow to run comparison tests on Sundays. Unlike the regular integration tests, these comparison tests load no skills. They let us compare the agent's behavior with and without our skills, to gauge whether the skills help and whether they cause the agent to use significantly more tokens.
Currently, the cron configuration runs only the azure-deploy tests as comparison tests.
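The `no-skills` resolution in the integration workflow might look like the sketch below, written in TypeScript for illustration rather than as workflow YAML. It assumes scheduled runs are identified by their cron string (GitHub Actions exposes this as `github.event.schedule`) and that manual runs pass a `no-skills` input; the `Trigger` type, the `COMPARISON_CRON` value, and both functions are hypothetical placeholders.

```typescript
// Hypothetical model of what triggered the integration workflow.
type Trigger =
  | { event: "schedule"; cron: string } // in Actions, the cron string is github.event.schedule
  | { event: "workflow_dispatch"; noSkillsInput: boolean }; // assumed no-skills input

// Placeholder only: the real Sunday cron expression lives in the workflow and is not shown here.
const COMPARISON_CRON = "0 6 * * 0";

// Decide whether this run is a comparison run without skills.
function resolveNoSkills(trigger: Trigger): boolean {
  if (trigger.event === "schedule") {
    // Only the dedicated Sunday schedule produces a comparison run.
    return trigger.cron === COMPARISON_CRON;
  }
  // Manual runs follow the no-skills input.
  return trigger.noSkillsInput;
}

// String value to export as NO_SKILLS for the Jest step.
function noSkillsEnv(trigger: Trigger): string {
  return resolveNoSkills(trigger) ? "true" : "false";
}
```

Keying on the exact cron string means weekday schedules can never enable comparison mode by accident, which matches the intent that only the Sunday run skips skills.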
Checklist
- `cd tests && npm test`
- `npm run test:skills:integration -- <skill>`
- `USE FOR`/`DO NOT USE FOR`/`PREFER OVER` clauses: confirmed no routing regressions for competing skills

Related Issues