Clarify FAOS optimization guidance and improve evaluator contracts #2286
Open
XOEEst wants to merge 15 commits into
Address PR microsoft#2174 review comments by expanding FAOS to Foundry Agent Optimization Service on first use and making the Step 8 Python config snippet copy-paste safe. The minimum contract example now imports os and preserves the app's existing model-selection fallback instead of hard-coding MODEL_DEPLOYMENT_NAME (unless that is already what the app uses).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
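For illustration, here is a minimal Python sketch of a config snippet that imports os and keeps an app's existing model-selection fallback. The variable OPTIMIZATION_MODEL_DEPLOYMENT and the constant APP_DEFAULT_MODEL are hypothetical placeholders, not names taken from the FAOS contract; MODEL_DEPLOYMENT_NAME is consulted only because this example assumes the app already reads it.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical: the app's pre-existing fallback when nothing is configured.
APP_DEFAULT_MODEL = "my-app-default-deployment"


def resolve_model_deployment(env: Mapping[str, str] | None = None) -> str:
    """Pick the model deployment, letting an optimizer-supplied override win.

    OPTIMIZATION_MODEL_DEPLOYMENT is a hypothetical OPTIMIZATION_* variable.
    MODEL_DEPLOYMENT_NAME is read only because this example app already uses it.
    Empty strings are treated as unset and fall through to the next source.
    """
    env = os.environ if env is None else env
    return (
        env.get("OPTIMIZATION_MODEL_DEPLOYMENT")  # set by the optimizer, if at all
        or env.get("MODEL_DEPLOYMENT_NAME")       # the app's existing setting
        or APP_DEFAULT_MODEL                      # the app's existing fallback
    )
```

Accepting a plain mapping instead of always reading os.environ keeps the lookup order easy to test without touching the process environment.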
# Conflicts:
# plugin/skills/microsoft-foundry/SKILL.md
When creating custom evaluator prompts, treat the runtime-enforced JSON output schema (result plus reason) as authoritative. Preserve user-provided rubric text, but remove or normalize conflicting output schemas, such as score/reasoning keys or duplicate OUTPUT FORMAT blocks, before calling evaluator_catalog_create. Add observe skill test coverage for the promptText guardrail so future edits keep the result/reason contract visible.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
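One way to apply this guardrail before calling evaluator_catalog_create, sketched in Python under the assumption that any OUTPUT FORMAT section trails the rubric. The helper name normalize_prompt_text and the canonical block's wording are hypothetical; only the result/reason key names come from the runtime contract described above.

```python
import re

# Assumption: this wording is illustrative; only the two key names
# ("result", "reason") come from the runtime-enforced contract.
CANONICAL_OUTPUT_FORMAT = (
    "OUTPUT FORMAT\n"
    'Respond with a JSON object containing exactly two keys: "result" and "reason".'
)

# Matches heading lines such as "OUTPUT FORMAT", "## Output format:" or "**OUTPUT FORMAT**".
_OUTPUT_FORMAT_HEADING = re.compile(r"^[ \t#*]*OUTPUT FORMAT\b", re.IGNORECASE | re.MULTILINE)


def normalize_prompt_text(prompt_text: str) -> str:
    """Keep the rubric, drop user-supplied OUTPUT FORMAT sections, append one canonical block.

    Assumes OUTPUT FORMAT sections come after the rubric, so everything from the
    first such heading onward is treated as (possibly conflicting) schema text.
    """
    match = _OUTPUT_FORMAT_HEADING.search(prompt_text)
    rubric = prompt_text[: match.start()] if match else prompt_text
    return rubric.rstrip() + "\n\n" + CANONICAL_OUTPUT_FORMAT
```

Applying the function to its own output returns the same text, so it can run on every create call without stacking duplicate blocks.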
Remove the 0-to-1 scoringType/minScore/maxScore details from the sample custom evaluator prompt. The important guardrail is the output contract: preserve the rubric, but avoid conflicting output schemas, because the runtime enforces result/reason.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
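A small heuristic lint can flag conflicting schema keys (for example score, reasoning, or minScore) that survive in a rubric. This is a hypothetical Python helper, not part of the skill, and it may report false positives when a rubric legitimately quotes unrelated JSON.

```python
from __future__ import annotations

import re

# Keys the runtime enforces, per the commit messages above.
ALLOWED_KEYS = {"result", "reason"}

# Finds JSON-style keys such as "score": or "reasoning": anywhere in the prompt.
_JSON_KEY = re.compile(r'"([A-Za-z_][A-Za-z0-9_]*)"\s*:')


def conflicting_schema_keys(prompt_text: str) -> set[str]:
    """Return JSON-like keys in the prompt that are not part of the result/reason contract."""
    return set(_JSON_KEY.findall(prompt_text)) - ALLOWED_KEYS
```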
Restore plugin/skills/microsoft-foundry/SKILL.md from upstream/main so this PR no longer carries a line-ending-only change for the top-level skill file.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Updates the microsoft-foundry skill documentation to enforce stricter post-deploy completion criteria (invocation + evaluation suite generation), expand guidance for auto-generating and persisting evaluation-suite artifacts, and refine FAOS optimization/runtime configuration guidance.
Changes:
- Makes deployment “done” contingent on both a successful invocation smoke test and a completed (or explicitly documented fallback) evaluation suite generation step.
- Adds/expands documentation for evaluation suite generation, artifact caching, metadata schema updates, and evaluation run completion criteria.
- Refines FAOS optimization guidance (skills loading, tool description patching, and `OPTIMIZATION_*` env var conventions) and broadens dataset lifecycle docs to include generation jobs.
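The completion rule in the first bullet reduces to a simple predicate. This is a hypothetical Python sketch; the PostDeployCheck fields are assumptions about what an agent would track, not fields defined by the skill docs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PostDeployCheck:
    invocation_succeeded: bool        # smoke-test invocation returned a valid response
    suite_generation_completed: bool  # auto-generated evaluation suite finished
    fallback_note: str | None = None  # documented manual fallback if generation did not complete


def deployment_is_done(check: PostDeployCheck) -> bool:
    """Report success only when both post-deploy gates are satisfied."""
    if not check.invocation_succeeded:
        return False
    if check.suite_generation_completed:
        return True
    # Without a completed suite, only an explicitly documented fallback counts.
    return bool(check.fallback_note and check.fallback_note.strip())
```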
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| plugin/skills/microsoft-foundry/SKILL.md | Updates lifecycle routing table to include eval-suite setup after deploy/invoke. |
| plugin/skills/microsoft-foundry/references/agent-metadata-contract.md | Extends metadata contract/examples for suite references and generation provenance fields. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/evaluation-suite-generation.md | Adds a new end-to-end reference for suite generation, polling, and caching artifacts. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/evaluate-step.md | Tightens “evaluation run” completion criteria and clarifies agent-target batch eval inputs. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/deploy-and-setup.md | Reworks Step 1 to focus on auto-generating and persisting evaluation suites. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/compare-iterate.md | Aligns re-evaluation guidance with agent-target batch eval and suite metadata usage. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/analyze-results.md | Updates observability rule reference for knowledge-cutoff mitigation guidance. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | Updates observe workflow framing to prioritize suite generation and agent-target runs. |
| plugin/skills/microsoft-foundry/foundry-agent/faos-optimize/references/python-patterns.md | Adds concrete Python patterns for skills loading and safe tool description patching; updates env var guidance. |
| plugin/skills/microsoft-foundry/foundry-agent/faos-optimize/faos-optimize.md | Updates FAOS contract guidance for OPTIMIZATION_* variables and local candidate directory support. |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/references/generate-seed-dataset.md | Positions manual seed datasets as fallback to suite/data generation APIs; marks generationSource in metadata example. |
| plugin/skills/microsoft-foundry/foundry-agent/eval-datasets/eval-datasets.md | Expands dataset lifecycle docs to include generation jobs and suite tooling integration. |
| plugin/skills/microsoft-foundry/foundry-agent/deploy/deploy.md | Enforces an explicit post-deploy Step 8/5 gate for suite generation and artifact persistence before reporting success. |
| package.json | Reformats JSON (no functional change). |
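The overview does not say what makes tool description patching "safe". A minimal Python sketch under one plausible reading: never mutate the live tool list, reject overrides for unknown tool names, and reject empty descriptions. The tool dictionary shape and the function name are assumptions.

```python
from __future__ import annotations

import copy
from typing import Any


def patch_tool_descriptions(
    tools: list[dict[str, Any]], overrides: dict[str, str]
) -> list[dict[str, Any]]:
    """Return a deep copy of `tools` with descriptions replaced for the named tools.

    Hypothetical tool shape: {"name": ..., "description": ..., ...}.
    The input list is never modified.
    """
    known = {tool["name"] for tool in tools}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly instead of silently dropping a candidate's edits.
        raise KeyError(f"overrides reference unknown tools: {sorted(unknown)}")

    patched = copy.deepcopy(tools)
    for tool in patched:
        new_description = overrides.get(tool["name"])
        if new_description is None:
            continue  # no override for this tool; keep its original description
        if not new_description.strip():
            raise ValueError(f"empty description for tool {tool['name']!r}")
        tool["description"] = new_description
    return patched
```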
Comments suppressed due to low confidence (1)
plugin/skills/microsoft-foundry/references/agent-metadata-contract.md:67
- In the example generated suite entry, `datasetFile` still points to a local `.jsonl`. The suite-generation caching instructions added in this PR require writing a dataset reference stub (`*.ref.json`) plus downloading the actual dataset blobs into a versioned folder. Updating the example to reflect the new caching contract would avoid agents persisting the wrong artifact path.
purpose: baseline
stage: seed
dataset: support-agent-dev-eval-seed
datasetVersion: v1
datasetFile: .foundry/datasets/support-agent-dev-eval-seed-v1.jsonl
datasetUri: <foundry-dataset-uri>
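A hedged Python sketch of the caching contract the comment describes, where metadata points at a `*.ref.json` stub and the dataset blobs live in a versioned folder. The exact paths (`.foundry/datasets/<dataset>-<version>.ref.json` and `.foundry/datasets/<dataset>/<version>/`) and the stub's `localCache` field are hypothetical; the reviewer comment only specifies a stub plus a versioned folder.

```python
from __future__ import annotations

import json
from pathlib import Path, PurePosixPath

# Hypothetical cache root; the layout below is an assumption, not the skill's contract.
CACHE_ROOT = PurePosixPath(".foundry/datasets")


def dataset_cache_paths(dataset: str, version: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (reference stub path, versioned blob directory) for one dataset version."""
    ref_stub = CACHE_ROOT / f"{dataset}-{version}.ref.json"
    blob_dir = CACHE_ROOT / dataset / version
    return ref_stub, blob_dir


def make_ref_stub(dataset: str, version: str, uri: str) -> dict:
    """Build the stub that metadata should reference instead of a local .jsonl."""
    _, blob_dir = dataset_cache_paths(dataset, version)
    return {
        "dataset": dataset,
        "datasetVersion": version,
        "datasetUri": uri,
        "localCache": str(blob_dir),  # hypothetical field: where blobs are downloaded
    }


def write_ref_stub(stub: dict, workspace: Path = Path(".")) -> Path:
    """Create the versioned blob folder and write the stub; returns the stub path."""
    ref_stub, blob_dir = dataset_cache_paths(stub["dataset"], stub["datasetVersion"])
    (workspace / blob_dir).mkdir(parents=True, exist_ok=True)  # blobs get downloaded here
    target = workspace / ref_stub
    target.write_text(json.dumps(stub, indent=2) + "\n", encoding="utf-8")
    return target
```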
This was referenced May 21, 2026
This pull request significantly updates the Microsoft Foundry agent deployment and evaluation documentation to enforce stricter deployment completion criteria, clarify the evaluation suite auto-generation process, and improve guidance for dataset management. The changes ensure that deployments are not considered complete until both an invocation test and an auto-generated evaluation suite are finished, and they provide detailed instructions for handling evaluation suite creation, manual fallbacks, and artifact persistence.
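The suite-generation reference mentions polling generation jobs until they finish. A generic Python polling sketch, where `get_status` is a caller-supplied callable standing in for whatever status API is actually used, and the terminal state names are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable

# Hypothetical terminal states; match these to the real job API.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_generation_job(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = get_status().lower()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"generation job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a "failed" result should then trigger the documented manual fallback rather than a success report.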
Deployment Workflow Enforcement and Evaluation Suite Generation:
Evaluation Dataset and Tooling Updates:
Terminology and Clarity Improvements:
Summary of Most Important Changes:
Deployment Workflow and Completion Criteria
Evaluation Suite Generation and Fallbacks
Artifact and Metadata Management
`.foundry/agent-metadata.yaml` and local cache directories, and how to handle legacy metadata formats.
Evaluation Dataset Documentation
Terminology and Clarity
These changes collectively improve the reliability, transparency, and reproducibility of agent deployments and evaluations in Microsoft Foundry.

## Description
Checklist
- (`cd tests && npm test`)
- (`npm run test:skills:integration -- <skill>`)
- `USE FOR`/`DO NOT USE FOR`/`PREFER OVER` clauses: confirmed no routing regressions for competing skills

Related Issues