FAOS skill#2123
Pull request overview
Adds guidance for Foundry agent “optimization candidate search” runs by introducing a new reference doc and wiring it into the existing observe workflow docs.
Changes:
- Added a new reference doc describing inputs/guardrails/error handling for `agent_optimization_start`-based optimization jobs.
- Updated `observe.md` to list `agent_optimization_*` tools, add a new entry point, and add additional behavioral rules for optimization jobs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| plugin/skills/microsoft-foundry/foundry-agent/observe/references/optimize-candidate-jobs.md | New reference for running optimization candidate jobs via agent_optimization_start. |
| plugin/skills/microsoft-foundry/foundry-agent/observe/observe.md | Updates observe skill docs to include optimization-job tools, entry point, and guardrails. |
After dataset creation, convert the result into the optimization input shape if using `datasetJson`:

```json
[
  {
    "query": "What is 12 + 7?",
    "name": "math-basic-add",
    "groundTruth": "19"
  }
]
```

Rules:

- `query` is required
- `name` is optional
- `groundTruth` is optional
- show the final dataset input to the user before continuing
The `datasetJson` example uses `groundTruth` (camelCase), but the rest of the Foundry agent dataset docs in this repo consistently use `ground_truth` (snake_case) (e.g., eval-datasets references and observe guidance). This inconsistency is likely to confuse users and could lead to passing the wrong field name. Align the field name with the established dataset schema, or explicitly document why optimization expects a different shape and how `ground_truth`/`expected_behavior` map to the optimization input.
Confirmed - `ground_truth` (snake_case) appears in 7+ files across the foundry skills: eval-datasets references, evaluate-step.md, deploy-and-setup.md, and observe.md itself. `groundTruth` (camelCase) doesn't appear anywhere else in the repo. This should use `ground_truth` to match.
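A minimal sketch of how the rules above could be checked before a dataset is shown to the user, assuming the snake_case `ground_truth` name is adopted as suggested in this thread. The helper `validate_rows` and its messages are hypothetical and are not part of the skill or of any Foundry tool; the field name the optimization tool actually accepts is not settled here.

```python
from typing import Any

# Field names per the rules in the reference doc; "ground_truth" assumes the
# snake_case spelling proposed in this review is adopted.
REQUIRED = {"query"}
OPTIONAL = {"name", "ground_truth"}


def validate_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    for i, row in enumerate(rows):
        query = row.get("query")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"row {i}: 'query' is required")
        if "groundTruth" in row:
            problems.append(f"row {i}: use 'ground_truth' instead of 'groundTruth'")
        # Anything outside the documented fields is reported rather than dropped.
        unknown = set(row) - REQUIRED - OPTIONAL - {"groundTruth"}
        for key in sorted(unknown):
            problems.append(f"row {i}: unexpected field '{key}'")
    return problems


rows = [{"query": "What is 12 + 7?", "name": "math-basic-add", "groundTruth": "19"}]
print(validate_rows(rows))
```

Run against the example from the diff, the check reports the camelCase field and nothing else, which is the inconsistency raised in this comment.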
## Tools

- `agent_optimization_start`, `agent_optimization_get`
This reference is about running/monitoring optimization jobs, but the Tools section omits `agent_optimization_list` even though it’s listed as a key tool in the parent observe skill. Consider adding `agent_optimization_list` here (or calling out when to use it) so readers have a complete set of job-management commands in one place.
Suggested change, before:

- `agent_optimization_start`, `agent_optimization_get`

After:

- `agent_optimization_start`, `agent_optimization_get`, `agent_optimization_list`
| |----------|-------| | ||
| | MCP server | `azure` | | ||
| | Key Foundry MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` | | ||
| | Key Foundry MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update`, `agent_optimization_start`, `agent_optimization_get`, `agent_optimization_list` | |
The Quick Reference now lists `agent_optimization_*` tools as key MCP tools, but the earlier “DO NOT manually call …” warning in this document doesn’t mention them. To keep the guardrails consistent for readers skimming the header, update that warning to include `agent_optimization_start` (and related `agent_optimization_*` calls) if they are also intended to be gated by this skill’s workflow.
jongio left a comment
The optimization candidate jobs reference covers inputs, invocation flow, and error handling well. Two gaps worth addressing:
- Workflow stops at starting the job. The invocation flow (steps 1-8) ends at calling `agent_optimization_start`, but there's no guidance on what comes next - polling with `agent_optimization_get`, interpreting results, or deciding whether to apply a candidate config. Compare with optimize-deploy.md, which has explicit "Deploy New Version" and "Next Steps" sections after the optimize call.
- Merge conflicts need resolution before this can merge.
Minor: the new entry point uses a + join pattern (optimize-deploy.md + optimize-candidate-jobs.md) while other entry points use "first...then" sequential routing. Not blocking, but matching the existing style would be clearer.
5. Ask the user to choose evaluators.
6. If needed, create a custom evaluator with `evaluator_catalog_create`.
7. Restate all final inputs: `agentName`, `projectEndpoint`, dataset mode, and evaluator list.
8. Only then call `agent_optimization_start`.
The workflow ends here, but what happens after starting the job? Consider adding steps for:
- Using `agent_optimization_get` to poll job status
- Reviewing the returned candidate config/results
- Deciding whether to apply or discard
The existing optimize-deploy.md has a "Deploy New Version" + "Next Steps" pattern after prompt_optimize that works well as a model.
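To make the suggested follow-up steps concrete, here is a rough sketch of a polling loop. It assumes a caller-supplied `get_job` function that wraps whatever `agent_optimization_get` returns; the `status` field, the terminal status names, and the `candidates` key are all assumptions for illustration, since this thread does not describe the tool's response shape.

```python
import time
from typing import Any, Callable

# Assumed terminal status names; the real values are not given in this thread.
TERMINAL = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    get_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_job(job_id) until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in for the real tool call (hypothetical responses).
responses = iter([{"status": "running"}, {"status": "succeeded", "candidates": []}])
job = wait_for_job(lambda _id: next(responses), "job-123", sleep=lambda s: None)
print(job["status"])
```

Injecting `get_job` and `sleep` keeps the loop testable without a live project. Reviewing the returned candidates and deciding whether to apply one stays a separate, user-confirmed step, as the comment suggests.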