feat(azure-compute): add VM creation workflow with approval gate before deploy#2297
feat(azure-compute): add VM creation workflow with approval gate before deploy#2297deankroker wants to merge 13 commits into
Conversation
…der token limits
Adds the vm-creator workflow (Plan-Card-gated VM/VMSS provisioning with
Azure MCP, az CLI, Bicep, and Terraform output adapters) and refactors
the existing azure-compute skill files to stay well under CI token limits:
- SKILL.md 1,840 -> 382 tokens
- vm-recommender.md 3,006 -> 1,573 tokens (extracted handoff-to-creator
and web-fetch-policy refs)
- vm-creator.md new; 1,676 tokens
vm-creator splits its larger references into sub-folders so every file
sits under the 1,000-token soft limit:
- references/depth-probe/ (6 files, max 755t)
- references/output-adapters/ (5 files, max 760t)
- references/delivery-options/ (4 files, max 817t)
- references/{plan-card,validation-gates,mcp-tools}.md (3 standalone)
Bicep and Terraform templates moved to examples/ as code files so they
don't count against markdown token budgets. The skill now consumes the
four new compute discovery tools shipping in the paired mcp PR
(vm list-skus, vm list-images, vm check-quota, vm recommend-region) as
its Step 4 validation gates.
Zero hard-limit violations and zero soft warnings on any file in this PR.
The bundled Azure MCP server registers one tool per service namespace by default (~26 tools: compute, storage, keyvault, monitor, sql, ...). Combined with the azure-compute skill's loaded reference files, the total context routinely crosses Claude Code's autocompact threshold within 3 turns, causing thrash errors during vm-creator flows. Passing --mode single collapses all namespaces into one routing tool (per the upstream @azure/mcp CLI). Schema cost drops from ~30K to ~3K tokens with zero functionality loss — service routing happens inside the single tool via its command arg. Verified: echo "create a vm" | claude --plugin-dir output --print - before: autocompact thrash after 3 turns - after: 24.9s, 145K input tokens, full Plan Card rendered
… action picker
The old approve → output-format → delivery flow was 3 sequential popups for
what's effectively one decision ("how do you want this delivered?"). This
trims plan-card.md to specify a single AskUserQuestion with 4 likely
combinations, an explicit "Edit a row" branch, and an early-skip when the
user's original prompt already implied the delivery choice.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…SS, plan card, adapters 16 stimuli across the azure-compute skill surface area: - Router accuracy (vm-creator vs vm-recommender vs azure-prepare handoff) - Depth-probe branch selection (beginner / spec / cost / networking / security) - VMSS recognition without the user saying "VMSS" (agent pools, runner fleets) - Plan Card gate (no artifact emitted before approval, even under pressure) - Output adapter coverage (az CLI / Bicep / Terraform / MCP apply) - Security posture (no open SSH/RDP, no plaintext passwords, managed identity preferred) Smoke tier (5 stimuli, --tag tier=smoke) currently passes 15/15 trials with zero flakes against claude-sonnet-4.6. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR expands the azure-compute skill from recommendation/troubleshooting routing into a fuller VM/VMSS creation workflow, adds MCP single-tool mode context optimization, and introduces an eval suite to guard routing and Plan Card behavior.
Changes:
- Adds a new
vm-creatorworkflow with depth probes, validation gates, Plan Card approval, output adapters, delivery options, and example Bicep/Terraform artifacts. - Updates the existing VM recommender and main skill router to hand off provisioning requests to
vm-creator. - Adds Azure Compute Vally eval coverage and switches Azure MCP startup to
--mode single.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
plugin/skills/azure-compute/SKILL.md |
Updates router description and adds vm-creator routing. |
plugin/.mcp.json |
Starts Azure MCP in single-tool mode. |
evals/azure-compute/eval.yaml |
Adds azure-compute routing/workflow eval suite. |
plugin/skills/azure-compute/workflows/vm-recommender/vm-recommender.md |
Trims recommender workflow and adds creator handoff. |
plugin/skills/azure-compute/workflows/vm-recommender/references/web-fetch-policy.md |
Adds live-doc verification fallback policy. |
plugin/skills/azure-compute/workflows/vm-recommender/references/handoff-to-creator.md |
Defines recommender-to-creator handoff rules. |
plugin/skills/azure-compute/workflows/vm-creator/vm-creator.md |
Adds top-level VM/VMSS creation workflow. |
plugin/skills/azure-compute/workflows/vm-creator/references/validation-gates.md |
Documents required pre-Plan Card validation checks. |
plugin/skills/azure-compute/workflows/vm-creator/references/plan-card.md |
Defines Plan Card schema and approval picker. |
plugin/skills/azure-compute/workflows/vm-creator/references/mcp-tools.md |
Documents MCP commands and CLI fallbacks. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/index.md |
Adds depth-probe branch selection. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/beginner.md |
Adds beginner fast-path defaults. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/cost-deep.md |
Adds cost-focused questioning guidance. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/networking-deep.md |
Adds networking-focused questioning guidance. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/security-deep.md |
Adds security-focused questioning guidance. |
plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/spec-deep.md |
Adds SKU/spec-focused questioning guidance. |
plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/index.md |
Adds shared output adapter routing and field mapping. |
plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/az-cli.md |
Adds az CLI output adapter. |
plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/bicep.md |
Adds Bicep output adapter. |
plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/terraform.md |
Adds Terraform output adapter. |
plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/mcp-apply.md |
Adds live Azure MCP apply adapter. |
plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/index.md |
Adds delivery mode decision logic. |
plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/print.md |
Adds print-in-chat delivery mode. |
plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/save-local.md |
Adds local file save delivery mode. |
plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/github-pr.md |
Adds GitHub PR delivery mode. |
plugin/skills/azure-compute/workflows/vm-creator/examples/bicep/main.bicep |
Adds deployable Bicep VM example. |
plugin/skills/azure-compute/workflows/vm-creator/examples/bicep/README.md |
Adds Bicep example usage documentation. |
plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/main.tf |
Adds deployable Terraform VM example. |
plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/variables.tf |
Adds Terraform input variables. |
plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/outputs.tf |
Adds Terraform outputs. |
plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/README.md |
Adds Terraform example usage documentation. |
- Fix mcp-tool-references test by rephrasing line 38 of mcp-tools.md to
avoid the literal `azure__compute_vm_check-quota` regex match (the
prose was being parsed as a tool reference even though it was just
illustrating sub-command vs tool distinction)
- Update triggers.test.ts to reflect the broader azure-compute scope:
remove prompts from the no-trigger list that now legitimately trigger
due to generic compute keywords ("create"/"deploy"/"server") in the
expanded description; add comment explaining the router's
disambiguation rule forwards non-VM intents to the right skill
- Regenerate triggers snapshot for the expanded keyword set
- Address 7 Copilot review comments across bicep/terraform examples,
depth-probe, plan-card, vm-creator, vm-recommender, and adapter docs
…h format rule The frontmatter validator (description-format rule) rejects >- folded scalars because skills.sh in the runtime is incompatible with them. Convert the azure-compute SKILL.md description from a >- folded multi-line scalar to an inline double-quoted string. Parsed text is identical, so trigger snapshots and keyword extraction are unchanged. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…form examples Both vm-creator templates previously defaulted the SSH allow-rule source to '*', leaving port 22 open to the entire internet on any deployment that didn't override the parameter. An exposed SSH endpoint gets credential- stuffed within minutes of going public. Make the parameter required (no default) and document the security tradeoff on the description so callers must pass either their own /32 or a trusted CIDR before the template will deploy. Quickstart now derives MY_IP from ifconfig.me and passes it explicitly; cleanup command updated to match. Addresses Copilot review comments on PR microsoft#2297.
|
Pushed Fixed (
Templates now refuse to deploy without an explicit source prefix — the open-to-internet path requires the user to pass Audited, no change needed:
Watching CI; will flag if anything regresses. |
Azure MCP uses AzureCliCredential (not DefaultAzureCredential) to pick up the `az login` session. Addresses review comment from JasonYeMSFT on PR microsoft#2297.
Plugin-wide change should not ride on a single skill's PR. Splitting the context-savings work to a follow-up PR scoped to plugin config. Addresses review comment from JasonYeMSFT on PR microsoft#2297.
Per review feedback, omit the brittle `AzureCliCredential` reference from the MCP apply prerequisite. State the user-facing requirement (be signed in to Azure, e.g. `az login`) and treat the exact credential mechanism as an MCP server implementation detail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per JasonYeMSFT review on PR microsoft#2297 (thread 3270086905, eval.yaml:757): when the user combines an explicit deliverable ("give me the Bicep") with an explicit refusal of dialog ("no questions", "skip planning"), the vm-creator workflow should honor that intent rather than overrule it with a mandatory Plan Card approval popup. vm-creator Step 5 now has two paths: - Default: render the full Plan Card table + Approve/Edit AskUserQuestion - Explicit-override fast path: emit a one-line preview of high-signal decisions + the requested artifact, no popup. Step 4 validation gates (SKU / image / quota / region) still run. The plan-card-gate-001 eval stimulus is renamed to plan-card-override-001. Prompt is unchanged (same explicit override), graders flipped to verify the new contract: Bicep must emit directly, must name the SKU, must NOT render the full Plan Card markdown table, must NOT block on the Approve/Edit AskUserQuestion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…muli Per JasonYeMSFT review on PR microsoft#2297 (thread 3270067015, eval.yaml:91): output-text regex was a fragile proxy for routing validation. Migrate the four router smoke stimuli to the skill-invocation grader, which checks the actual trajectory's skill_activation events rather than scraping response text. Changes: - route-vm-creator-001: required=[azure-compute], disallowed=[azure-prepare] - route-vmss-001: required=[azure-compute] - route-vm-recommender-001: required=[azure-compute] - route-away-azure-prepare-001: disallowed=[azure-compute] (the anti-pattern is azure-compute silently provisioning a VM; both azure-prepare and azure-deploy are legitimate handoff targets) Content-checking graders (Ubuntu mention, VMSS substring, SKU patterns, pricing references, no-VM-artifact anti-pattern) are kept on output-text regex — those are behavioral assertions about response content, distinct from routing. Adds skill-invocation: 1.5 to scoring weights (mirrors azure-enterprise-infra-planner) so routing-miss failures are weighted appropriately. Verified end-to-end: smoke tier (5 stimuli, 1 run each) passes at 100% against claude-sonnet-4.6. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…from route-vmss-001 Per review feedback, output-not-matches graders are prone to false-positive failures because agent output is stochastic. Drop the open-management-port not-match grader from the route-vmss-001 case Jason flagged. The same risky-NSG behavior is covered elsewhere in the workflow via skill content and Plan Card rendering, so the eval signal is preserved without the brittle negative pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What this does
Before this PR, asking Copilot for Azure to "create a VM" jumped straight to generating Bicep or running
az vm create. The user had no chance to review what was about to be deployed, and for VM scale sets the model often missed the intent entirely.This PR adds a guided workflow so the same request now:
azCLI / MCP callsIt also recognizes VMSS intent without the user having to say "VMSS" (e.g. "fleet that autoscales" → Flexible orchestration scale set), and refuses to deploy templates that would leave SSH open to the internet.
What's in the PR
Workflow + supporting docs
57789bd— newvm-creatorworkflow underplugin/skills/azure-compute/, with depth-probe references (beginner / cost / networking / security / spec) so the question set adapts to user expertise. Output adapters forazCLI, Bicep, Terraform, and Azure MCP. Trimsvm-recommender.mdand pulls verboseweb_fetchrules into a reference file.a7c7954— Plan Card approval is now one batched picker (approve / edit / change format) instead of three sequential questions.Plumbing
e729b2f— switches the Azure MCP CLI to--mode single, collapsing ~26 per-service tool schemas (~30K tokens) behind one routing tool (~3K tokens). Same APIs, same auth — the schemas just don't load until the tool is called. This is what makes the workflow fit in context without truncating.Eval coverage
4592b1f—evals/azure-compute/eval.yamlwith 16 stimuli covering router accuracy, depth-probe branches, VMSS recognition, Plan Card gate, and output adapter coverage. Smoke tier (5 × 3) was 3/5 flaky before; this PR brings it to 5/5 passing 3-for-3.Follow-ups during review
710e3f7— fixes Skill Structure CI: a few cross-file references pointed at paths that didn't exist after the workflow split.a590be1— rewrites the skill frontmatterdescription:from a>-folded scalar to an inline double-quoted string.skills.shonly accepts the inline form.ef08203— removes the'*'default onsshSourceAddressPrefix(Bicep) andssh_source_address_prefix(Terraform). The templates now require the caller to pass their own IP or a trusted CIDR; opening port 22 to the entire internet has to be an explicit, accepted choice. Resolves the open Copilot security suggestions.Why it matters
The visible change: a user asking for a VM gets to see and edit what's about to be deployed, instead of finding out after the fact. The eval suite locks the behavior in so it doesn't regress.
The invisible change (
e729b2f): the whole workflow used to run out of context room around turn 3 because the Azure MCP server was loading every service's tool schema up front. With--mode single, schemas load on demand and the workflow holds end-to-end.Test plan
vally run evals/azure-compute/eval.yaml --tier smoke— 5/5 stimuli × 3 trialsvm-recommenderstandalone (recommend-only, no hand-off to creator). PASS (1 turn, $0.30)ef08203)Manual items run via the
claude-copilotproxy onclaude-sonnet-4.6with the plugin loaded from a working tree clone.