Add azure-reliability — Azure App Service reliability reference#2242
MadhuraBharadwaj-MSFT wants to merge 47 commits into
Adds a new azure-reliability skill that assesses Azure PaaS apps (Functions, Container Apps, App Service) for zone redundancy, storage replication, multi-region, and health probes. Reports findings as an enabled/disabled checklist (no numeric scoring) and supports both live (CLI) and IaC (Bicep/Terraform) remediation paths.
…ult SKU, deploy order)
- Note AVM module param naming differs from raw Bicep (skuName vs sku.name); detect with Select-String and patch the actual param in use.
- Annotate FC1/Consumption health probe as 'code-only fix' in the checklist and risk table; do not patch healthCheckPath in IaC for these plans.
- Switch all 'az graph query' examples to '--query data[] -o json' (table output only shows summary cols).
- Handle no-SKU storage case: ARM/AVM defaults to Standard_GRS; add explicit sku/skuName instead of find-and-replace.
- Recommend splitting deploys: safe patches first, then storage migration, then storage SKU patch (a failed redundancy update can fail the whole deploy).
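The deploy split recommended in this commit can be sketched as a small planner. This is an illustration only, not code from the skill: the `Patch` type, the `touches_storage_sku` flag, and the stage labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Patch:
    name: str                  # hypothetical label for one IaC change
    touches_storage_sku: bool  # True if the patch changes storage redundancy


def plan_deploys(patches: List[Patch]) -> List[Tuple[str, List[str]]]:
    """Order work as: safe patches, then storage migration, then storage SKU patch."""
    safe = [p.name for p in patches if not p.touches_storage_sku]
    storage = [p.name for p in patches if p.touches_storage_sku]
    stages: List[Tuple[str, List[str]]] = []
    if safe:
        stages.append(("safe patches", safe))
    if storage:
        # The migration runs on its own so a failed redundancy update
        # cannot take the safe patches down with it.
        stages.append(("storage migration", storage))
        stages.append(("storage SKU patch", storage))
    return stages
```

Keeping the storage stages last means a failed redundancy update leaves the earlier, safe changes already deployed.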
…gger on FC1/Consumption
Adding a /api/health endpoint on Flex Consumption / Consumption Function Apps means modifying app source code (new HTTP trigger), not a Bicep/Terraform patch. Add an explicit STOP gate in configure-health-probes.md, the SKILL.md risk table, and both IaC patching references so the agent must ask the user before touching code, and respects 'no' by leaving everything unchanged.
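A minimal sketch of the STOP gate described here, assuming the decision reduces to the plan type plus the user's answer. The function name, the plan labels, and the returned action strings are hypothetical and are not taken from the skill files.

```python
from typing import Optional

# Plans where a health endpoint is a code-only fix (labels as used in the text).
CODE_ONLY_PLANS = {"FC1", "Consumption"}


def health_probe_action(plan: str, user_consents: Optional[bool]) -> str:
    """Decide how to handle a missing health probe for a given plan."""
    if plan not in CODE_ONLY_PLANS:
        return "patch-iac"        # an IaC patch is enough on other plans
    if user_consents is None:
        return "ask-user"         # STOP: consent is required before touching code
    if user_consents:
        return "modify-code"      # add the new HTTP trigger
    return "leave-unchanged"      # respect 'no'
```

Passing `None` for consent models the state before the user has answered, so the caller cannot skip the question by accident.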
…and multi-region
Workflow now sequences remediation as: easy ZR + health probes first (CLI or 'Deploy 1' for IaC), then explicitly ASK the user before kicking off the slow storage migration ('to be fully ZR you also need ZRS storage - want to do it?'). After the storage step (or skip), re-assess, then a new Configuration Workflow Step 3 asks about multi-region failover and waits for yes/no/later before generating any Front Door IaC. Phase 3 UX note now defers to Step 3 instead of duplicating the offer.
Replace per-resource checklist (with mostly n/a cells and mixed symbols) with a feature-pivoted table: 4 rows (Zone redundancy compute, Zone-redundant storage, Health probes, Multi-region failover), each with a single status (🟢 ON / 🟡 PARTIAL / 🔴 OFF or storage SKU) and a bullet list of relevant resources with inline reasons. Drops n/a noise; reasons sit on the resource line where users can see them. Re-Assess uses the same format with 'now ON' / 'still off' annotations. Step 3 prompt text aligned to use 🟢 instead of ✅.
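One way to picture the feature-pivoted row is the sketch below. It assumes that PARTIAL means "on for some resources but not all", which the commit message does not spell out. The function names and the row layout are illustrative, and the storage SKU variant of the status is not modeled.

```python
from typing import List, Tuple


def feature_status(enabled_flags: List[bool]) -> str:
    """Collapse per-resource flags into a single status for one feature row."""
    if enabled_flags and all(enabled_flags):
        return "🟢 ON"
    if any(enabled_flags):
        return "🟡 PARTIAL"   # assumption: some, but not all, resources are on
    return "🔴 OFF"           # also used when no resources apply


def render_row(feature: str, resources: List[Tuple[str, bool, str]]) -> str:
    """Render one feature with its status and a bullet per relevant resource."""
    status = feature_status([enabled for _, enabled, _ in resources])
    lines = [f"{feature}: {status}"]
    for name, _, reason in resources:
        lines.append(f"  - {name}: {reason}")  # the reason sits on the resource line
    return "\n".join(lines)
```

Because each row lists only the resources relevant to that feature, the n/a cells of the per-resource layout never appear.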
…ser to run them
Path B Deploy 1, Storage migration, Deploy 2, and Multi-region all now run the deploy commands (azd up / az deployment / terraform apply) directly after a single yes/no confirmation, then continue to re-assess. Updates Skill Boundaries and Integration tables: 'Deploy IaC for reliability changes: Yes' (was 'No, hand off to azure-deploy'). Bicep and Terraform patching references updated to summarize the deploy plan and ask 'Ready for Deploy 1?' instead of giving the user a list of commands to run.
Removed duplications:
- 'When to Use This Skill' section (duplicate of Skill Activation Triggers)
- 'HARD STOPS' block (each stop already inline at the right step)
- Best Practices items that restated the workflow (kept only the 2 unique tips)
- Path B Step 2 'Deploy-order rule' callout (Steps 3-5 already detail the flow)
- Phase 3 UX note about multi-region (reduced to a one-liner pointing to Step 3)
- Skill Boundaries 'IMPORTANT' header (duplicated description / Quick Reference)
Also:
- Quick Reference now says 'Reliability assessment table' instead of 'Reliability Checklist' for consistency with Phase 3 terminology.
- Storage SKU row in Step 2 risk table now points to the two-deploy flow inline.
- Fixed literal \\u escape sequences in SKILL.md, iac-patching-bicep.md, iac-patching-terraform.md that leaked from a previous edit (\\u2014 -> em-dash, \\u2192 -> arrow, \\u26a0\\ufe0f -> warning, \\u2705 -> checkmark, etc).
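The escape-sequence fix in the last bullet can be illustrated with a short sketch. It assumes each leaked sequence is a single backslash, the letter `u`, and exactly four hex digits. The helper name is hypothetical and this is not the script used in the PR.

```python
import re

# A literal backslash, 'u', then exactly four hex digits.
_LITERAL_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_literal_escapes(text: str) -> str:
    """Replace leaked literal \\uXXXX sequences with the characters they name.

    Surrogate pairs (code points above U+FFFF written as two escapes) are
    not recombined by this sketch.
    """
    return _LITERAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Text without such sequences passes through unchanged, so the helper is safe to run over files that were never affected.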
…ervice,functions} before team handoff
Container Apps:
- Standardize 'az graph query' to use --query 'data[]' -o json (table mode hides projected fields)
- Fix Configure: Health Probes example (was mixing --set-env-vars + --yaml -<<EOF heredoc, which conflicts and isn't pwsh-portable); use a probes.yaml file instead
- Add AVM modules note pointing to br/public:avm/res/app/managed-environment and avm/res/app/container-app param naming
- Add STOP gate before adding /health route to container code (consent required, mirrors Functions FC1 pattern)
- Add Reporting section showing how each Container Apps resource maps to feature-pivoted assessment rows
App Service:
- Standardize 'az graph query' to use --query 'data[]' -o json
- Add ARR affinity / clientAffinityEnabled query (sticky sessions break ZR + multi-region)
- Add new 'Configure: Disable Client Affinity (ARR Affinity)' section above slots; multi-region note now references it
- Add AVM modules note pointing to br/public:avm/res/web/serverfarm and avm/res/web/site param naming
- Add Reporting section, including PARTIAL state for multi-region with affinity still enabled
Functions:
- Add Reporting section so Functions matches the new convention
Deferred for the App Service / Container Apps teams to verify against current Microsoft docs:
- P1v3 minimum capacity for ZR (table currently says 3; docs may now say 2)
- Standard tier health check support (table claims yes; verify)
- Auto-Heal, backup/restore, VNet integration HA notes (out of scope for this pass)
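The PARTIAL state for multi-region with affinity still enabled might be derived along these lines. This sketch assumes the query output is a JSON array whose rows carry `name` and `clientAffinityEnabled` fields. Those projected field names, the function name, and the `failover_configured` flag are assumptions for illustration, not the skill's actual query shape.

```python
import json
from typing import List, Tuple


def multi_region_row(graph_output: str, failover_configured: bool) -> Tuple[str, List[str]]:
    """Return (status, apps with affinity enabled) for the multi-region row."""
    rows = json.loads(graph_output)  # expects a JSON array, as '-o json' emits
    sticky = sorted(r["name"] for r in rows if r.get("clientAffinityEnabled"))
    if not failover_configured:
        return "🔴 OFF", sticky
    # Failover exists, but sticky sessions still pin clients to one instance.
    return ("🟡 PARTIAL" if sticky else "🟢 ON"), sticky
```

Returning the affected app names alongside the status lets the report list each app on its own resource line with the reason inline.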
…c content
Make each services/<svc>/reliability.md the single source of truth for that service's plan/SKU rules, assessment queries, CLI commands, IaC patches (Bicep + Terraform + AVM), and reporting hints. Strip duplicated content from shared references so they only contain platform-level mechanics:
- zone-redundancy-checks.md: keep platform overview + cross-service all-in-one query + AZ regions list. Drop per-service queries and remediation (already in services/).
- configure-zone-redundancy.md: become a thin pointer to per-service refs + storage prerequisite + verification command.
- configure-health-probes.md: become a thin pointer to per-service refs + cross-service consent gate + best practices.
- iac-patching-bicep.md / iac-patching-terraform.md: keep framework (When to Use, Detection, AVM modules note, deploy plan) and the single cross-service Storage patch. Per-service patches now live in services/.
- health-probe-checks.md: keep Front Door / Traffic Manager / App Insights checks + best practices + Multi-region row reporting. Drop per-service queries.
- SKILL.md Phase 2 reworded as 'platform discovery + per-service deep dive'. Path A and Path B Step 3 now point to per-service refs for compute commands/patches.
Net -889 lines (no content lost; the deleted lines were already duplicated in services/). This sets up a clean 3-PR split: PR #1 = shared platform refs + services/functions/, PR #2 = services/app-service/, PR #3 = services/container-apps/.
…ed-env warning (Container Apps)
Two pieces of unique safety content from the deleted shared files were not yet captured in the per-service files. Adding them back:
- Functions: 'Consumption (Y1) - upgrade path required' subsection with Flex vs Premium tradeoffs and cost warning (was in deleted configure-zone-redundancy.md).
- Container Apps: explicit STOP before deleting old environment, with the 'az containerapp list --environment' check command (was in deleted configure-zone-redundancy.md).
Health Check is supported on Basic tier and above. It is configurable on Free/Shared, but instances are not replaced there.
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Fixed error in description
Update test to validate support for App Service
Update test description and title to include App Service
Fixed spelling mistake - "Azure Azure App Service" to "Azure App Service"
Fix casing of content expected in test as test converts to lowercase, expected text should also be lowercase.
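The casing issue behind this fix can be shown with a tiny sketch. The helper and the sample description are hypothetical, and the real assertions live in the Jest tests rather than in Python. The point is only that once the test lowercases the content, the expected text must be lowercase too.

```python
def description_contains(description: str, expected: str) -> bool:
    """Mimic a test that lowercases the content before checking for a phrase."""
    return expected in description.lower()


# Hypothetical description text, used only for illustration.
sample = "Assess Azure App Service apps for zone redundancy"

mixed_case_matches = description_contains(sample, "Azure App Service")  # False
lowercase_matches = description_contains(sample, "azure app service")   # True
```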
Fixed spelling mistake in expected description content
Update the Jest snapshot to match the new azure-reliability skill metadata description.
Update test triggers
Update Test Snapshot for azure-reliability tests
Pull request overview
Adds Azure App Service coverage to the azure-reliability skill by introducing an App Service-specific reliability reference and updating the skill dispatch/docs and tests accordingly.
Changes:
- Add a new App Service per-service reliability reference with assessment queries, CLI guidance, IaC patch examples, and reporting rules.
- Update `azure-reliability` skill metadata and shared references to include App Service (and adjust scope notes/dispatch tables).
- Expand unit/trigger/integration tests and snapshots to reflect the new App Service coverage.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/azure-reliability/unit.test.ts | Updates unit assertions to reflect App Service being in-scope. |
| tests/azure-reliability/triggers.test.ts | Adds App Service-oriented trigger prompts. |
| tests/azure-reliability/integration.test.ts | Adds an App Service E2E integration scenario. |
| tests/azure-reliability/snapshots/triggers.test.ts.snap | Updates trigger keyword/description snapshots for new scope wording. |
| plugin/skills/azure-reliability/SKILL.md | Expands skill scope to App Service and updates dispatch + guidance text. |
| plugin/skills/azure-reliability/references/zone-redundancy-checks.md | Adds App Service reference routing for zone redundancy discovery/reporting. |
| plugin/skills/azure-reliability/references/services/app-service/reliability.md | New App Service reliability reference (plans/SKUs, queries, CLI, IaC patches, reporting). |
| plugin/skills/azure-reliability/references/iac-patching-terraform.md | Links App Service per-service Terraform patch guidance. |
| plugin/skills/azure-reliability/references/iac-patching-bicep.md | Links App Service per-service Bicep patch guidance. |
| plugin/skills/azure-reliability/references/configure-zone-redundancy.md | Adds App Service entry to zone redundancy configuration index. |
| plugin/skills/azure-reliability/references/configure-health-probes.md | Adds App Service health probe mechanism/link and updates scope note. |
Comments suppressed due to low confidence (1)
plugin/skills/azure-reliability/references/iac-patching-terraform.md:39
- This note is now outdated: App Service Terraform patches are no longer “planned for a future version” since the App Service per-service reference is linked just above. Please update the note to only mention services that are still not shipped (e.g., Container Apps).
| Service | Reference |
|---|---|
| Azure App Service | [services/app-service/reliability.md](services/app-service/reliability.md) |
| Azure Functions | [services/functions/reliability.md](services/functions/reliability.md) |
> Azure App Service and Azure Container Apps per-service Terraform patches are planned for a future version of this skill.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
@copilot resolve the merge conflicts in this pull request
Overview
Adds the App Service-specific reliability reference for the `azure-reliability` skill introduced in #2241. Single-file change covering App Service plan/SKU support, assessment queries, CLI commands, IaC patches (Bicep + Terraform + AVM), and reporting hints for the assessment table.
Companion PRs
Part of a 3-PR set introducing the `azure-reliability` skill:
- `azure-reliability/functions` (introduces the skill: `SKILL.md` + shared references + Functions service file)
- `azure-reliability/container-apps`
What I'd love the App Service team to verify
The file lists items where current Microsoft docs should be checked:
✅; please confirm)