Add azure-reliability — Azure App Service reliability reference #2242

Open
MadhuraBharadwaj-MSFT wants to merge 47 commits into microsoft:main from MadhuraBharadwaj-MSFT:azure-reliability/app-service

Conversation

MadhuraBharadwaj-MSFT (Collaborator) commented May 12, 2026

⚠️ DRAFT — depends on #2241 (Functions + skill core). Will be marked ready for review once #2241 merges.

While both PRs are open, this PR's diff includes #2241's files because this branch is stacked on top of it. After #2241 merges into main, this PR's diff will automatically shrink to a single file.

📄 The single new file in this PR (the only thing for review here):
services/app-service/reliability.md

📁 Browse the full skill folder for context

Overview

Adds the App Service-specific reliability reference for the azure-reliability skill introduced in #2241. Single-file change covering App Service plan/SKU support, assessment queries, CLI commands, IaC patches (Bicep + Terraform + AVM), and reporting hints for the assessment table.

Companion PRs

Part of a 3-PR set introducing the azure-reliability skill:

What I'd love the App Service team to verify

The file lists items where current Microsoft docs should be checked:

  • Standard tier health check support (table currently says yes; please confirm)
  • P1v3 minimum capacity for ZR (table says 3; docs may now say 2)
  • Auto-Heal, backup/restore, and VNet integration HA notes (not yet covered — should they be?)
  • AVM module parameter mapping in the IaC Patching section

MadhuraBharadwaj-MSFT and others added 14 commits May 11, 2026 16:01
Adds a new azure-reliability skill that assesses Azure PaaS apps (Functions, Container Apps, App Service) for zone redundancy, storage replication, multi-region, and health probes. Reports findings as an enabled/disabled checklist (no numeric scoring) and supports both live (CLI) and IaC (Bicep/Terraform) remediation paths.
…ult SKU, deploy order)

- Note AVM module param naming differs from raw Bicep (skuName vs sku.name); detect with Select-String and patch the actual param in use.
- Annotate FC1/Consumption health probe as 'code-only fix' in the checklist and risk table; do not patch healthCheckPath in IaC for these plans.
- Switch all 'az graph query' examples to '--query data[] -o json' (table output only shows summary cols).
- Handle no-SKU storage case: ARM/AVM defaults to Standard_GRS; add explicit sku/skuName instead of find-and-replace.
- Recommend splitting deploys: safe patches first, then storage migration, then storage SKU patch (a failed redundancy update can fail the whole deploy).
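The parameter-naming point above can be illustrated with a small detection sketch. The commit itself uses Select-String (PowerShell); the TypeScript below swaps in a regex-based equivalent purely for illustration. The function name `detectSkuStyle` and its return labels are hypothetical, and the patterns assume the SKU is written either as a flat `skuName:` parameter or as a nested `sku: { name: ... }` block.

```typescript
// Hypothetical helper (not part of the skill): classify how a Bicep
// snippet declares its SKU so the right parameter can be patched.
type SkuStyle = "avm-skuName" | "raw-sku.name" | "none";

function detectSkuStyle(bicep: string): SkuStyle {
  // AVM-style modules expose a flat `skuName:` parameter.
  if (/\bskuName\s*:/.test(bicep)) return "avm-skuName";
  // Raw resources nest the name inside a `sku: { name: ... }` block.
  if (/\bsku\s*:\s*\{[^}]*\bname\s*:/.test(bicep)) return "raw-sku.name";
  // No SKU declared: an explicit sku/skuName has to be added,
  // because there is nothing to find-and-replace.
  return "none";
}
```

Returning `"none"` as a distinct case mirrors the no-SKU storage note: the caller adds an explicit SKU instead of attempting a replacement.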
…gger on FC1/Consumption

Adding a /api/health endpoint on Flex Consumption / Consumption Function Apps means modifying app source code (new HTTP trigger), not a Bicep/Terraform patch. Add an explicit STOP gate in configure-health-probes.md, the SKILL.md risk table, and both IaC patching references so the agent must ask the user before touching code, and respects 'no' by leaving everything unchanged.
…and multi-region

Workflow now sequences remediation as: easy ZR + health probes first (CLI or 'Deploy 1' for IaC), then explicitly ASK the user before kicking off the slow storage migration ('to be fully ZR you also need ZRS storage - want to do it?'). After the storage step (or skip), re-assess, then a new Configuration Workflow Step 3 asks about multi-region failover and waits for yes/no/later before generating any Front Door IaC. Phase 3 UX note now defers to Step 3 instead of duplicating the offer.
Replace per-resource checklist (with mostly n/a cells and mixed symbols) with a feature-pivoted table: 4 rows (Zone redundancy compute, Zone-redundant storage, Health probes, Multi-region failover), each with a single status (🟢 ON / 🟡 PARTIAL / 🔴 OFF or storage SKU) and a bullet list of relevant resources with inline reasons. Drops n/a noise; reasons sit on the resource line where users can see them. Re-Assess uses the same format with 'now ON' / 'still off' annotations. Step 3 prompt text aligned to use 🟢 instead of ✅.
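As a rough sketch of the feature-pivoted row described above: each feature gets one status plus a bullet per resource with its inline reason. The roll-up rule used here (all resources enabled gives ON, some gives PARTIAL, none gives OFF) is an assumption, since the commit message does not spell out how PARTIAL is derived. The names `ResourceFinding`, `rollUp`, and `renderRow` are hypothetical.

```typescript
type Status = "ON" | "PARTIAL" | "OFF";

// One resource's result for a single reliability feature (hypothetical shape).
interface ResourceFinding {
  name: string;
  enabled: boolean;
  reason?: string;
}

// Assumed roll-up: every resource enabled -> ON, some -> PARTIAL, none -> OFF.
function rollUp(findings: ResourceFinding[]): Status {
  const on = findings.filter((f) => f.enabled).length;
  if (findings.length > 0 && on === findings.length) return "ON";
  return on > 0 ? "PARTIAL" : "OFF";
}

// Render one feature row: status line first, then one bullet per resource.
function renderRow(feature: string, findings: ResourceFinding[]): string {
  const icons: Record<Status, string> = { ON: "🟢", PARTIAL: "🟡", OFF: "🔴" };
  const status = rollUp(findings);
  const bullets = findings.map(
    (f) => `  - ${f.name}${f.reason ? ` (${f.reason})` : ""}`
  );
  return [`${icons[status]} ${status} ${feature}`, ...bullets].join("\n");
}
```

Keeping the reason on the resource line, rather than in a separate column, is what lets the table drop the n/a cells.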
…ser to run them

Path B Deploy 1, Storage migration, Deploy 2, and Multi-region all now run the deploy commands (azd up / az deployment / terraform apply) directly after a single yes/no confirmation, then continue to re-assess. Updates Skill Boundaries and Integration tables: 'Deploy IaC for reliability changes: Yes' (was 'No, hand off to azure-deploy'). Bicep and Terraform patching references updated to summarize the deploy plan and ask 'Ready for Deploy 1?' instead of giving the user a list of commands to run.
Removed duplications:
- 'When to Use This Skill' section (duplicate of Skill Activation Triggers)
- 'HARD STOPS' block (each stop already inline at the right step)
- Best Practices items that restated the workflow (kept only the 2 unique tips)
- Path B Step 2 'Deploy-order rule' callout (Steps 3-5 already detail the flow)
- Phase 3 UX note about multi-region (reduced to a one-liner pointing to Step 3)
- Skill Boundaries 'IMPORTANT' header (duplicated description / Quick Reference)

Also:
- Quick Reference now says 'Reliability assessment table' instead of 'Reliability Checklist' for consistency with Phase 3 terminology.
- Storage SKU row in Step 2 risk table now points to the two-deploy flow inline.
- Fixed literal \\u escape sequences in SKILL.md, iac-patching-bicep.md, iac-patching-terraform.md that leaked from a previous edit (\\u2014 -> em-dash, \\u2192 -> arrow, \\u26a0\\ufe0f -> warning, \\u2705 -> checkmark, etc).
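The escape-sequence fix in the last bullet amounts to decoding literal `\uXXXX` text back into the characters it names. A minimal sketch of that transformation follows; the function name `decodeLiteralUnicodeEscapes` is hypothetical, and the commit does not say how the fix was actually applied.

```typescript
// Replace literal backslash-u escapes (e.g. the six characters \u2192)
// with the character they encode. Each 4-hex-digit escape is decoded
// independently, so pairs like \u26a0\ufe0f also come out correctly.
function decodeLiteralUnicodeEscapes(text: string): string {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (_match, hex: string) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}
```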
…ervice,functions} before team handoff

Container Apps:
- Standardize 'az graph query' to use --query 'data[]' -o json (table mode hides projected fields)
- Fix Configure: Health Probes example (was mixing --set-env-vars + --yaml -<<EOF heredoc, which conflicts and isn't pwsh-portable); use a probes.yaml file instead
- Add AVM modules note pointing to br/public:avm/res/app/managed-environment and avm/res/app/container-app param naming
- Add STOP gate before adding /health route to container code (consent required, mirrors Functions FC1 pattern)
- Add Reporting section showing how each Container Apps resource maps to feature-pivoted assessment rows

App Service:
- Standardize 'az graph query' to use --query 'data[]' -o json
- Add ARR affinity / clientAffinityEnabled query (sticky sessions break ZR + multi-region)
- Add new 'Configure: Disable Client Affinity (ARR Affinity)' section above slots; multi-region note now references it
- Add AVM modules note pointing to br/public:avm/res/web/serverfarm and avm/res/web/site param naming
- Add Reporting section, including PARTIAL state for multi-region with affinity still enabled

Functions:
- Add Reporting section so Functions matches the new convention

Deferred for the App Service / Container Apps teams to verify against current Microsoft docs:
- P1v3 minimum capacity for ZR (table currently says 3; docs may now say 2)
- Standard tier health check support (table claims yes; verify)
- Auto-Heal, backup/restore, VNet integration HA notes (out of scope for this pass)
…c content

Make each services/<svc>/reliability.md the single source of truth for that service's plan/SKU rules, assessment queries, CLI commands, IaC patches (Bicep + Terraform + AVM), and reporting hints. Strip duplicated content from shared references so they only contain platform-level mechanics:

- zone-redundancy-checks.md: keep platform overview + cross-service all-in-one query + AZ regions list. Drop per-service queries and remediation (already in services/).
- configure-zone-redundancy.md: become a thin pointer to per-service refs + storage prerequisite + verification command.
- configure-health-probes.md: become a thin pointer to per-service refs + cross-service consent gate + best practices.
- iac-patching-bicep.md / iac-patching-terraform.md: keep framework (When to Use, Detection, AVM modules note, deploy plan) and the single cross-service Storage patch. Per-service patches now live in services/.
- health-probe-checks.md: keep Front Door / Traffic Manager / App Insights checks + best practices + Multi-region row reporting. Drop per-service queries.
- SKILL.md Phase 2 reworded as 'platform discovery + per-service deep dive'. Path A and Path B Step 3 now point to per-service refs for compute commands/patches.

Net -889 lines (no content lost; the deleted lines were already duplicated in services/).

This sets up a clean 3-PR split: PR #1 = shared platform refs + services/functions/, PR #2 = services/app-service/, PR #3 = services/container-apps/.
…ed-env warning (Container Apps)

Two pieces of unique safety content from the deleted shared files were not yet captured in the per-service files. Adding them back:

- Functions: 'Consumption (Y1) - upgrade path required' subsection with Flex vs Premium tradeoffs and cost warning (was in deleted configure-zone-redundancy.md).
- Container Apps: explicit STOP before deleting old environment, with the 'az containerapp list --environment' check command (was in deleted configure-zone-redundancy.md).
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Comment thread plugin/skills/azure-reliability/references/configure-zone-redundancy.md Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
@apwestgarth (Collaborator)

What I'd love the App Service team to verify

The file lists items where current Microsoft docs should be checked:

  • Standard tier health check support (table currently says yes; please confirm)
  • P1v3 minimum capacity for ZR (table says 3; docs may now say 2)
  • Auto-Heal, backup/restore, and VNet integration HA notes (not yet covered — should they be?)
  • AVM module parameter mapping in the IaC Patching section

Health Check is supported on the Basic tier and above. It is configurable on Free/Shared, but unhealthy instances are not replaced there.
I suggested changing the entry to P0v3 and added Pv4 too. The minimum instance count required is now 2.
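The two rules in this reply can be captured in a small sketch. The tier names in `Tier` are an assumption made for illustration, as are the function names and the constant `MIN_ZR_INSTANCES`. The only facts taken from the comment are that Health Check replaces instances from Basic upward, is configurable without replacement on Free/Shared, and that the minimum instance count is now 2.

```typescript
// Tier names are assumed for illustration; check them against the reference table.
type Tier = "Free" | "Shared" | "Basic" | "Standard" | "Premium";

type HealthCheckSupport = "supported" | "configurable-no-replacement";

// Per the reviewer: Basic and above are supported; Free/Shared can set a
// health check path, but instances are not replaced.
function healthCheckSupport(tier: Tier): HealthCheckSupport {
  return tier === "Free" || tier === "Shared"
    ? "configurable-no-replacement"
    : "supported";
}

// Per the reviewer: the minimum instance count for zone redundancy is now 2.
const MIN_ZR_INSTANCES = 2;

function meetsZrMinimum(capacity: number): boolean {
  return capacity >= MIN_ZR_INSTANCES;
}
```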

MadhuraBharadwaj-MSFT and others added 5 commits May 13, 2026 10:54
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
Fixed error in description
Update test to validate support for App Service
Update test description and title to include App Service
Fixed spelling mistake - "Azure Azure App Service" to "Azure App Service"
Fix casing of content expected in test as test converts to lowercase, expected text should also be lowercase.
Fixed spelling mistake in expected description content
update the Jest snapshot to match the new azure-reliability skill metadata description.
Update test triggers
Update Test Snapshot for azure-reliability tests
@apwestgarth apwestgarth marked this pull request as ready for review May 21, 2026 18:17
@apwestgarth apwestgarth requested a review from saikoumudi as a code owner May 21, 2026 18:17
Copilot AI review requested due to automatic review settings May 21, 2026 18:17
Copilot AI (Contributor) left a comment

Pull request overview

Adds Azure App Service coverage to the azure-reliability skill by introducing an App Service-specific reliability reference and updating the skill dispatch/docs and tests accordingly.

Changes:

  • Add a new App Service per-service reliability reference with assessment queries, CLI guidance, IaC patch examples, and reporting rules.
  • Update azure-reliability skill metadata and shared references to include App Service (and adjust scope notes/dispatch tables).
  • Expand unit/trigger/integration tests and snapshots to reflect the new App Service coverage.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
|---|---|
| tests/azure-reliability/unit.test.ts | Updates unit assertions to reflect App Service being in-scope. |
| tests/azure-reliability/triggers.test.ts | Adds App Service-oriented trigger prompts. |
| tests/azure-reliability/integration.test.ts | Adds an App Service E2E integration scenario. |
| tests/azure-reliability/snapshots/triggers.test.ts.snap | Updates trigger keyword/description snapshots for new scope wording. |
| plugin/skills/azure-reliability/SKILL.md | Expands skill scope to App Service and updates dispatch + guidance text. |
| plugin/skills/azure-reliability/references/zone-redundancy-checks.md | Adds App Service reference routing for zone redundancy discovery/reporting. |
| plugin/skills/azure-reliability/references/services/app-service/reliability.md | New App Service reliability reference (plans/SKUs, queries, CLI, IaC patches, reporting). |
| plugin/skills/azure-reliability/references/iac-patching-terraform.md | Links App Service per-service Terraform patch guidance. |
| plugin/skills/azure-reliability/references/iac-patching-bicep.md | Links App Service per-service Bicep patch guidance. |
| plugin/skills/azure-reliability/references/configure-zone-redundancy.md | Adds App Service entry to zone redundancy configuration index. |
| plugin/skills/azure-reliability/references/configure-health-probes.md | Adds App Service health probe mechanism/link and updates scope note. |

Comments suppressed due to low confidence (1)

plugin/skills/azure-reliability/references/iac-patching-terraform.md:39

  • This note is now outdated: App Service Terraform patches are no longer “planned for a future version” since the App Service per-service reference is linked just above. Please update the note to only mention services that are still not shipped (e.g., Container Apps).
| Service | Reference |
|---|---|
| Azure App Service | [services/app-service/reliability.md](services/app-service/reliability.md) |
| Azure Functions | [services/functions/reliability.md](services/functions/reliability.md) |

> Azure App Service and Azure Container Apps per-service Terraform patches are planned for a future version of this skill.

Comment thread plugin/skills/azure-reliability/references/zone-redundancy-checks.md Outdated
Comment thread plugin/skills/azure-reliability/references/zone-redundancy-checks.md Outdated
Comment thread plugin/skills/azure-reliability/SKILL.md Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Comment thread tests/azure-reliability/integration.test.ts Outdated
Comment thread plugin/skills/azure-reliability/references/services/app-service/reliability.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andrew Westgarth <mail@hawaythelads.co.uk>
@apwestgarth apwestgarth requested a review from RickWinter as a code owner May 21, 2026 18:32
apwestgarth previously approved these changes May 21, 2026

@apwestgarth (Collaborator)

@copilot resolve the merge conflicts in this pull request

@apwestgarth apwestgarth reopened this May 21, 2026
3 participants