
VED-1233: Replace manual release steps #1437

Open
Thomas-Boyle wants to merge 5 commits into master from ved-1233-replace-manual-release-steps


@Thomas-Boyle (Contributor)

  • Added steps to set the Terraform workspace and manage shared Lambda triggers during blue/green deployments in the deploy-backend.yml workflow.
  • Introduced a new script, manage_blue_green_event_source_mappings.sh, to handle the preparation and cleanup of event source mappings for Lambda functions.
  • Updated README.md to document the new blue/green Lambda trigger handoff process, removing manual steps from the deployment flow.

Automated blue/green Lambda trigger handoff in deployment by adding pre-plan state adoption and pre-apply stale-trigger cleanup steps to the backend workflow.
Added manage_blue_green_event_source_mappings.sh to resolve live mapping UUIDs, re-import shared delta and id-sync event source mappings into the target Terraform workspace, and delete obsolete side-specific mappings.
This removes the manual release checklist steps (“Disable delta” and “Disable ID sync”), making releases faster and reducing risk of human error during blue/green cutovers.
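The re-import step described above could look roughly like this. It is a minimal sketch, assuming the target Terraform workspace is already selected and that the shared mappings live at hypothetical addresses such as `aws_lambda_event_source_mapping.delta_trigger`. The function name `adopt_into_state` is illustrative, not the one in the PR's script. It only imports a UUID when the address is not yet tracked, so re-running a deploy does not fail on an already-adopted mapping.

```bash
#!/usr/bin/env bash
# Sketch only: adopt a live event source mapping into the Terraform workspace
# that is currently selected. Resource addresses passed in (for example
# aws_lambda_event_source_mapping.delta_trigger) are hypothetical.

# adopt_into_state ADDRESS UUID
adopt_into_state() {
  local address="$1"
  local uuid="$2"

  # The AWS CLI prints "None" for a null value with --output text.
  if [[ -z "$uuid" || "$uuid" == "None" ]]; then
    echo "No live mapping found for ${address}; nothing to adopt" >&2
    return 0
  fi

  # `terraform state list ADDRESS` only prints matching addresses, so an
  # exact line match means the mapping is already tracked in this workspace.
  if terraform state list "$address" | grep -qxF -- "$address"; then
    echo "${address} is already in state; skipping import"
    return 0
  fi

  # aws_lambda_event_source_mapping resources are imported by mapping UUID.
  echo "Importing ${uuid} as ${address}"
  terraform import "$address" "$uuid"
}
```

For example, `adopt_into_state aws_lambda_event_source_mapping.delta_trigger "$delta_uuid"`, where `$delta_uuid` (hypothetical) comes from the live-mapping lookup.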

@github-actions (Contributor)

This branch is working on a ticket in the NHS England VED JIRA Project. Here's a handy link to the ticket:

VED-1233

@Thomas-Boyle temporarily deployed to internal-dev-sandbox on April 22, 2026 at 10:30 with GitHub Actions (inactive)
@Thomas-Boyle temporarily deployed to internal-dev-sandbox on April 22, 2026 at 10:31 with GitHub Actions (inactive)
@Thomas-Boyle added the feature (New feature or request) and infrastructure (Pull requests that update terraform code) labels on Apr 22, 2026
4 outdated comment threads on utilities/scripts/manage_blue_green_event_source_mappings.sh
…event_source_mappings.sh

- Introduced a new delete_mapping function to handle the deletion of AWS Lambda event source mappings, including a timeout mechanism for deletion confirmation.
- Updated adopt_mapping function to utilize the new delete_mapping function, improving the logic for handling target and counterpart mapping UUIDs.
- Enhanced code clarity and maintainability by restructuring the mapping lookup and deletion process.
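One way the timeout-based deletion confirmation could look, as a sketch rather than the PR's actual `delete_mapping`: AWS deletes event source mappings asynchronously (the mapping passes through the `Deleting` state), so the function polls `aws lambda get-event-source-mapping` until the lookup fails or a deadline passes. The timeout and poll-interval defaults are assumptions, and this version treats any lookup failure as "deleted"; a stricter one would check specifically for `ResourceNotFoundException`.

```bash
#!/usr/bin/env bash
# Sketch only: delete an event source mapping and wait until AWS confirms it
# is gone. Deletion is asynchronous: the mapping first moves to the Deleting
# state, and get-event-source-mapping keeps returning it until removal.

# delete_mapping UUID [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
delete_mapping() {
  local uuid="$1"
  local timeout="${2:-120}"   # assumed default, tune per environment
  local interval="${3:-5}"    # assumed default
  local waited=0

  aws lambda delete-event-source-mapping --uuid "$uuid" >/dev/null || return 1

  # Simplification: any lookup failure counts as "deleted". A stricter version
  # would inspect stderr for ResourceNotFoundException.
  while aws lambda get-event-source-mapping --uuid "$uuid" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "Timed out after ${timeout}s waiting for ${uuid} to be deleted" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done

  echo "Deleted event source mapping ${uuid}"
}
```

Because the function returns non-zero on timeout, a caller running under `set -e` stops before reaching `terraform apply` instead of applying against a mapping that may still exist.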
@@ -0,0 +1,170 @@
#!/usr/bin/env bash
Review comment (Contributor):

Can this script delete a live Lambda event source mapping before terraform apply, outside any saved plan?
This should only be used for the controlled migration. If the script fails between the adopt step and the apply, Terraform state and/or AWS can get out of sync, and that drift will not be recorded in the artifact.
Can we move this to a dedicated migration workflow (or put it behind a one-time flag per environment)?
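A minimal sketch of the "one-time flag per environment" option: the script refuses to touch live mappings unless an explicit opt-in is set and the environment is on an allow-list. Both variable names (`MIGRATE_EVENT_SOURCE_MAPPINGS`, `MIGRATION_ALLOWED_ENVIRONMENTS`) are hypothetical; in a dedicated migration workflow they would typically be populated from `workflow_dispatch` inputs.

```bash
#!/usr/bin/env bash
# Sketch only: refuse to run the live-mapping migration unless it has been
# explicitly requested for this environment. Variable names are hypothetical.
#   MIGRATE_EVENT_SOURCE_MAPPINGS   must be exactly "true"
#   MIGRATION_ALLOWED_ENVIRONMENTS  space-separated allow-list, e.g. "preprod"

# require_migration_opt_in ENVIRONMENT
require_migration_opt_in() {
  local environment="$1"
  local allowed

  if [[ "${MIGRATE_EVENT_SOURCE_MAPPINGS:-false}" != "true" ]]; then
    echo "Refusing to modify live mappings: MIGRATE_EVENT_SOURCE_MAPPINGS is not true" >&2
    return 1
  fi

  # Word-splitting on the allow-list is intentional here.
  for allowed in ${MIGRATION_ALLOWED_ENVIRONMENTS:-}; do
    if [[ "$allowed" == "$environment" ]]; then
      echo "Migration opt-in confirmed for ${environment}"
      return 0
    fi
  done

  echo "Environment '${environment}' is not in MIGRATION_ALLOWED_ENVIRONMENTS" >&2
  return 1
}
```

The migration script would call `require_migration_opt_in "$environment" || exit 1` before any adopt or delete step, so a normal deploy never mutates live mappings outside the saved plan.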

working-directory: infrastructure/instance
run: make workspace

- name: Terraform Apply
Review comment (Contributor):

Since the caller of this workflow (continuous-deployment) has concurrency configured but this workflow does not, any other caller or future release workflow could race on the shared trigger workspace for preprod and prod.
Should we add concurrency here, perhaps keyed by the shared scope?

apply: workspace
$(tf_cmd) apply $(tf_vars) --auto-approve

destroy: workspace
Review comment (Contributor):

Does this mean a destroy will also destroy the mappings shared between the blue and green environments?
Should we have an allow flag for the shared scope that determines whether the destroy passes or fails?
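A sketch of such an allow flag, assuming the shared trigger workspaces can be identified by name. The `SHARED_TRIGGER_WORKSPACES` list and `ALLOW_SHARED_DESTROY` flag are hypothetical, not existing settings in this repository.

```bash
#!/usr/bin/env bash
# Sketch only: block `terraform destroy` on a workspace that owns the mappings
# shared by blue and green, unless explicitly allowed. Hypothetical variables:
#   SHARED_TRIGGER_WORKSPACES  space-separated list of shared-scope workspaces
#   ALLOW_SHARED_DESTROY       must be exactly "true" to permit the destroy

# guard_shared_destroy WORKSPACE
guard_shared_destroy() {
  local workspace="$1"
  local shared

  for shared in ${SHARED_TRIGGER_WORKSPACES:-}; do
    if [[ "$workspace" == "$shared" ]]; then
      if [[ "${ALLOW_SHARED_DESTROY:-false}" == "true" ]]; then
        echo "ALLOW_SHARED_DESTROY=true: permitting destroy of shared workspace ${workspace}" >&2
        return 0
      fi
      echo "Refusing to destroy shared trigger workspace '${workspace}'" >&2
      return 1
    fi
  done

  # Not a shared-scope workspace: destroy is allowed as before.
  return 0
}
```

The destroy target would run it first, e.g. `guard_shared_destroy "$(terraform workspace show)" && terraform destroy`, so per-side workspaces behave as today while shared ones fail unless the flag is set.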

counterpart_id_sync_function="imms-${counterpart_sub_environment}-id-sync-lambda"
fi

adopt_mapping \
Review comment (Contributor):

Do we need a validate step and a separate trigger-state plan artifact before terraform apply? Maybe also a tflint and basic policy pass?
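A sketch of that gate, under stated assumptions: `terraform init` has already run in the trigger directory, `tflint` is optional and only runs if installed, and the JSON rendering of the saved plan is what would be uploaded as the artifact. Applying the saved plan file (rather than re-planning with `--auto-approve`) means the apply executes exactly what was reviewed. Function names and the `triggers.tfplan` file name are illustrative. A policy check against the JSON plan would slot in after `show -json` and is not shown.

```bash
#!/usr/bin/env bash
# Sketch only: gate the trigger workspace behind fmt/validate/lint and a saved
# plan, then apply exactly that plan. Function and file names are illustrative.

# plan_trigger_workspace DIR [PLAN_FILE]
plan_trigger_workspace() {
  local dir="$1"
  local plan_file="${2:-triggers.tfplan}"

  terraform -chdir="$dir" fmt -check -recursive || return 1
  terraform -chdir="$dir" validate || return 1

  # tflint is optional in this sketch; skip it if it is not installed.
  if command -v tflint >/dev/null 2>&1; then
    (cd "$dir" && tflint) || return 1
  fi

  # -chdir makes the plan path relative to DIR.
  terraform -chdir="$dir" plan -input=false -out="$plan_file" || return 1

  # JSON form of the saved plan, suitable for uploading as a review artifact
  # or feeding to a policy check.
  terraform -chdir="$dir" show -json "$plan_file" > "${dir}/${plan_file}.json"
}

# apply_trigger_plan DIR [PLAN_FILE]
# A saved plan applies without prompting, so no -auto-approve is needed.
apply_trigger_plan() {
  local dir="$1"
  local plan_file="${2:-triggers.tfplan}"
  terraform -chdir="$dir" apply -input=false "$plan_file"
}
```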

local function_name="$2"
local mapping_uuid

mapping_uuid="$(aws lambda list-event-source-mappings \
Review comment (Contributor):

Will this be a problem after a failed cutover? Could you ever end up with duplicate, stale, disabled, or partially deleted mappings?
If so, the script should fail on ambiguity, ignore mappings in the Deleting state, log the UUIDs and current states it found, and verify the final mapping target after apply.
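A minimal sketch of a stricter lookup along those lines: it ignores mappings in the `Deleting` state via a `--query` filter, logs every UUID, state and source it finds to stderr, and fails if more than one live mapping remains. `resolve_mapping_uuid` is a hypothetical name; the optional second argument narrows the lookup with `--event-source-arn` when a function has more than one trigger.

```bash
#!/usr/bin/env bash
# Sketch only: resolve the single live event source mapping for a function,
# failing loudly on ambiguity instead of guessing.

# resolve_mapping_uuid FUNCTION_NAME [EVENT_SOURCE_ARN]
# Prints the UUID, or nothing if there is no live mapping.
resolve_mapping_uuid() {
  local function_name="$1"
  local event_source_arn="${2:-}"
  local -a args=(lambda list-event-source-mappings --function-name "$function_name")
  local rows count

  if [[ -n "$event_source_arn" ]]; then
    args+=(--event-source-arn "$event_source_arn")
  fi

  # Drop mappings that are already being deleted; keep UUID, state and source
  # (tab-separated with --output text) so they can be logged.
  rows="$(aws "${args[@]}" \
    --query 'EventSourceMappings[?State!=`Deleting`].[UUID,State,EventSourceArn]' \
    --output text)" || return 1

  # Log everything found (stderr) so a failed cutover leaves a trail.
  if [[ -n "$rows" ]]; then
    printf 'Live mappings for %s:\n%s\n' "$function_name" "$rows" >&2
  fi

  count="$(grep -c . <<< "$rows")"
  case "$count" in
    0) return 0 ;;
    1) cut -f1 <<< "$rows" ;;
    *)
      echo "Found ${count} live mappings for ${function_name}; refusing to guess" >&2
      return 1
      ;;
  esac
}
```

Returning nothing (rather than failing) when no live mapping exists keeps first-time deploys working; only the multiple-mapping case stops the run.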

@@ -0,0 +1,4 @@
output "id_sync_queue_arn" {
Review comment (Contributor):

Could we have outputs for both mapping UUIDs, their function ARNs, and their state?
Also, a documented rollback path and verification commands.
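For the verification commands, a sketch of a post-apply check: it reads the mapping's `FunctionArn` and `State` with `aws lambda get-event-source-mapping` and fails unless the mapping targets the expected function and is `Enabled`. It assumes an unqualified function ARN (no version or alias suffix); `verify_mapping_target` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: post-apply check that a mapping targets the expected function
# and is Enabled. Assumes an unqualified FunctionArn (no version/alias suffix).

# verify_mapping_target UUID EXPECTED_FUNCTION_NAME
verify_mapping_target() {
  local uuid="$1"
  local expected_function="$2"
  local function_arn state actual_function

  read -r function_arn state < <(
    aws lambda get-event-source-mapping --uuid "$uuid" \
      --query '[FunctionArn,State]' --output text
  ) || return 1

  # The function name is the last colon-separated field of the ARN.
  actual_function="${function_arn##*:}"
  echo "Mapping ${uuid}: state=${state} function=${actual_function}"

  if [[ "$actual_function" != "$expected_function" ]]; then
    echo "Expected ${expected_function} but mapping targets ${actual_function}" >&2
    return 1
  fi
  if [[ "$state" != "Enabled" ]]; then
    echo "Mapping ${uuid} is ${state}, expected Enabled" >&2
    return 1
  fi
}
```

For example, `verify_mapping_target "$id_sync_uuid" "imms-${sub_environment}-id-sync-lambda"`, following the function naming used in the script; both variables are hypothetical.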

Comment thread on infrastructure/instance/README.md (outdated)

## Lambda Trigger Handoff

The `delta_trigger` and `id_sync_sqs_trigger` event source mappings are managed from `../event_source_mappings` so the main instance plan does not rewrite shared backend state. The deploy workflow applies the main instance first, then adopts or updates the trigger mappings from the dedicated trigger workspace.
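The ordering in that paragraph can be sketched as two applies, where the trigger step only runs if the instance apply succeeded. This is an illustration, not the workflow itself: the real workflow goes through the Makefile targets and performs the adoption step in the trigger phase, which is omitted here. The usage line assumes `../event_source_mappings` (relative to `infrastructure/instance`) resolves to `infrastructure/event_source_mappings`.

```bash
#!/usr/bin/env bash
# Sketch only: apply the main instance first and touch the shared trigger
# workspace only if that succeeded. The real workflow uses Makefile targets.

# deploy_with_trigger_handoff INSTANCE_DIR TRIGGER_DIR
deploy_with_trigger_handoff() {
  local instance_dir="$1"
  local trigger_dir="$2"

  # 1. Main instance; its plan no longer contains the shared mappings.
  terraform -chdir="$instance_dir" apply -input=false -auto-approve || return 1

  # 2. Shared delta and id-sync mappings, from their dedicated workspace.
  terraform -chdir="$trigger_dir" apply -input=false -auto-approve || return 1
}
```

Usage: `deploy_with_trigger_handoff infrastructure/instance infrastructure/event_source_mappings`.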
Review comment (Contributor):

It may be worth adding a runbook for the first cutover, rollback, and failed-apply recovery, including the exact AWS verification commands.

@avshetty1980 (Contributor) left a comment

Looking really good, just a few questions and comments.

- Added a new workflow for migrating event source mappings, allowing controlled one-time migrations for specific environments.
- Updated the deploy-backend.yml workflow to include concurrency settings and additional steps for Terraform initialization, formatting, validation, and applying event source mappings.
- Refactored the Makefile to introduce new commands for formatting checks, validation, and applying Terraform plans.
- Enhanced the adopt_event_source_mappings.sh script to support verification of event source mappings and improved logging for existing mappings.
- Updated README.md to document the new migration process and rollback procedures for event source mappings.
@sonarqubecloud (bot) commented May 5, 2026


2 participants