Refactor CloudFormation templates for StackSets deployment#2
Refactor CloudFormation templates for StackSets deployment#2
Conversation
- Updated README.md to reflect new structure and deployment order for CloudFormation StackSets. - Introduced new directory structure for PLT and WKL OU deployments. - Added detailed deployment order flowchart and StackSet configuration examples. - Created CloudFormation templates for GitHub OIDC provider and CI/CD runner role in PLT OU. - Implemented CI/CD deploy role and Terraform state backend in WKL OU with automatic onboarding. - Enhanced authentication documentation to clarify cross-account role chaining. - Revised conventions to align with new OU hierarchy and environment definitions. - Added comprehensive IAM policies for deploy roles, including KMS encryption and DynamoDB state locking.
WalkthroughIntroduces an OU-based multi-account AWS deployment model with cross-account role chaining for CI/CD, CloudFormation StackSet templates for PLT and WKL tiers, shared artifact and caching infrastructure, and comprehensive caching strategies across GitHub Actions, GitLab CI, and Jenkins platforms. Updates documentation to reflect organizational unit hierarchy and naming conventions. Changes
Sequence DiagramsequenceDiagram
actor GitHub as GitHub Actions<br/>Workflow
participant OIDC as GitHub OIDC<br/>Provider
participant PLT as PLT Account<br/>Runner Role
participant WKL as WKL Account<br/>Deploy Role
participant Infra as Infrastructure<br/>Deployment
GitHub->>OIDC: Request OIDC Token<br/>(GitHub Org + Repo)
OIDC-->>GitHub: OIDC Token
GitHub->>PLT: AssumeRoleWithWebIdentity<br/>(OIDC Token)
PLT-->>GitHub: Temporary Credentials<br/>(PLT Runner Role)
GitHub->>WKL: AssumeRole<br/>(cicd-deploy-role via PLT creds)
WKL-->>GitHub: Temporary Credentials<br/>(WKL Deploy Role)
GitHub->>Infra: Deploy Infrastructure<br/>(CloudFormation, Terraform, etc.)
Infra-->>GitHub: Deployment Complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (15)
docs/pipeline-rules.md (1)
170-180: Minor inconsistency in GitLab cache path.The GitLab cache example uses
.terraform/providers/(line 178), while other documentation (e.g.,gitlab-ci/README.mdline 83) uses.terraform-cache/. Consider aligning paths for consistency across docs.🔎 Suggested alignment
cache: key: files: - .terraform.lock.hcl paths: - - .terraform/providers/ + - .terraform-cache/ policy: pull-pushdocs/conventions.md (2)
14-25: Consider adding a language specifier to the fenced code block.While ASCII diagrams typically don't require syntax highlighting, adding
textas the language specifier satisfies the linter and explicitly indicates the content type.Proposed fix
-``` +```text Root ├── PLT OU │ └── PLT-Runner Account
43-46: Consider adding a language specifier for consistency.Similar to the OU hierarchy diagram, adding
textsatisfies the linter.Proposed fix
-``` +```text dev, prod, qat, sbx, stg → Wrong (prod before qat) 05-sbx, 10-dev, 20-qat, 40-stg, 70-prod, 90-dr → Correctdocs/caching.md (2)
22-32: Consider adding a language specifier for the S3 structure diagram.Proposed fix
-``` +```text s3://${ORG}-cicd-artifacts-${PLT_ACCOUNT_ID}-${REGION}/ ├── terraform-providers/{hash}/ # 7 day TTL
456-468: Avoid storing credentials viaaws configurein CI pipelines.Using
aws configure setwrites credentials to~/.aws/credentialsin plaintext, which can persist beyond the job. For GitHub Actions, prefer environment variables scoped to the step.Proposed fix using environment variables
# GitHub Actions with R2 - name: Configure R2 - run: | - aws configure set aws_access_key_id ${{ secrets.CF_R2_ACCESS_KEY_ID }} - aws configure set aws_secret_access_key ${{ secrets.CF_R2_SECRET_ACCESS_KEY }} - aws configure set region auto + env: + AWS_ACCESS_KEY_ID: ${{ secrets.CF_R2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.CF_R2_SECRET_ACCESS_KEY }} + AWS_DEFAULT_REGION: auto - name: Download cache from R2 + env: + AWS_ACCESS_KEY_ID: ${{ secrets.CF_R2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.CF_R2_SECRET_ACCESS_KEY }} run: | aws s3 cp s3://cicd-artifacts/terraform-cache.tar.gz /tmp/ \ --endpoint-url https://${{ secrets.CF_ACCOUNT_ID }}.r2.cloudflarestorage.com || truecloudformation/stacksets/plt/00-oidc-provider-github.yaml (1)
9-13:GitHubOrganizationparameter is defined but not used in the resource.The parameter is exported in Outputs but never referenced in the
GitHubOIDCProviderresource properties or tags. Consider adding it to tags for traceability, or if it's only needed for downstream stacks, document this intention.Proposed fix: Add to resource tags
Tags: - Key: Name Value: github-actions-oidc - Key: Purpose Value: cicd-authentication + - Key: GitHubOrg + Value: !Ref GitHubOrganization - Key: OU Value: PLTcloudformation/README.md (2)
7-17: Add language specifier for directory structure.Proposed fix
-``` +```text cloudformation/stacksets/ ├── plt/ # Deploy to PLT OU
120-127: Add language specifier for the role trust diagram.Proposed fix
-``` +```text PLT Account WKL Accounts ─────────── ────────────cloudformation/stacksets/plt/25-cicd-artifacts.yaml (1)
174-180: Thearchives/prefix lifecycle rule may be unused.The
IntelligentTieringrule transitions objects underarchives/prefix, but no other rule or documentation shows objects being written to this prefix. Consider removing this rule or documenting whenarchives/should be used.cloudformation/stacksets/wkl/15-iam-deploy-role.yaml (1)
85-119:VPCFullstatement is redundant.The
EC2Fullstatement (line 86-89) already grantsec2:*on all resources, making theVPCFullstatement (lines 91-119) unnecessary. The VPC actions are a subset of EC2 actions.Proposed fix: Remove redundant VPCFull statement
# Full EC2 access - Sid: EC2Full Effect: Allow Action: ec2:* Resource: '*' - # Full VPC access - - Sid: VPCFull - Effect: Allow - Action: - - ec2:CreateVpc - - ec2:DeleteVpc - ... (remove entire VPCFull block)If you intend to later restrict EC2Full while keeping VPC permissions, consider documenting this intent.
docs/authentication.md (1)
82-88: Add language identifier to fenced code block.The static analysis tool flagged this code block as missing a language specification. Since this is a text-based flow diagram, consider adding
textorplaintextas the language identifier.🔎 Proposed fix
-``` +```text Deploy to WKL-NPD (sbx, dev, qat): GitHub → OIDC → PLT Runner Role → WKL-NPD Deploy Role → DeployREADME.md (3)
5-22: Add language identifier to OU hierarchy code block.The tree diagram should have a language identifier for proper rendering and to satisfy linting rules.
🔎 Proposed fix
-``` +```text Root ├── PLT OU (Platform)
68-76: Add language identifier to environment progression code block.The environment progression diagram should have a language identifier.
🔎 Proposed fix
-``` +```text WKL-NPD OU WKL-PRD OU
82-93: Add language identifier to directory structure code block.The directory structure should have a language identifier for consistency.
🔎 Proposed fix
-``` +```text cloudformation/stacksets/cloudformation/stacksets/wkl/20-terraform-state-backend.yaml (1)
364-375: BackendConfig output requires manual key path substitution.The
BackendConfigoutput provides a useful Terraform backend snippet, but the<ENV>placeholder on line 370 needs manual substitution. Consider documenting this in the output description or providing multiple outputs for common environment paths.🔎 Proposed enhancement to description
BackendConfig: - Description: Terraform backend configuration snippet + Description: Terraform backend configuration snippet (replace <ENV> with environment name, e.g., sbx, dev, prd) Value: !Sub |
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
.github/workflows/README.mdREADME.mdcloudformation/README.mdcloudformation/stacksets/plt/00-oidc-provider-github.yamlcloudformation/stacksets/plt/10-iam-runner-role.yamlcloudformation/stacksets/plt/25-cicd-artifacts.yamlcloudformation/stacksets/wkl/15-iam-deploy-role.yamlcloudformation/stacksets/wkl/20-terraform-state-backend.yamldocs/authentication.mddocs/caching.mddocs/conventions.mddocs/pipeline-rules.mdgitlab-ci/README.mdjenkins/README.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/authentication.md
82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
70-70: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
84-84: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/caching.md
22-22: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/conventions.md
14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
43-43: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
cloudformation/README.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
120-120: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (33)
cloudformation/stacksets/plt/10-iam-runner-role.yaml (2)
75-79: Verify wildcard account scope is intentional.The resource
arn:aws:iam::*:role/cicd-deploy-roleallows assuming this role in any AWS account, not just WKL accounts. Security relies entirely on the target account's trust policy.This is acceptable if:
- The
cicd-deploy-rolein WKL accounts explicitly trusts only this PLT runner role- You don't control accounts where an attacker could create a
cicd-deploy-roletrusting the PLT accountConsider restricting to known account IDs or an AWS Organization condition if tighter controls are needed.
45-91: LGTM on role structure and tagging.The role configuration is well-structured with appropriate session durations, descriptive tags, and clear separation between OIDC and EC2-based authentication paths. The exported ARNs enable proper cross-stack referencing.
jenkins/README.md (1)
126-231: LGTM on caching documentation.The caching strategy section is comprehensive, covering workspace persistence, S3 backend caching, and shared library patterns. The examples correctly handle cache misses with
|| trueand demonstrate content-based cache keys using lockfile hashes.gitlab-ci/README.md (1)
65-135: LGTM on GitLab CI caching documentation.The examples correctly leverage GitLab's native caching with content-based keys (
key.files) and appropriatepull-pushpolicy. The S3 backend fallback provides a solid cross-runner caching option..github/workflows/README.md (1)
41-103: LGTM on GitHub Actions caching documentation.The examples properly use
actions/cache@v4with content-based keys andrestore-keysfallback. Usingif: always()on S3 upload ensures the cache persists regardless of job outcome.docs/pipeline-rules.md (1)
71-145: Excellent caching strategy documentation.The section provides clear rationale with concrete timing comparisons, sensible TTL policies aligned with security considerations, and practical cache invalidation patterns. The scheduled weekly refresh and manual override options are good practices.
docs/conventions.md (1)
5-37: LGTM - Well-structured OU hierarchy documentation.The OU hierarchy table, ASCII diagram, and environments mapping provide clear guidance for the multi-account deployment model. The numbered prefix scheme maintains logical ordering across CI/CD systems.
docs/caching.md (2)
102-145: Well-structured Terraform cache script with proper error handling.The script uses
set -euo pipefail, handles missingARTIFACT_BUCKETgracefully, and uses content-based cache keys. The upload/download pattern is clean and reusable.
63-76: Excellent security-conscious cache expiration policy.The documented TTLs appropriately balance performance with security—shorter TTLs for frequently patched dependencies (npm, pip) and longer for more stable artifacts (Terraform providers, Go modules). The 1-day TTL for container layers correctly prioritizes security for base images.
cloudformation/stacksets/plt/00-oidc-provider-github.yaml (2)
32-50: OIDC provider configuration looks correct.The URL, ClientIdList (
sts.amazonaws.com), and tagging strategy align with AWS best practices for GitHub Actions OIDC integration.
15-18: Thumbprints are no longer required for GitHub OIDC with AWS.As of July 2023, AWS secures communication by trusting GitHub Actions's root certificate authorities instead of using a certificate thumbprint to verify GitHub's IdP server certificate. The thumbprints remain in the configuration but are no longer used for validation. Providing a thumbprint is optional when creating the IAM OIDC identity provider. These thumbprints can be safely removed or left as defaults—no action is needed for GitHub customers.
cloudformation/README.md (2)
109-116: Excellent automatic onboarding documentation.Clear explanation of how
--auto-deployment Enabled=trueenables zero-touch onboarding for new accounts. This pattern significantly reduces operational overhead when scaling the organization.
19-27: Well-structured deployment order visualization.The Mermaid flowchart clearly shows the dependency chain between StackSets. This helps operators understand the required deployment sequence.
cloudformation/stacksets/plt/25-cicd-artifacts.yaml (3)
83-84: VerifyDeletionPolicy: Deleteis intentional for the artifacts bucket.While acceptable for ephemeral cache data, this means the bucket and all contents are permanently deleted if the stack is deleted. If build artifacts need retention, consider
Retainor add a lifecycle rule that transitions critical artifacts before deletion.
106-172: Lifecycle rules correctly implement the documented cache expiration policy.The TTL configuration aligns with the security-conscious defaults from
docs/caching.md:
- 7 days for Terraform providers/plugins and Go modules
- 3 days for dependency caches (pip, npm, Ansible)
- 1 day for container layers and plan artifacts
224-252: Bucket policy correctly limits access to runner roles.The policy enforces HTTPS-only access via the
RequireHTTPSstatement and grants permissions only tocicd-runner-roleandcicd-runner-ec2-role. This follows least-privilege principles with specific S3 actions (GetObject, PutObject, DeleteObject, ListBucket, GetBucketLocation). The referenced roles are properly defined in10-iam-runner-role.yamland should be created before deploying this stack.cloudformation/stacksets/wkl/15-iam-deploy-role.yaml (3)
120-138: Consider tightening IAM permissions or adding conditions.
iam:*onResource: '*'is very broad. While theDenyModifyFoundationguard protectscicd-*roles tagged withFoundation=true, the deploy role can still:
- Create new admin roles without the Foundation tag
- Modify IAM policies, users, and groups without restriction
If acceptable for your use case, consider documenting the rationale. Otherwise, scope down to specific prefixes or add conditions.
46-60: Trust policy correctly implements cross-account role chaining.The
AssumeRolePolicyDocumentproperly trusts bothcicd-runner-roleandcicd-runner-ec2-rolefrom the PLT account, enabling the documented role chaining flow: GitHub Actions → OIDC → PLT runner → WKL deploy.
126-138: Good use of Foundation tag for guardrail protection.The
DenyModifyFoundationstatement prevents the deploy role from modifying its own trust policy or detaching policies from foundation resources. This is an important security control.docs/authentication.md (4)
3-13: LGTM! Clear OU model documentation.The OU model table and cross-account role chaining pattern are well-documented, providing clear context for the PLT/WKL hierarchy.
17-23: LGTM! Mermaid diagram clearly illustrates the flow.The flowchart effectively visualizes the cross-account role chaining pattern from CI/CD platform through OIDC to deployment.
90-117: LGTM! Trust policy examples are well-structured.The trust policy examples clearly demonstrate:
- OIDC trust with
StringLikecondition for repo-based restrictions- Cross-account trust from WKL deploy role to PLT runner role
Note: The placeholder
PLT_ACCOUNT_IDon line 115 is intentional as a template example.
119-125: LGTM! Security section covers key points.The security recommendations appropriately emphasize OIDC usage, role separation, and short session durations.
README.md (4)
33-46: LGTM! Cross-account role chaining pattern is clearly documented.The mermaid diagram and explanation effectively communicate the runner-to-deploy role chaining architecture.
48-55: LGTM! Two-StackSet deployment model is well-explained.The table clearly shows the separation between PLT and WKL StackSets and the automatic onboarding benefit.
95-106: LGTM! Pipeline generation guidance is practical.The prompt template and environment breakdown across OUs provides clear context for generating platform-specific pipelines.
116-121: LGTM! Documentation links are comprehensive.The docs section provides clear references to all related documentation.
cloudformation/stacksets/wkl/20-terraform-state-backend.yaml (6)
1-14: LGTM! Template description is comprehensive.The description clearly outlines what resources are created and the automatic onboarding behavior for new WKL accounts.
45-50: TTL parameter note:LockTableTTLHoursis stored as a tag only.The
LockTableTTLHoursparameter enables DynamoDB TTL on theExpirationTimeattribute (line 300-302), but the actualExpirationTimevalue must be set by Terraform when acquiring locks. The tag on line 319-320 is for documentation purposes. This is correct behavior—just ensure Terraform is configured to populateExpirationTimewhen writing lock entries.
80-115: LGTM! KMS key policy follows least-privilege principles.The key policy correctly:
- Grants root account full access for key administration
- Restricts CI/CD deploy role to only encryption/decryption operations
- Enables automatic key rotation
127-185: LGTM! S3 bucket configuration follows security best practices.Excellent implementation with:
DeletionPolicy: Retainto prevent accidental state loss- Versioning enabled for state history
- All public access blocked
BucketOwnerEnforcedownership- Lifecycle rules for version cleanup and intelligent tiering
- Access logging enabled
187-218: LGTM! Bucket policy enforces security controls.The policy correctly:
- Denies non-HTTPS requests
- Allows only the local deploy role to access state objects
287-324: LGTM! DynamoDB table is well-configured for state locking.Good implementation with:
- PAY_PER_REQUEST billing for variable usage patterns
- TTL enabled for stale lock cleanup
- Point-in-time recovery for disaster recovery
- KMS encryption (customer-managed or AWS-managed based on condition)
| AllowedBranches: | ||
| Type: String | ||
| Description: | | ||
| Branch pattern for deployments. | ||
| Examples: main, release-*, refs/heads/main | ||
| Default: '*' |
There was a problem hiding this comment.
AllowedBranches parameter is defined but never used in the trust policy.
The parameter suggests branch-level restrictions, but the OIDC condition on line 69 only filters by repository (repo:${GitHubOrganization}/${AllowedRepositories}:*). The trailing :* matches any ref, ignoring the AllowedBranches value entirely.
This creates a false sense of security—users may believe deployments are branch-restricted when they're not.
🔎 Proposed fix to incorporate branch restriction
StringLike:
- 'token.actions.githubusercontent.com:sub': !Sub 'repo:${GitHubOrganization}/${AllowedRepositories}:*'
+ 'token.actions.githubusercontent.com:sub': !Sub 'repo:${GitHubOrganization}/${AllowedRepositories}:ref:refs/heads/${AllowedBranches}'Note: If you need to support both branch and tag patterns, consider using multiple conditions or documenting that AllowedBranches accepts refs/heads/* style patterns.
Also applies to: 65-69
🤖 Prompt for AI Agents
In cloudformation/stacksets/plt/10-iam-runner-role.yaml around lines 28-33 and
65-69, the AllowedBranches parameter is declared but not used in the OIDC trust
policy (the repo condition ends with :*), so update the OIDC condition to
include AllowedBranches instead of the trailing wildcard; replace the repo
pattern suffix from :* to :${AllowedBranches} (or construct the repo ARN using
Fn::Sub/Fn::Join with ${GitHubOrganization}, ${AllowedRepositories}, and
${AllowedBranches}) so branch-level restrictions are enforced, and if you must
support both branch and tag patterns add an additional condition entry or
document accepted patterns (e.g., refs/heads/* vs refs/tags/*).
| RunnerAccountId: | ||
| Type: String | ||
| Description: AWS Account ID of the PLT runner account (for cross-account state access) | ||
| AllowedPattern: '^\d{12}$' | ||
| ConstraintDescription: Must be a 12-digit AWS account ID |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, check if the file exists and read its full content
if [ -f "cloudformation/stacksets/wkl/20-terraform-state-backend.yaml" ]; then
echo "=== File Content ==="
cat -n "cloudformation/stacksets/wkl/20-terraform-state-backend.yaml"
echo ""
echo "=== Searching for RunnerAccountId references ==="
grep -n "RunnerAccountId" "cloudformation/stacksets/wkl/20-terraform-state-backend.yaml"
else
echo "File not found at cloudformation/stacksets/wkl/20-terraform-state-backend.yaml"
# Try to find the file
echo "Searching for the template file..."
find . -name "*20-terraform-state-backend*" -type f 2>/dev/null
fiRepository: zsoftly/iac-cicd-reference
Length of output: 14401
Remove unused RunnerAccountId parameter or add it to bucket/KMS policies
The RunnerAccountId parameter is defined but not referenced anywhere in the template. Either remove it if cross-account access is not needed, or add it to the S3 bucket policy and KMS key policy to enable cross-account access from the PLT runner account.
🤖 Prompt for AI Agents
In cloudformation/stacksets/wkl/20-terraform-state-backend.yaml around lines
24-28, the RunnerAccountId parameter is declared but not used; either remove the
parameter if no cross-account access is needed, or wire it into the S3 bucket
and KMS key policies to grant the PLT runner account access. To fix: if
removing, delete the RunnerAccountId parameter block and any unused references;
if enabling cross-account access, update the S3 bucket policy to allow the
principal arn:aws:iam::${RunnerAccountId}:root (or specific role/ARN) to
GetObject/PutObject and required actions and reference the parameter via !Ref
(or Fn::Sub), and update the KMS KeyPolicy to add statements permitting the same
account (or role) to use/describe/decrypt/encrypt the key, again referencing the
parameter; ensure policy Condition/Principal ARNs are correctly formed and add
ConstraintDescription or documentation if you keep the parameter.
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.