Skip to content

fix(deploy): skip CDK diff for new stacks in TUI deploy#1384

Open
jariy17 wants to merge 2 commits into
mainfrom
fix/tui-deploy-new-stack-diff-race
Open

fix(deploy): skip CDK diff for new stacks in TUI deploy#1384
jariy17 wants to merge 2 commits into
mainfrom
fix/tui-deploy-new-stack-diff-race

Conversation

@jariy17
Copy link
Copy Markdown
Contributor

@jariy17 jariy17 commented May 26, 2026

Summary

  • Fix TUI deploy race condition for first-time deployments of projects without file assets (e.g. dataset-only projects, evaluator-only projects)
  • CDK's diff() creates a temporary changeset against a non-existent stack, which briefly materializes a ghost stack. For projects with agents, the subsequent asset upload provides enough delay for the ghost to disappear. For asset-free projects, deploy() immediately races against the ghost stack deletion and fails with "No template found for Stack"
  • Fix: skip diff() entirely when the stack doesn't exist yet — there's nothing meaningful to compare against

Test plan

  • Typecheck passes (tsc --noEmit)
  • Lint + prettier + secretlint pass (pre-commit hooks)
  • Dataset-only project deploys successfully via TUI (previously failed)
  • Project with agent still deploys successfully via TUI (no regression)

CDK's diff() creates a temporary changeset against a non-existent stack,
which briefly materializes a ghost stack in REVIEW_IN_PROGRESS state.
For projects without file assets (e.g. dataset-only), the subsequent
deploy() call races against the ghost stack deletion and fails with
"No template found for Stack".

Skip diff entirely for new stacks since there's nothing to compare
against — everything would show as an addition anyway.
@jariy17 jariy17 requested a review from a team May 26, 2026 20:51
@github-actions github-actions Bot added the size/s PR size: S label May 26, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 26, 2026
@agentcore-devx-automation
Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.15.0.tgz

How to install

gh release download pr-1384-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.15.0.tgz

Copy link
Copy Markdown

@agentcore-cli-automation agentcore-cli-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking down this race condition — the analysis in the description is convincing.

I have one concern about error handling that should be addressed before merging. Details inline.


const target = context?.awsTargets[0];
const currentStackName = stackNames?.[0];
const stackStatus = target && currentStackName ? await checkStackStatus(target.region, currentStackName) : null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkStackStatus only swallows ValidationError (stack-doesn't-exist) — every other error is rethrown (transient network failure, throttling, expired/invalid creds, IAM denial, etc.). Because this await sits outside any try/catch, a throw here propagates straight out of run() and skips:

  • isDiffRunningRef.current = false
  • setIsDiffLoading(false)
  • logger.endStep(...)
  • setPreDeployDiffStep(... status: 'success')
  • the rest of the deploy flow (publish assets / deploy / dispose)

So a transient DescribeStacks failure leaves the UI permanently stuck with the "Computing diff changes..." spinner, and the deploy step itself never errors out or progresses. That's a regression from the previous code, which had the diff cleanup in a finally so any diff error was non-fatal.

A couple of options:

  1. Wrap just the checkStackStatus call in a try/catch and treat a failure as "unknown" — i.e. fall through to the existing diff path (current behavior). Simplest and preserves the bug-fix intent (worst case, you get the old race back, which you already tolerated before this PR).
  2. Restructure so the entire diff phase (status check + diff) is inside a try/finally that always cleans up isDiffRunningRef, isDiffLoading, and preDeployDiffStep, mirroring the original code's resilience.

Option 2 is probably the safer long-term fix since it also re-establishes the invariant that the diff step never blocks the deploy flow.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 43.63% 10526 / 24121
🔵 Statements 42.87% 11188 / 26094
🔵 Functions 40.53% 1806 / 4455
🔵 Branches 39.79% 6728 / 16906
Generated in workflow #3303 for commit 93bc372 by the Vitest Coverage Report Action

@github-actions github-actions Bot removed the size/s PR size: S label May 26, 2026
@github-actions github-actions Bot added the size/s PR size: S label May 26, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
@agentcore-devx-automation
Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
Wrap the entire diff phase (checkStackStatus + diff) in a try/finally
so that transient failures (network, throttling, expired creds) in the
stack status check never leave the UI stuck on the diff spinner. If the
status check fails, fall through to the existing diff path (worst case:
the old race condition, which was already tolerated before this fix).
@jariy17 jariy17 force-pushed the fix/tui-deploy-new-stack-diff-race branch from 0228c53 to 93bc372 Compare May 26, 2026 22:02
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels May 26, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
@agentcore-devx-automation
Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/s PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants