feat: implement tfe plannable agent#956
Conversation
📝 WalkthroughWalkthroughAdds a new Terraform Cloud speculative planning job agent (TFCPlanner), production clients for workspace setup and speculative runs, unit tests, and registers the planner in the job-agent registry. Planner manages a two-phase workflow (create run, then poll/complete) with persisted JSON state and timeout handling. Changes
Sequence DiagramsequenceDiagram
participant Caller as Caller
participant Planner as TFCPlanner
participant Setup as WorkspaceSetup
participant Runner as SpeculativeRunner
participant TFC as TerraformCloud API
Caller->>Planner: Plan(ctx, dispatchCtx, state=nil)
Planner->>Setup: Setup(ctx, dispatchCtx)
Setup->>TFC: Upsert workspace & sync vars
TFC-->>Setup: workspaceID
Setup-->>Planner: workspaceID
Planner->>Runner: CreateSpeculativeRun(ctx, cfg, workspaceID)
Runner->>TFC: Create run (plan-only)
TFC-->>Runner: runID
Runner-->>Planner: runID
Planner-->>Caller: PlanResult (in-progress, persisted runId)
Caller->>Planner: Plan(ctx, dispatchCtx, state=contains runId)
Planner->>Runner: ReadRunStatus(ctx, cfg, runId)
Runner->>TFC: Get run details
TFC-->>Runner: RunStatus (status, planID, counts)
Runner-->>Planner: RunStatus
alt Run not finished and not timed out
Planner-->>Caller: PlanResult (in-progress, increment pollCount)
else Run finished successfully
Planner->>Runner: ReadPlanJSON(ctx, cfg, planID)
Runner->>TFC: Fetch plan JSON
TFC-->>Runner: planJSON
Runner-->>Planner: planJSON
Planner-->>Caller: PlanResult (completed, Proposed, ContentHash, HasChanges)
else Run errored or timed out
Planner-->>Caller: PlanResult (completed, error/timeout message)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds a Terraform Cloud/Enterprise (“tfe”) Plannable job agent so the workspace-engine can compute deployment plan results by creating a speculative TFC run, polling until completion, and storing the resulting plan JSON.
Changes:
- Register a new Terraform Cloud planner in the deployment plan result controller registry.
- Implement
TFCPlanner(types.Plannable) to create/poll speculative runs and return aPlanResult. - Add production TFC client implementations (
GoWorkspaceSetup,GoSpeculativeRunner) and unit tests for planner behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| apps/workspace-engine/svc/controllers/deploymentplanresult/getters_postgres.go | Registers the Terraform Cloud planner in the plan-result controller’s agent registry. |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go | Implements the Plannable planner logic (create run, poll, complete with plan JSON/hash). |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | Adds production implementations for workspace setup and speculative run operations via go-tfe. |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go | Adds unit tests covering create, poll, completion, timeout, and error scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| now := time.Now() | ||
| return &types.PlanResult{ | ||
| CompletedAt: &now, | ||
| Message: fmt.Sprintf("Run %s ended with status: %s", s.RunID, status.Status), | ||
| }, nil |
There was a problem hiding this comment.
When the TFC run ends in an errored/canceled/discarded state, this branch returns a non-nil CompletedAt with a nil error. The deploymentplanresult controller treats any nil error + non-nil CompletedAt as a successful completion and will persist Status=completed, which misclassifies failed plans. Return a non-nil error here (including run ID/status) so the controller records Status=errored (or introduce an explicit failure status in PlanResult and handle it in the controller).
| now := time.Now() | |
| return &types.PlanResult{ | |
| CompletedAt: &now, | |
| Message: fmt.Sprintf("Run %s ended with status: %s", s.RunID, status.Status), | |
| }, nil | |
| return nil, fmt.Errorf("run %s ended with status: %s", s.RunID, status.Status) |
| now := time.Now() | ||
| return &types.PlanResult{ | ||
| CompletedAt: &now, | ||
| Message: fmt.Sprintf( | ||
| "Run %s timed out after %d polls (%s elapsed), last status: %s", | ||
| s.RunID, s.PollCount, time.Since(*s.FirstPolled).Round(time.Second), status.Status, | ||
| ), | ||
| }, nil |
There was a problem hiding this comment.
The timeout path also returns CompletedAt with a nil error, so the controller will persist Status=completed even though this is effectively a plan failure/timeout. Consider returning an error (e.g., include elapsed time/poll count/last status) so timeouts are recorded as Status=errored and are visible as failures to callers.
| now := time.Now() | |
| return &types.PlanResult{ | |
| CompletedAt: &now, | |
| Message: fmt.Sprintf( | |
| "Run %s timed out after %d polls (%s elapsed), last status: %s", | |
| s.RunID, s.PollCount, time.Since(*s.FirstPolled).Round(time.Second), status.Status, | |
| ), | |
| }, nil | |
| elapsed := time.Since(*s.FirstPolled).Round(time.Second) | |
| return nil, fmt.Errorf( | |
| "run %s timed out after %d polls (%s elapsed), last status: %s", | |
| s.RunID, s.PollCount, elapsed, status.Status, | |
| ) |
| func (p *TFCPlanner) completePlan( | ||
| ctx context.Context, | ||
| cfg *tfeConfig, | ||
| status *RunStatus, | ||
| ) (*types.PlanResult, error) { | ||
| planJSON, err := p.runner.ReadPlanJSON(ctx, cfg, status.PlanID) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("read plan JSON: %w", err) | ||
| } |
There was a problem hiding this comment.
completePlan assumes status.PlanID is populated whenever status.IsFinished is true. In ReadRunStatus, PlanID is only set when run.Plan != nil, so it's possible to reach here with an empty PlanID (e.g., unexpected API response), which will produce a confusing downstream error. Add an explicit check for empty PlanID and return a clear error before calling ReadPlanJSON.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go`:
- Around line 14-17: Add Go doc comments for the exported methods: Setup,
CreateSpeculativeRun, ReadRunStatus, and ReadPlanJSON; for each method (e.g.,
GoWorkspaceSetup.Setup, GoWorkspaceSetup.CreateSpeculativeRun,
GoWorkspaceSetup.ReadRunStatus, GoWorkspaceSetup.ReadPlanJSON) add a concise
comment immediately above the method declaration describing what the method
does, its important parameters and return values, and any non-obvious behavior
or side effects (keep each comment one or two sentences and follow Go doc
comment style starting with the method name).
- Around line 99-104: The switch on run.Status currently treats
tfe.RunPlannedAndFinished as finished and
tfe.RunErrored/RunCanceled/RunDiscarded as errored but omits
tfe.RunPolicySoftFailed; update the branch handling in the switch that sets
status.IsFinished and status.IsErrored (the block inspecting run.Status and
assigning status.IsFinished / status.IsErrored) to include
tfe.RunPolicySoftFailed in the errored-terminal case so that runs with a
Sentinel soft-fail are marked terminal immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 06eacf06-4bb1-41dd-ac73-cd0e2b22afcf
📒 Files selected for processing (3)
apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.goapps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.goapps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go
✅ Files skipped from review due to trivial changes (1)
- apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go`:
- Around line 102-107: The switch handling run.Status in tfe_plan_client.go
currently marks only RunPlannedAndFinished as terminal and treats RunErrored,
RunCanceled, RunDiscarded, and RunPolicySoftFailed as errored; add
tfe.RunForceCanceled (Terraform Cloud's force_canceled status) to the error
branch so that when run.Status == tfe.RunForceCanceled you set status.IsErrored
= true (same branch as RunErrored/RunCanceled/RunDiscarded/RunPolicySoftFailed),
ensuring polling stops immediately for force-canceled runs; update any related
comments/tests referencing terminal statuses if present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 66136e89-0530-4234-80c2-a2104d60d727
📒 Files selected for processing (2)
apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.goapps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go
| switch run.Status { | ||
| case tfe.RunPlannedAndFinished: | ||
| status.IsFinished = true | ||
| case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed: | ||
| status.IsErrored = true | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Terraform Cloud Runs API, is "force_canceled" a possible run status, and should it be treated as a terminal non-success state for speculative (plan-only) runs?
💡 Result:
Yes. The Terraform Cloud (HCP Terraform) Runs API documents force_canceled as a valid data.attributes.status value (“A workspace admin forcefully canceled the run.”). It indicates the run was terminated due to an administrative force-cancel, not a successful completion. [1]
For speculative (plan-only) runs, the documented successful terminal status is planned_and_finished (“The run is completed… This is a final state.”). A force_canceled run should therefore be treated as a terminal, non-success outcome when evaluating plan-only run results. [1]
Source
[1] Terraform Cloud Runs API docs (run statuses; plan-only terminal status; cancel/force-cancel behavior). (developer.hashicorp.com)
Citations:
🏁 Script executed:
cat -n apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | sed -n '95,115p'Repository: ctrlplanedev/ctrlplane
Length of output: 798
🏁 Script executed:
rg "force_canceled|RunStatus" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 2 -B 2Repository: ctrlplanedev/ctrlplane
Length of output: 8901
🏁 Script executed:
rg "import.*tfe\|from.*tfe" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 3Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
head -30 apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.goRepository: ctrlplanedev/ctrlplane
Length of output: 784
🏁 Script executed:
rg "tfe\." apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 285
🏁 Script executed:
rg "RunCanceled|RunErrored|RunPlannedAndFinished|RunDiscarded|RunPolicySoftFailed" apps/workspace-engine/pkg/jobagents/terraformcloud/ -B 2 -A 2Repository: ctrlplanedev/ctrlplane
Length of output: 1738
🏁 Script executed:
rg "RunForceCanceled|force_canceled" apps/workspace-engine/ -iRepository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
web_search
hashicorp go-tfe library RunForceCanceled constant or force_canceled statusRepository: ctrlplanedev/ctrlplane
Length of output: 164
🌐 Web query:
hashicorp go-tfe RunForceCanceled constant API documentation
💡 Result:
force_canceled is a valid run state in the HCP Terraform / Terraform Enterprise API (data.attributes.status) and means “a workspace admin forcefully canceled the run.” (developer.hashicorp.com)
However, in the current github.com/hashicorp/go-tfe package, there is no RunForceCanceled (or RunForceCanceled-like) constant in the RunStatus constants list; the exported RunStatus constants include RunCanceled, RunErrored, etc., but not force_canceled. (pkg.go.dev)
What to do in Go
- Compare against the raw string:
status == "force_canceled", or - Define your own constant in your codebase:
const RunForceCanceled tfe.RunStatus = "force_canceled"
References (numbered)
- [1] HCP Terraform Runs API docs (“Run States” includes
force_canceled). (developer.hashicorp.com) - [2]
go-tfeRunStatusconstants list (does not includeforce_canceled). (pkg.go.dev)
Citations:
- 1: https://developer.hashicorp.com/terraform/cloud-docs/api-docs/run
- 2: https://pkg.go.dev/github.com/hashicorp/go-tfe
- 3: https://developer.hashicorp.com/terraform/cloud-docs/api-docs/run
- 4: https://pkg.go.dev/github.com/hashicorp/go-tfe
🏁 Script executed:
rg "const.*Run" apps/workspace-engine/pkg/jobagents/terraformcloud/ -B 2 -A 2Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
rg "type RunStatus\|Status.*tfe\.RunStatus" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 3Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
rg "\.Status\s*=" apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go -B 3 -A 1Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
python3 << 'EOF'
# Check the go-tfe package to understand RunStatus type
import subprocess
result = subprocess.run(['rg', 'github.com/hashicorp/go-tfe', 'go.mod', 'go.sum'],
capture_output=True, text=True, cwd='/tmp')
if result.returncode == 0:
print(result.stdout)
else:
print("Not found in go files")
EOFRepository: ctrlplanedev/ctrlplane
Length of output: 87
🏁 Script executed:
cat -n apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | sed -n '80,110p'Repository: ctrlplanedev/ctrlplane
Length of output: 1039
🏁 Script executed:
rg "run\.Status" apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go -B 5 -A 5Repository: ctrlplanedev/ctrlplane
Length of output: 607
Add force_canceled to terminal error statuses in the run status switch.
The switch at line 102 does not handle Terraform Cloud's force_canceled run status. When a run is administratively force-canceled, neither IsFinished nor IsErrored will be set, causing polling to continue until timeout instead of terminating immediately as an error state.
Proposed fix
switch run.Status {
case tfe.RunPlannedAndFinished:
status.IsFinished = true
-case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed:
+case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed, tfe.RunStatus("force_canceled"):
status.IsErrored = true
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go` around
lines 102 - 107, The switch handling run.Status in tfe_plan_client.go currently
marks only RunPlannedAndFinished as terminal and treats RunErrored, RunCanceled,
RunDiscarded, and RunPolicySoftFailed as errored; add tfe.RunForceCanceled
(Terraform Cloud's force_canceled status) to the error branch so that when
run.Status == tfe.RunForceCanceled you set status.IsErrored = true (same branch
as RunErrored/RunCanceled/RunDiscarded/RunPolicySoftFailed), ensuring polling
stops immediately for force-canceled runs; update any related comments/tests
referencing terminal statuses if present.
Resolves #957
Summary by CodeRabbit