
feat: implement tfe plannable agent #956

Merged
adityachoudhari26 merged 3 commits into main from tfe-planner on Apr 9, 2026

Conversation

@adityachoudhari26
Member

@adityachoudhari26 adityachoudhari26 commented Apr 9, 2026

Resolves #957

Summary by CodeRabbit

  • New Features
    • Terraform Cloud speculative planning: create and monitor speculative runs with automatic polling, timeout handling, deterministic content-hashable plan output, and resource change summaries (+/~/-) shown in results.
  • Tests
    • Added comprehensive tests covering planner identity, config errors, workspace setup/run creation failures, polling behavior, timeouts, errored/canceled runs, finished-plan parsing, and content-hash consistency.
  • Chores
    • Registered the new Terraform Cloud planner so it is available to the system.
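The resource change summary (+/~/-) mentioned above can be illustrated with a minimal sketch. It assumes the counts are derived from the `resource_changes[].change.actions` field of Terraform's JSON plan representation; the PR may well take the counts from the run status instead. The names `changeSummary` and `summarize` are hypothetical and do not come from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planJSON models only the parts of Terraform's JSON plan output used here.
type planJSON struct {
	ResourceChanges []struct {
		Change struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// changeSummary is a hypothetical helper type, not taken from the PR.
type changeSummary struct{ Add, Change, Destroy int }

// String renders the summary in the +/~/- form shown in plan results.
func (s changeSummary) String() string {
	return fmt.Sprintf("+%d/~%d/-%d", s.Add, s.Change, s.Destroy)
}

// HasChanges reports whether the plan would modify any resource.
func (s changeSummary) HasChanges() bool { return s.Add+s.Change+s.Destroy > 0 }

// summarize counts create, update, and delete actions. A replacement
// (["delete","create"]) counts once as a destroy and once as an add.
func summarize(raw []byte) (changeSummary, error) {
	var p planJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return changeSummary{}, fmt.Errorf("parse plan JSON: %w", err)
	}
	var s changeSummary
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			switch action {
			case "create":
				s.Add++
			case "update":
				s.Change++
			case "delete":
				s.Destroy++
			}
		}
	}
	return s, nil
}

func main() {
	raw := []byte(`{"resource_changes":[
		{"change":{"actions":["create"]}},
		{"change":{"actions":["update"]}},
		{"change":{"actions":["delete","create"]}},
		{"change":{"actions":["no-op"]}}]}`)
	s, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s, s.HasChanges()) // prints "+2/~1/-1 true"
}
```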

Copilot AI review requested due to automatic review settings April 9, 2026 18:56
@coderabbitai
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

Adds a new Terraform Cloud speculative planning job agent (TFCPlanner), production clients for workspace setup and speculative runs, unit tests, and registers the planner in the job-agent registry. Planner manages a two-phase workflow (create run, then poll/complete) with persisted JSON state and timeout handling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TFC Planner Implementation: apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go | New TFCPlanner implementing types.Plannable: two-phase Plan workflow, persisted JSON plan state, polling with timeout, plan parsing, SHA‑256 content hashing, and resource-change detection. |
| TFC Production Client: apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | New GoWorkspaceSetup and GoSpeculativeRunner providing workspace upsert, variable sync, speculative run creation, run status mapping, and plan JSON retrieval against Terraform Cloud. |
| TFC Tests: apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go | Comprehensive unit tests with mocks covering config validation, error paths, initial run creation, polling increments, timeouts, errored/canceled flows, finished-plan parsing, deterministic hashing, and messaging. |
| Registry Registration: apps/workspace-engine/svc/controllers/deploymentplanresult/getters_postgres.go | Registers the new Terraform Cloud planner via terraformcloud.NewTFCPlanner(&GoWorkspaceSetup{}, &GoSpeculativeRunner{}) in the job-agent registry. |
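As an illustration of the deterministic SHA‑256 content hashing listed in the summary, here is a minimal sketch. It assumes the hash is taken over a canonical re-encoding of the plan JSON; the PR's actual hashing input is not shown on this page, and `contentHash` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// contentHash is a hypothetical helper: it decodes the plan JSON and
// re-encodes it before hashing, so key order and whitespace do not
// affect the result. encoding/json writes map keys in sorted order.
func contentHash(planJSON []byte) (string, error) {
	var v any
	if err := json.Unmarshal(planJSON, &v); err != nil {
		return "", fmt.Errorf("parse plan JSON: %w", err)
	}
	canonical, err := json.Marshal(v)
	if err != nil {
		return "", fmt.Errorf("re-encode plan JSON: %w", err)
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, errA := contentHash([]byte(`{"a":1,"b":[1,2]}`))
	b, errB := contentHash([]byte(`{ "b": [1, 2], "a": 1 }`))
	if errA != nil || errB != nil {
		panic("unexpected parse error")
	}
	fmt.Println(a == b) // prints "true": same content, same hash
}
```

One caveat of this approach: decoding into `any` turns JSON numbers into `float64`, so very large integers could lose precision before hashing. Hashing the raw bytes avoids that but makes the hash sensitive to formatting.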

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant Planner as TFCPlanner
    participant Setup as WorkspaceSetup
    participant Runner as SpeculativeRunner
    participant TFC as TerraformCloud API

    Caller->>Planner: Plan(ctx, dispatchCtx, state=nil)
    Planner->>Setup: Setup(ctx, dispatchCtx)
    Setup->>TFC: Upsert workspace & sync vars
    TFC-->>Setup: workspaceID
    Setup-->>Planner: workspaceID
    Planner->>Runner: CreateSpeculativeRun(ctx, cfg, workspaceID)
    Runner->>TFC: Create run (plan-only)
    TFC-->>Runner: runID
    Runner-->>Planner: runID
    Planner-->>Caller: PlanResult (in-progress, persisted runId)

    Caller->>Planner: Plan(ctx, dispatchCtx, state=contains runId)
    Planner->>Runner: ReadRunStatus(ctx, cfg, runId)
    Runner->>TFC: Get run details
    TFC-->>Runner: RunStatus (status, planID, counts)
    Runner-->>Planner: RunStatus
    alt Run not finished and not timed out
        Planner-->>Caller: PlanResult (in-progress, increment pollCount)
    else Run finished successfully
        Planner->>Runner: ReadPlanJSON(ctx, cfg, planID)
        Runner->>TFC: Fetch plan JSON
        TFC-->>Runner: planJSON
        Runner-->>Planner: planJSON
        Planner-->>Caller: PlanResult (completed, Proposed, ContentHash, HasChanges)
    else Run errored or timed out
        Planner-->>Caller: PlanResult (completed, error/timeout message)
    end
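
The two-phase flow in the diagram can be sketched in plain Go. This is a simplified illustration under stated assumptions, not the PR's code: workspace setup and plan JSON retrieval are omitted, the `runner` interface and `planState` field names are hypothetical stand-ins, and the errored and timeout paths return errors as the review comments on this PR suggest.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type planState struct { // persisted as JSON between calls; field names are assumptions
	RunID       string     `json:"runId"`
	PollCount   int        `json:"pollCount"`
	FirstPolled *time.Time `json:"firstPolled,omitempty"`
}

type runStatus struct {
	Status                string
	IsFinished, IsErrored bool
}

type runner interface { // trimmed, hypothetical stand-in for the PR's SpeculativeRunner
	CreateSpeculativeRun() (string, error)
	ReadRunStatus(runID string) (runStatus, error)
}

type result struct {
	State     []byte // persist this and pass it to the next call
	Completed bool
}

func save(s planState) (*result, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return nil, fmt.Errorf("encode state: %w", err)
	}
	return &result{State: b}, nil
}

// plan creates the run when state is nil, otherwise polls the persisted run once.
func plan(r runner, state []byte, now time.Time, timeout time.Duration) (*result, error) {
	if state == nil {
		runID, err := r.CreateSpeculativeRun()
		if err != nil {
			return nil, fmt.Errorf("create speculative run: %w", err)
		}
		return save(planState{RunID: runID})
	}
	var s planState
	if err := json.Unmarshal(state, &s); err != nil {
		return nil, fmt.Errorf("decode state: %w", err)
	}
	if s.FirstPolled == nil {
		s.FirstPolled = &now
	}
	s.PollCount++
	st, err := r.ReadRunStatus(s.RunID)
	if err != nil {
		return nil, fmt.Errorf("read run status: %w", err)
	}
	switch elapsed := now.Sub(*s.FirstPolled); {
	case st.IsFinished:
		return &result{Completed: true}, nil
	case st.IsErrored:
		return nil, fmt.Errorf("run %s ended with status: %s", s.RunID, st.Status)
	case elapsed > timeout:
		return nil, fmt.Errorf("run %s timed out after %d polls (%s elapsed)", s.RunID, s.PollCount, elapsed)
	}
	return save(s)
}

type scriptedRunner struct{ statuses []runStatus } // replays a non-empty list, repeating the last entry

func (f *scriptedRunner) CreateSpeculativeRun() (string, error) { return "run-123", nil }

func (f *scriptedRunner) ReadRunStatus(string) (runStatus, error) {
	st := f.statuses[0]
	if len(f.statuses) > 1 {
		f.statuses = f.statuses[1:]
	}
	return st, nil
}

// drive calls plan repeatedly with a fake clock, the way a controller loop might.
func drive(r runner, step, timeout time.Duration, maxCalls int) (int, *result, error) {
	now, state := time.Unix(0, 0), []byte(nil)
	for i := 1; i <= maxCalls; i++ {
		res, err := plan(r, state, now, timeout)
		if err != nil || res.Completed {
			return i, res, err
		}
		state, now = res.State, now.Add(step)
	}
	return maxCalls, nil, fmt.Errorf("still in progress after %d calls", maxCalls)
}

func main() {
	r := &scriptedRunner{statuses: []runStatus{{Status: "planning"}, {Status: "planned_and_finished", IsFinished: true}}}
	calls, res, err := drive(r, time.Second, time.Minute, 10)
	fmt.Println(calls, res != nil && res.Completed, err)
}
```

Passing `now` in as a parameter keeps the timeout logic testable without sleeping; the real planner presumably reads the clock itself.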

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • jsbroks

Poem

🐰 I hopped through code and made a plan,
I spun up workspaces as fast as I can,
I polled and hashed each JSON view,
Counted adds and changes — one, two, two,
A carrot for each successful run! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: implementing a Terraform Cloud (TFE) planner agent that implements the Plannable interface. |
| Linked Issues check | ✅ Passed | The PR implements the Plannable interface for the TFE agent as required by #957, including the planner type, planning logic, and proper integration into the registry. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing the TFE Plannable agent: core implementation, client integrations, comprehensive tests, and registry integration. |



Copilot AI left a comment

Pull request overview

Adds a Terraform Cloud/Enterprise (“tfe”) Plannable job agent so the workspace-engine can compute deployment plan results by creating a speculative TFC run, polling until completion, and storing the resulting plan JSON.

Changes:

  • Register a new Terraform Cloud planner in the deployment plan result controller registry.
  • Implement TFCPlanner (types.Plannable) to create/poll speculative runs and return a PlanResult.
  • Add production TFC client implementations (GoWorkspaceSetup, GoSpeculativeRunner) and unit tests for planner behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| apps/workspace-engine/svc/controllers/deploymentplanresult/getters_postgres.go | Registers the Terraform Cloud planner in the plan-result controller’s agent registry. |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go | Implements the Plannable planner logic (create run, poll, complete with plan JSON/hash). |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | Adds production implementations for workspace setup and speculative run operations via go-tfe. |
| apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go | Adds unit tests covering create, poll, completion, timeout, and error scenarios. |


Comment on lines +134 to +138
	now := time.Now()
	return &types.PlanResult{
		CompletedAt: &now,
		Message:     fmt.Sprintf("Run %s ended with status: %s", s.RunID, status.Status),
	}, nil

Copilot AI Apr 9, 2026

When the TFC run ends in an errored/canceled/discarded state, this branch returns a non-nil CompletedAt with a nil error. The deploymentplanresult controller treats any nil error + non-nil CompletedAt as a successful completion and will persist Status=completed, which misclassifies failed plans. Return a non-nil error here (including run ID/status) so the controller records Status=errored (or introduce an explicit failure status in PlanResult and handle it in the controller).

Suggested change
-now := time.Now()
-return &types.PlanResult{
-	CompletedAt: &now,
-	Message:     fmt.Sprintf("Run %s ended with status: %s", s.RunID, status.Status),
-}, nil
+return nil, fmt.Errorf("run %s ended with status: %s", s.RunID, status.Status)

Comment on lines +142 to +149
	now := time.Now()
	return &types.PlanResult{
		CompletedAt: &now,
		Message: fmt.Sprintf(
			"Run %s timed out after %d polls (%s elapsed), last status: %s",
			s.RunID, s.PollCount, time.Since(*s.FirstPolled).Round(time.Second), status.Status,
		),
	}, nil

Copilot AI Apr 9, 2026

The timeout path also returns CompletedAt with a nil error, so the controller will persist Status=completed even though this is effectively a plan failure/timeout. Consider returning an error (e.g., include elapsed time/poll count/last status) so timeouts are recorded as Status=errored and are visible as failures to callers.

Suggested change
-now := time.Now()
-return &types.PlanResult{
-	CompletedAt: &now,
-	Message: fmt.Sprintf(
-		"Run %s timed out after %d polls (%s elapsed), last status: %s",
-		s.RunID, s.PollCount, time.Since(*s.FirstPolled).Round(time.Second), status.Status,
-	),
-}, nil
+elapsed := time.Since(*s.FirstPolled).Round(time.Second)
+return nil, fmt.Errorf(
+	"run %s timed out after %d polls (%s elapsed), last status: %s",
+	s.RunID, s.PollCount, elapsed, status.Status,
+)

Comment on lines +166 to +174
func (p *TFCPlanner) completePlan(
	ctx context.Context,
	cfg *tfeConfig,
	status *RunStatus,
) (*types.PlanResult, error) {
	planJSON, err := p.runner.ReadPlanJSON(ctx, cfg, status.PlanID)
	if err != nil {
		return nil, fmt.Errorf("read plan JSON: %w", err)
	}

Copilot AI Apr 9, 2026

completePlan assumes status.PlanID is populated whenever status.IsFinished is true. In ReadRunStatus, PlanID is only set when run.Plan != nil, so it's possible to reach here with an empty PlanID (e.g., unexpected API response), which will produce a confusing downstream error. Add an explicit check for empty PlanID and return a clear error before calling ReadPlanJSON.
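
A minimal sketch of such a guard follows. The `RunStatus` struct here is a trimmed stand-in for the PR's type, and `planIDFor` and `errMissingPlanID` are hypothetical names; in the PR the check would presumably sit at the top of completePlan.

```go
package main

import (
	"errors"
	"fmt"
)

// RunStatus is a trimmed stand-in for the PR's type; only the fields used here.
type RunStatus struct {
	Status     string
	PlanID     string
	IsFinished bool
}

// errMissingPlanID is a hypothetical sentinel error for this sketch.
var errMissingPlanID = errors.New("finished run has no plan ID")

// planIDFor returns the plan ID of a finished run, or a clear error when the
// API response did not include one, instead of passing "" downstream.
func planIDFor(runID string, st *RunStatus) (string, error) {
	if st == nil || st.PlanID == "" {
		return "", fmt.Errorf("run %s: %w", runID, errMissingPlanID)
	}
	return st.PlanID, nil
}

func main() {
	_, err := planIDFor("run-123", &RunStatus{Status: "planned_and_finished", IsFinished: true})
	fmt.Println(err) // prints "run run-123: finished run has no plan ID"
}
```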

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go`:
- Around line 14-17: Add Go doc comments for the exported methods Setup,
CreateSpeculativeRun, ReadRunStatus, and ReadPlanJSON (GoWorkspaceSetup.Setup,
GoSpeculativeRunner.CreateSpeculativeRun, GoSpeculativeRunner.ReadRunStatus,
GoSpeculativeRunner.ReadPlanJSON). Place a concise comment immediately above
each method declaration describing what the method does, its important
parameters and return values, and any non-obvious behavior or side effects.
Keep each comment to one or two sentences and follow Go doc comment style,
starting with the method name.
- Around line 99-104: The switch on run.Status currently treats
tfe.RunPlannedAndFinished as finished and
tfe.RunErrored/RunCanceled/RunDiscarded as errored but omits
tfe.RunPolicySoftFailed; update the branch handling in the switch that sets
status.IsFinished and status.IsErrored (the block inspecting run.Status and
assigning status.IsFinished / status.IsErrored) to include
tfe.RunPolicySoftFailed in the errored-terminal case so that runs with a
Sentinel soft-fail are marked terminal immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06eacf06-4bb1-41dd-ac73-cd0e2b22afcf

📥 Commits

Reviewing files that changed from the base of the PR and between 62bede4 and f0ed309.

📒 Files selected for processing (3)
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go
✅ Files skipped from review due to trivial changes (1)
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan.go

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go`:
- Around line 102-107: The switch handling run.Status in tfe_plan_client.go
currently marks only RunPlannedAndFinished as finished and treats RunErrored,
RunCanceled, RunDiscarded, and RunPolicySoftFailed as errored; add Terraform
Cloud's force_canceled status to the errored branch. go-tfe does not export a
RunForceCanceled constant, so compare against tfe.RunStatus("force_canceled")
and set status.IsErrored = true (same branch as
RunErrored/RunCanceled/RunDiscarded/RunPolicySoftFailed), ensuring polling stops
immediately for force-canceled runs; update any related comments/tests
referencing terminal statuses if present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 66136e89-0530-4234-80c2-a2104d60d727

📥 Commits

Reviewing files that changed from the base of the PR and between f0ed309 and 31a8ec8.

📒 Files selected for processing (2)
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go
  • apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_test.go

Comment on lines +102 to +107
	switch run.Status {
	case tfe.RunPlannedAndFinished:
		status.IsFinished = true
	case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed:
		status.IsErrored = true
	}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Terraform Cloud Runs API, is "force_canceled" a possible run status, and should it be treated as a terminal non-success state for speculative (plan-only) runs?

💡 Result:

Yes. The Terraform Cloud (HCP Terraform) Runs API documents force_canceled as a valid data.attributes.status value (“A workspace admin forcefully canceled the run.”). It indicates the run was terminated due to an administrative force-cancel, not a successful completion. [1]

For speculative (plan-only) runs, the documented successful terminal status is planned_and_finished (“The run is completed… This is a final state.”). A force_canceled run should therefore be treated as a terminal, non-success outcome when evaluating plan-only run results. [1]

Source
[1] Terraform Cloud Runs API docs (run statuses; plan-only terminal status; cancel/force-cancel behavior). (developer.hashicorp.com)

Citations:


🏁 Script executed:

cat -n apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | sed -n '95,115p'

Repository: ctrlplanedev/ctrlplane

Length of output: 798


🏁 Script executed:

rg "force_canceled|RunStatus" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 2 -B 2

Repository: ctrlplanedev/ctrlplane

Length of output: 8901


🏁 Script executed:

rg "import.*tfe\|from.*tfe" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 3

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

head -30 apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go

Repository: ctrlplanedev/ctrlplane

Length of output: 784


🏁 Script executed:

rg "tfe\." apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 285


🏁 Script executed:

rg "RunCanceled|RunErrored|RunPlannedAndFinished|RunDiscarded|RunPolicySoftFailed" apps/workspace-engine/pkg/jobagents/terraformcloud/ -B 2 -A 2

Repository: ctrlplanedev/ctrlplane

Length of output: 1738


🏁 Script executed:

rg "RunForceCanceled|force_canceled" apps/workspace-engine/ -i

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

web_search
hashicorp go-tfe library RunForceCanceled constant or force_canceled status

Repository: ctrlplanedev/ctrlplane

Length of output: 164


🌐 Web query:

hashicorp go-tfe RunForceCanceled constant API documentation

💡 Result:

force_canceled is a valid run state in the HCP Terraform / Terraform Enterprise API (data.attributes.status) and means “a workspace admin forcefully canceled the run.” (developer.hashicorp.com)

However, in the current github.com/hashicorp/go-tfe package, there is no RunForceCanceled (or RunForceCanceled-like) constant in the RunStatus constants list; the exported RunStatus constants include RunCanceled, RunErrored, etc., but not force_canceled. (pkg.go.dev)

What to do in Go

  • Compare against the raw string: status == "force_canceled", or
  • Define your own constant in your codebase:
    const RunForceCanceled tfe.RunStatus = "force_canceled"

References (numbered)

  • [1] HCP Terraform Runs API docs (“Run States” includes force_canceled). (developer.hashicorp.com)
  • [2] go-tfe RunStatus constants list (does not include force_canceled). (pkg.go.dev)

Citations:


🏁 Script executed:

rg "const.*Run" apps/workspace-engine/pkg/jobagents/terraformcloud/ -B 2 -A 2

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

rg "type RunStatus\|Status.*tfe\.RunStatus" apps/workspace-engine/pkg/jobagents/terraformcloud/ -A 3

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

rg "\.Status\s*=" apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go -B 3 -A 1

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

python3 << 'EOF'
# Check the go-tfe package to understand RunStatus type
import subprocess
result = subprocess.run(['rg', 'github.com/hashicorp/go-tfe', 'go.mod', 'go.sum'], 
                       capture_output=True, text=True, cwd='/tmp')
if result.returncode == 0:
    print(result.stdout)
else:
    print("Not found in go files")
EOF

Repository: ctrlplanedev/ctrlplane

Length of output: 87


🏁 Script executed:

cat -n apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go | sed -n '80,110p'

Repository: ctrlplanedev/ctrlplane

Length of output: 1039


🏁 Script executed:

rg "run\.Status" apps/workspace-engine/pkg/jobagents/terraformcloud/tfe_plan_client.go -B 5 -A 5

Repository: ctrlplanedev/ctrlplane

Length of output: 607


Add force_canceled to terminal error statuses in the run status switch.

The switch at line 102 does not handle Terraform Cloud's force_canceled run status. When a run is administratively force-canceled, neither IsFinished nor IsErrored will be set, causing polling to continue until timeout instead of terminating immediately as an error state.

Proposed fix
 switch run.Status {
 case tfe.RunPlannedAndFinished:
 	status.IsFinished = true
-case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed:
+case tfe.RunErrored, tfe.RunCanceled, tfe.RunDiscarded, tfe.RunPolicySoftFailed, tfe.RunStatus("force_canceled"):
 	status.IsErrored = true
 }
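
The classification can also be sketched as a standalone function. To keep the example runnable without the go-tfe dependency, `runState` below is a local stand-in for tfe.RunStatus, and `classify` is a hypothetical name; the string values are the run states documented for the Runs API.

```go
package main

import "fmt"

// runState is a local stand-in for tfe.RunStatus so the sketch has no
// external dependencies.
type runState string

const (
	plannedAndFinished runState = "planned_and_finished"
	errored            runState = "errored"
	canceled           runState = "canceled"
	discarded          runState = "discarded"
	policySoftFailed   runState = "policy_soft_failed"
	// forceCanceled has no exported go-tfe constant according to the
	// analysis above, so it is declared locally from the raw API value.
	forceCanceled runState = "force_canceled"
)

// classify maps a run state to the two terminal flags used by the poller.
// Any state not listed is treated as still in progress.
func classify(s runState) (finished, isErrored bool) {
	switch s {
	case plannedAndFinished:
		return true, false
	case errored, canceled, discarded, policySoftFailed, forceCanceled:
		return false, true
	}
	return false, false
}

func main() {
	for _, s := range []runState{"planning", plannedAndFinished, forceCanceled} {
		finished, isErrored := classify(s)
		fmt.Printf("%s finished=%t errored=%t\n", s, finished, isErrored)
	}
}
```

Declaring the constant locally keeps the switch readable and gives tests a single name to reference if the library later adds its own constant.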

@adityachoudhari26 adityachoudhari26 merged commit 54437cb into main Apr 9, 2026
12 checks passed
@adityachoudhari26 adityachoudhari26 deleted the tfe-planner branch April 9, 2026 20:00

Development

Successfully merging this pull request may close these issues.

chore: implement plannable interface for tfe agent

2 participants