Skip to content

feat(tfe-job-agent): webhook-based TFC job agent#847

Merged
zacharyblasczyk merged 9 commits intomainfrom
tfc-workspace-job-agent-v2
Mar 20, 2026
Merged

feat(tfe-job-agent): webhook-based TFC job agent#847
zacharyblasczyk merged 9 commits intomainfrom
tfc-workspace-job-agent-v2

Conversation

@zacharyblasczyk
Copy link
Copy Markdown
Member

@zacharyblasczyk zacharyblasczyk commented Mar 18, 2026

Replace polling-based TFE dispatcher with fire-and-forget webhook model:

  • Go dispatcher: upsert workspace, sync variables, ensure notification config, create run, return immediately
  • API webhook route: POST /api/tfe/webhook with HMAC-SHA512 verification
  • Status mapping: TFC notification triggers → ctrlplane job statuses
  • Job correlation: extract job UUID from TFC run message (same pattern as GitHub webhook handler)
  • Links/metadata set by webhook handler from TFC payload data
  • OpenAPI schema: add webhookUrl (required) and triggerRunOnChange (optional) to TerraformCloudJobAgentConfig
  • Fix workspace-engine build: main.go → . in package.json

Tested end-to-end locally against TFC via smee.io webhook tunnel.

Summary by CodeRabbit

  • New Features

    • Terraform Cloud added as a job agent: workspace sync, variable reconciliation, optional run creation, and webhook-driven status updates.
    • New API webhook endpoint to receive and validate Terraform Cloud notifications.
  • Documentation

    • Updated Terraform Cloud integration guide with webhook setup, status mapping, examples, and troubleshooting.
  • Configuration

    • New optional env var TFE_WEBHOOK_SECRET for webhook signature validation.
  • Tests

    • Added tests for webhook handling, signature verification, trigger→status mapping, and Terraform Cloud agent behavior.

Replace polling-based TFE dispatcher with fire-and-forget webhook model:

- Go dispatcher: upsert workspace, sync variables, ensure notification
  config, create run, return immediately
- API webhook route: POST /api/tfe/webhook with HMAC-SHA512 verification
- Status mapping: TFC notification triggers → ctrlplane job statuses
- Job correlation: extract job UUID from TFC run message (same pattern
  as GitHub webhook handler)
- Links/metadata set by webhook handler from TFC payload data
- OpenAPI schema: add webhookUrl (required) and triggerRunOnChange
  (optional) to TerraformCloudJobAgentConfig
- Fix workspace-engine build: main.go → . in package.json

Tested end-to-end locally against TFC via smee.io webhook tunnel.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 303e5483-aaf8-4b37-930b-c692e4024564

📥 Commits

Reviewing files that changed from the base of the PR and between b6fdc9b and 6f3c3ea.

📒 Files selected for processing (1)
  • apps/workspace-engine/svc/controllers/jobdispatch/controller.go

📝 Walkthrough

Walkthrough

Adds Terraform Cloud webhook handling and a Terraform Cloud job agent: new POST /api/tfe/webhook with HMAC-SHA512 verification, notification→JobStatus mapping and job/metadata updates; workspace-engine gains a TFE job agent (workspace upsert, variable sync, optional run creation, webhook config). Tests and docs updated.

Changes

Cohort / File(s) Summary
API: TFE webhook
/.env.compose, apps/api/src/config.ts, apps/api/src/routes/tfe/index.ts, apps/api/src/routes/tfe/run_notification.ts, apps/api/src/routes/tfe/__tests__/webhook_router.test.ts, apps/api/src/routes/tfe/__tests__/run_notification.test.ts, apps/api/src/server.ts
Adds POST /api/tfe/webhook router with HMAC-SHA512 signature verification using TFE_WEBHOOK_SECRET; validates and routes notifications, maps TFC triggers to internal JobStatus, updates schema.job/schema.jobMetadata, and enqueues reconciliation. Includes router and mapping tests covering signature, payload, and error cases.
Workspace-engine: TFE agent
apps/workspace-engine/svc/controllers/jobdispatch/controller.go, apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/*
Introduces terraformcloud job agent: config parsing, TFC client creation, template-driven workspace upsert, variable reconciliation (create/update/delete), ensure notification config (webhook), optional run triggering via triggerRunOnChange, job status updates through a Setter interface, and unit tests for templates/config parsing.
Schema & build
apps/workspace-engine/oapi/spec/schemas/jobs.jsonnet, apps/workspace-engine/package.json
Adds webhookUrl (required) and triggerRunOnChange (boolean, default true) to TerraformCloudJobAgentConfig schema. Adjusts workspace-engine build script to go build -o ./bin/workspace-engine ..
Documentation
docs/integrations/job-agents/terraform-cloud.mdx
Reworks integration docs for webhook-driven status tracking: documents webhookUrl, TFE_WEBHOOK_SECRET, notification configuration, trigger→status mapping, and troubleshooting; replaces polling/verification guidance with webhook flow and examples.

Sequence Diagram(s)

sequenceDiagram
    participant TFC as Terraform Cloud
    participant API as Ctrlplane API (/api/tfe/webhook)
    participant Verify as verifySignature
    participant Handler as handleRunNotification
    participant DB as Database (schema.job / schema.jobMetadata)
    participant Queue as Reconciliation Queue

    TFC->>API: POST /webhook (notifications + x-tfe-notification-signature)
    API->>Verify: compute HMAC-SHA512 & timing-safe compare
    alt valid signature
        Verify-->>API: ok
        API->>Handler: forward payload (notifications)
        alt notifications non-empty
            Handler->>DB: update job (status, externalId, timestamps, message)
            Handler->>DB: upsert jobMetadata (links, run_url)
            Handler->>DB: resolve workspaceId via joins
            alt workspaceId found
                Handler->>Queue: enqueueAllReleaseTargetsDesiredVersion(workspaceId)
            end
        else notifications empty
            Handler-->>API: no-op (200)
        end
        API-->>TFC: 200 OK
    else invalid/missing signature
        Verify-->>API: fail
        API-->>TFC: 401 Unauthorized
    end
    alt handler error
        API-->>TFC: 500 Internal Server Error (error message)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • jsbroks

Poem

🐰 I sniff the secret in the log,
A webhook hops, I check the cog,
Runs report, I map their cheer or rue,
I sync the vars and nudge the queue,
Little rabbit, big deploy — hooray, code!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.93% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(tfe-job-agent): webhook-based TFC job agent' accurately summarizes the main change: implementing a webhook-based Terraform Cloud job agent dispatcher, replacing the previous polling approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch tfc-workspace-job-agent-v2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 18, 2026

CLA assistant check
All committers have signed the CLA.

@zacharyblasczyk zacharyblasczyk marked this pull request as ready for review March 18, 2026 20:32
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go (1)

47-48: String parsing for triggerRunOnChange only recognizes lowercase "false".

The current logic val != "false" means values like "FALSE", "False", or "0" will be interpreted as true. Consider case-insensitive comparison or using strconv.ParseBool for more robust boolean parsing.

♻️ Suggested improvement
+import "strings"
+
 case string:
-	triggerRunOnChange = val != "false"
+	triggerRunOnChange = strings.ToLower(val) != "false"

Or for full boolean string support:

import "strconv"

case string:
	parsed, err := strconv.ParseBool(val)
	if err == nil {
		triggerRunOnChange = parsed
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go`
around lines 47 - 48, The string branch that sets triggerRunOnChange currently
uses a case-sensitive comparison (val != "false"), so values like "False" or "0"
are mis-parsed; update the case string branch to use strconv.ParseBool to
perform robust, case-insensitive boolean parsing and only set triggerRunOnChange
when ParseBool returns no error, and add the strconv import; locate the code
around the case string handling of triggerRunOnChange to apply this change.
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go (1)

74-74: os.Getenv called on every dispatch.

Reading TFE_WEBHOOK_SECRET from the environment on each dispatch works but adds minor overhead. If runtime secret rotation isn't required, consider reading it once at construction time. Current approach is valid if dynamic secret updates are desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go`
at line 74, The code currently calls os.Getenv("TFE_WEBHOOK_SECRET") on every
dispatch (webhookSecret variable); change this to read the secret once during
the TFE dispatcher/controller construction and store it on the struct (e.g., add
a field on the TFE client/dispatcher type), then have the dispatch handler use
that stored value instead of calling os.Getenv each time; keep the current
runtime behavior only if you intentionally want dynamic secret rotation.
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go (1)

71-82: Consider adding a test case for triggerRunOnChange: "true" as a string.

The tests cover "false" as a string but not "true". Given the parsing logic treats any non-"false" string as true, adding explicit coverage for the "true" string case would improve test clarity.

🧪 Suggested test case
t.Run("string true", func(t *testing.T) {
	cfg := oapi.JobAgentConfig{
		"address":            "https://app.terraform.io",
		"token":              "t",
		"organization":       "o",
		"template":           "t",
		"triggerRunOnChange": "true",
	}
	parsed, err := parseJobAgentConfig(cfg)
	require.NoError(t, err)
	assert.True(t, parsed.triggerRunOnChange)
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go`
around lines 71 - 82, Add a test case in tfe_test.go that mirrors the existing
"string false" test but sets "triggerRunOnChange" to the string "true"; call
parseJobAgentConfig(cfg), assert no error with require.NoError, and assert that
parsed.triggerRunOnChange is true (suggested subtest name "string true") so
parseJobAgentConfig's string-to-bool handling is explicitly covered.
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go (1)

318-319: Category validation error silently ignored in toCreateOptions/toUpdateOptions.

The categoryType() error is discarded with _. While syncVariables validates categories before calling these methods, direct use of toCreateOptions/toUpdateOptions could silently use an empty category. Consider returning an error or validating at the struct level.

Also applies to: 330-331

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`
around lines 318 - 319, The methods VariableTemplate.toCreateOptions and
toUpdateOptions currently ignore the error returned by categoryType(), which can
mask invalid categories; change their signatures to return
(tfe.VariableCreateOptions, error), call categoryType() and if it returns an
error return that error instead of discarding it, and update callers (e.g.,
syncVariables) to handle the error; alternatively, perform validation earlier
when constructing VariableTemplate (validate via categoryType in the constructor
or a new Validate method) and have toCreateOptions/toUpdateOptions assume a
validated struct—reference the methods VariableTemplate.toCreateOptions,
VariableTemplate.toUpdateOptions, categoryType, and syncVariables when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/config.ts`:
- Line 35: The TFE_WEBHOOK_SECRET environment schema currently allows an empty
string which becomes a valid HMAC key; update the zod schema for
TFE_WEBHOOK_SECRET in config.ts to normalize blank strings to undefined (e.g.,
use z.preprocess((val) => val === "" ? undefined : val, z.string().optional()))
so that an empty env value disables the webhook feature consistently with the
router checks (routes/tfe index uses undefined/null guarding).

In `@apps/api/src/routes/tfe/index.ts`:
- Around line 20-24: The HMAC is being computed from JSON.stringify(req.body)
which can differ from the original payload; capture the raw request bytes for
the /api/tfe route (e.g., using express.json's verify option to save the raw
buffer to req.rawBody or using express.raw for this webhook route) and then use
that raw buffer when calling crypto.createHmac(...).update(...).digest(...)
instead of JSON.stringify(req.body); update the code that computes expected (the
createHmac/update call that currently uses body and secret) to use the saved raw
body (e.g., req.rawBody) and keep the rest of the signature comparison logic the
same.

In `@apps/api/src/routes/tfe/run_notification.ts`:
- Around line 113-114: The call to enqueueAllReleaseTargetsDesiredVersion in
run_notification.ts is currently fire-and-forget and can cause unhandled promise
rejections; change the invocation to await the promise so failures propagate to
the async handler (i.e., replace the direct call to
enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId) with an awaited
call: await enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId)) so
any queue write errors are properly caught by the surrounding async flow.

In `@apps/workspace-engine/oapi/spec/schemas/jobs.jsonnet`:
- Around line 147-154: The schema currently lists webhookUrl as required in
apps/workspace-engine/oapi/spec/schemas/jobs.jsonnet but runtime/validation
treat webhookUrl as optional; to fix, either remove "webhookUrl" from the
required array in that schema (update the required:
['address','organization','token','template','webhookUrl'] to exclude
webhookUrl) or add a DB migration that backfills a safe default for all existing
TFC job configs before making webhookUrl required (write a migration that
updates persisted TFC agent configs to include a non-null webhookUrl value,
ensure the migration runs in this PR, and update any tests and validation logic
to reflect the chosen approach so schema, runtime validation, and persisted data
remain consistent).

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`:
- Around line 173-176: NotificationConfigurations.List returns paginated
results, so change the call in workspace.go to iterate through all pages and
accumulate configurations (similar to how syncVariables handles pagination)
instead of only using the first page; repeatedly call
NotificationConfigurations.List with the appropriate pagination/options until no
more pages, search the combined list for the "ctrlplane-webhook" config to avoid
creating duplicates, and return/use the aggregated configs variable throughout
the surrounding logic that currently assumes a single-page response.
- Around line 102-105: The call to client.Variables.List in workspace.go only
fetches the first page of variables so workspaces with >20 variables aren't
fully synced; update the logic that calls client.Variables.List (the variable
listing block) to iterate through all pages returned by the API (use the
pager/next page mechanism returned by the SDK, calling List repeatedly or using
the provided pager helper) and accumulate variables into existingVars before
proceeding with update/delete logic; ensure you stop when there are no more
pages and propagate any errors encountered during pagination.

---

Nitpick comments:
In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go`:
- Around line 47-48: The string branch that sets triggerRunOnChange currently
uses a case-sensitive comparison (val != "false"), so values like "False" or "0"
are mis-parsed; update the case string branch to use strconv.ParseBool to
perform robust, case-insensitive boolean parsing and only set triggerRunOnChange
when ParseBool returns no error, and add the strconv import; locate the code
around the case string handling of triggerRunOnChange to apply this change.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go`:
- Around line 71-82: Add a test case in tfe_test.go that mirrors the existing
"string false" test but sets "triggerRunOnChange" to the string "true"; call
parseJobAgentConfig(cfg), assert no error with require.NoError, and assert that
parsed.triggerRunOnChange is true (suggested subtest name "string true") so
parseJobAgentConfig's string-to-bool handling is explicitly covered.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go`:
- Line 74: The code currently calls os.Getenv("TFE_WEBHOOK_SECRET") on every
dispatch (webhookSecret variable); change this to read the secret once during
the TFE dispatcher/controller construction and store it on the struct (e.g., add
a field on the TFE client/dispatcher type), then have the dispatch handler use
that stored value instead of calling os.Getenv each time; keep the current
runtime behavior only if you intentionally want dynamic secret rotation.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`:
- Around line 318-319: The methods VariableTemplate.toCreateOptions and
toUpdateOptions currently ignore the error returned by categoryType(), which can
mask invalid categories; change their signatures to return
(tfe.VariableCreateOptions, error), call categoryType() and if it returns an
error return that error instead of discarding it, and update callers (e.g.,
syncVariables) to handle the error; alternatively, perform validation earlier
when constructing VariableTemplate (validate via categoryType in the constructor
or a new Validate method) and have toCreateOptions/toUpdateOptions assume a
validated struct—reference the methods VariableTemplate.toCreateOptions,
VariableTemplate.toUpdateOptions, categoryType, and syncVariables when making
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 754d8290-955a-43bf-942b-8c2b1d6111a8

📥 Commits

Reviewing files that changed from the base of the PR and between c555550 and 75fa9b9.

📒 Files selected for processing (15)
  • .env.compose
  • apps/api/src/config.ts
  • apps/api/src/routes/tfe/__tests__/run_notification.test.ts
  • apps/api/src/routes/tfe/__tests__/webhook_router.test.ts
  • apps/api/src/routes/tfe/index.ts
  • apps/api/src/routes/tfe/run_notification.ts
  • apps/api/src/server.ts
  • apps/workspace-engine/oapi/spec/schemas/jobs.jsonnet
  • apps/workspace-engine/package.json
  • apps/workspace-engine/svc/controllers/jobdispatch/controller.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go
  • docs/integrations/job-agents/terraform-cloud.mdx

GITHUB_BOT_APP_ID: z.string().optional(),
GITHUB_BOT_PRIVATE_KEY: z.string().optional(),
GITHUB_WEBHOOK_SECRET: z.string().optional(),
TFE_WEBHOOK_SECRET: z.string().optional(),
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the config.ts file around line 35
fd -t f "config.ts" apps/api/src | head -5

Repository: ctrlplanedev/ctrlplane

Length of output: 88


🏁 Script executed:

# Then read the specific file with context around line 35
cat -n apps/api/src/config.ts | sed -n '30,50p'

Repository: ctrlplanedev/ctrlplane

Length of output: 815


🏁 Script executed:

# Check the routes/tfe/index.ts file to see how the secret is used
find apps/api/src -name "index.ts" -path "*/routes/tfe/*" -type f

Repository: ctrlplanedev/ctrlplane

Length of output: 98


🏁 Script executed:

# Check .env.compose for context
find . -name ".env*compose*" -type f | head -10

Repository: ctrlplanedev/ctrlplane

Length of output: 80


🏁 Script executed:

# Read the TFE route file to see how the secret is validated
cat -n apps/api/src/routes/tfe/index.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 1919


🏁 Script executed:

# Read .env.compose to see how TFE_WEBHOOK_SECRET is configured
cat -n ./.env.compose

Repository: ctrlplanedev/ctrlplane

Length of output: 1322


Normalize blank webhook secrets to undefined.

The schema accepts TFE_WEBHOOK_SECRET= as a valid empty string, and the router only guards against null/undefined (line 15 of routes/tfe/index.ts). A blank env entry becomes a valid HMAC key instead of disabling the feature, allowing anyone to compute a valid signature using an empty secret.

Suggested fix
-    TFE_WEBHOOK_SECRET: z.string().optional(),
+    TFE_WEBHOOK_SECRET: z.preprocess(
+      (value) =>
+        typeof value === "string" && value.trim() === "" ? undefined : value,
+      z.string().min(1).optional(),
+    ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
TFE_WEBHOOK_SECRET: z.string().optional(),
TFE_WEBHOOK_SECRET: z.preprocess(
(value) =>
typeof value === "string" && value.trim() === "" ? undefined : value,
z.string().min(1).optional(),
),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/config.ts` at line 35, The TFE_WEBHOOK_SECRET environment schema
currently allows an empty string which becomes a valid HMAC key; update the zod
schema for TFE_WEBHOOK_SECRET in config.ts to normalize blank strings to
undefined (e.g., use z.preprocess((val) => val === "" ? undefined : val,
z.string().optional())) so that an empty env value disables the webhook feature
consistently with the router checks (routes/tfe index uses undefined/null
guarding).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also the exact same pattern as GITHUB_WEBHOOK_SECRET on line 34. Fixing one without the other would be inconsistent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zacharyblasczyk, that's a fair point — the same empty-string concern applies to GITHUB_WEBHOOK_SECRET as well. For true consistency, the fix should be applied to both:

-    GITHUB_WEBHOOK_SECRET: z.string().optional(),
-    TFE_WEBHOOK_SECRET: z.string().optional(),
+    GITHUB_WEBHOOK_SECRET: z.preprocess(
+      (value) =>
+        typeof value === "string" && value.trim() === "" ? undefined : value,
+      z.string().min(1).optional(),
+    ),
+    TFE_WEBHOOK_SECRET: z.preprocess(
+      (value) =>
+        typeof value === "string" && value.trim() === "" ? undefined : value,
+      z.string().min(1).optional(),
+    ),

That said, if the project's convention is to leave webhook secrets as plain z.string().optional() and rely on runtime guards in the route handlers instead, I understand treating both consistently with the existing pattern. The risk is low as long as developers know not to set these variables to empty strings in production.

( ͡° ͜ʖ ͡°) 🐇

Comment on lines +113 to +114
if (result?.workspaceId != null)
enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId);
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n apps/api/src/routes/tfe/run_notification.ts | head -150

Repository: ctrlplanedev/ctrlplane

Length of output: 4556


🏁 Script executed:

rg "enqueueAllReleaseTargetsDesiredVersion" --type ts -B 5 -A 10

Repository: ctrlplanedev/ctrlplane

Length of output: 19957


🏁 Script executed:

fd "version-manager" -e ts -o | head -5

Repository: ctrlplanedev/ctrlplane

Length of output: 182


🏁 Script executed:

sed -n '37,115p' apps/api/src/routes/tfe/run_notification.ts

Repository: ctrlplanedev/ctrlplane

Length of output: 2392


Await the desired-version enqueue.

This promise is currently fire-and-forget, so a queue write failure will result in an unhandled rejection. Since the handler is async, await it here:

Suggested fix
  if (result?.workspaceId != null)
-   enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId);
+   await enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (result?.workspaceId != null)
enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId);
if (result?.workspaceId != null)
await enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/routes/tfe/run_notification.ts` around lines 113 - 114, The call
to enqueueAllReleaseTargetsDesiredVersion in run_notification.ts is currently
fire-and-forget and can cause unhandled promise rejections; change the
invocation to await the promise so failures propagate to the async handler
(i.e., replace the direct call to enqueueAllReleaseTargetsDesiredVersion(db,
result.workspaceId) with an awaited call: await
enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId)) so any queue
write errors are properly caught by the surrounding async flow.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional — it's an enqueue operation (write to a queue table) that's designed to be best-effort. If it fails, the reconciler will pick it up on the next cycle anyway. Adding await here would make the webhook response wait on the enqueue write, and if it throws, the webhook would return 500 to TFC causing unnecessary retries for a non-critical side effect.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zacharyblasczyk, that's a reasonable design choice — keeping the webhook response fast and not penalizing TFC for a non-critical side effect makes sense.

That said, there's still a risk of an unhandled promise rejection if the enqueue fails, which can crash or warn the Node.js process even though the behavior is intentional. The idiomatic fix for intentional fire-and-forget is to explicitly mark it with void and attach a .catch() to swallow/log the error:

if (result?.workspaceId != null)
  void enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId).catch(
    (err) => logger.warn("Failed to enqueue desired version (best-effort)", { err }),
  );

This way:

  • The webhook handler returns immediately (no awaiting).
  • No 500 is ever returned to TFC for this side effect.
  • The promise rejection is explicitly handled, preventing unhandled rejection warnings/crashes.
  • The warning log gives you observability if the enqueue fails.

✏️ Learnings added
Learnt from: zacharyblasczyk
Repo: ctrlplanedev/ctrlplane PR: 847
File: apps/api/src/routes/tfe/run_notification.ts:115-116
Timestamp: 2026-03-20T16:24:22.839Z
Learning: In apps/api/src/routes/tfe/run_notification.ts, the call to `enqueueAllReleaseTargetsDesiredVersion` at the end of `handleRunNotification` is intentionally fire-and-forget. The enqueue is a best-effort write to a queue table; if it fails, the reconciler picks it up on the next cycle. Awaiting it would delay the webhook response and could return 500 to TFC causing unnecessary retries. The preferred pattern here is `void enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId).catch(err => logger.warn(...))` to prevent unhandled rejections while keeping non-blocking behavior.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 627
File: apps/event-worker/src/workers/job-update/trigger-dependendent-targets.ts:0-0
Timestamp: 2025-07-23T22:42:20.933Z
Learning: In apps/event-worker/src/workers/job-update/trigger-dependendent-targets.ts, when implementing error handling for the triggerDependentTargets function, only a try-catch wrapper is needed around the entire function body. Additional null checks after takeFirst() calls are unnecessary since takeFirst throws an error if no record is found.

Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 501
File: apps/webservice/src/app/api/v1/resources/route.ts:69-72
Timestamp: 2025-04-22T04:35:36.016Z
Learning: In Node.js applications, non-awaited Promises that reject after a function returns won't be caught by the outer try-catch block, potentially causing unhandled rejections. Either explicitly mark them with `void` and add `.catch()` handlers, or use `await` to ensure proper error handling.

Comment on lines +173 to +176
configs, err := client.NotificationConfigurations.List(ctx, workspaceID, nil)
if err != nil {
return fmt.Errorf("failed to list notification configs: %w", err)
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Pagination not handled when listing notification configurations.

Similar to syncVariables, NotificationConfigurations.List() returns paginated results. If a workspace has many notification configs, the ctrlplane-webhook config might not be found on the first page, causing duplicate configs to be created.

🔧 Suggested fix: paginate through all configs
 func ensureNotificationConfig(
 	ctx context.Context,
 	client *tfe.Client,
 	workspaceID, webhookURL, webhookSecret string,
 ) error {
-	configs, err := client.NotificationConfigurations.List(ctx, workspaceID, nil)
-	if err != nil {
-		return fmt.Errorf("failed to list notification configs: %w", err)
-	}
-
-	for _, cfg := range configs.Items {
+	opts := &tfe.NotificationConfigurationListOptions{}
+	for {
+		configs, err := client.NotificationConfigurations.List(ctx, workspaceID, opts)
+		if err != nil {
+			return fmt.Errorf("failed to list notification configs: %w", err)
+		}
+		for _, cfg := range configs.Items {
+			if cfg.Name != notificationConfigName {
+				continue
+			}
+			// ... existing matching logic
+		}
+		if configs.NextPage == 0 {
+			break
+		}
+		opts.PageNumber = configs.NextPage
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`
around lines 173 - 176, NotificationConfigurations.List returns paginated
results, so change the call in workspace.go to iterate through all pages and
accumulate configurations (similar to how syncVariables handles pagination)
instead of only using the first page; repeatedly call
NotificationConfigurations.List with the appropriate pagination/options until no
more pages, search the combined list for the "ctrlplane-webhook" config to avoid
creating duplicates, and return/use the aggregated configs variable throughout
the surrounding logic that currently assumes a single-page response.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TFC workspace has at most a handful of notification configs — the UI itself only supports a small number. You'd need 20+ notification configurations on a single workspace to hit a second page, which is unrealistic. Not worth the code complexity.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zacharyblasczyk, that's a fair point — TFC's own UI enforces a low practical ceiling on notification configs per workspace, making pagination a non-issue here. Thanks for the context!

ʕ •ᴥ•ʔ


✏️ Learnings added
Learnt from: zacharyblasczyk
Repo: ctrlplanedev/ctrlplane PR: 847
File: apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go:179-182
Timestamp: 2026-03-20T17:07:42.750Z
Learning: In `apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`, `ensureNotificationConfig` intentionally does not paginate `NotificationConfigurations.List`. TFC workspaces support only a handful of notification configs (the UI itself enforces a small limit), so a workspace would never have 20+ configs needed to trigger a second page. Pagination would add unnecessary complexity.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: zacharyblasczyk
Repo: ctrlplanedev/ctrlplane PR: 847
File: apps/api/src/routes/tfe/run_notification.ts:115-116
Timestamp: 2026-03-20T16:24:22.839Z
Learning: In apps/api/src/routes/tfe/run_notification.ts, the call to `enqueueAllReleaseTargetsDesiredVersion` at the end of `handleRunNotification` is intentionally fire-and-forget. The enqueue is a best-effort write to a queue table; if it fails, the reconciler picks it up on the next cycle. Awaiting it would delay the webhook response and could return 500 to TFC causing unnecessary retries. The preferred pattern here is `void enqueueAllReleaseTargetsDesiredVersion(db, result.workspaceId).catch(err => logger.warn(...))` to prevent unhandled rejections while keeping non-blocking behavior.

Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

- Use COALESCE for startedAt so run:applying doesn't overwrite the
  original start time set by run:planning
- Paginate variable listing to handle workspaces with >20 variables
- Always update notification config (URL + token) so secret rotation
  propagates without manual intervention
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go (2)

52-52: Trim the obvious templateWorkspace comment.

Line 52 just restates the function name. Either remove it or use the comment to capture the non-obvious contract here.

Based on learnings, "Do not add comments that simply restate what the code does".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`
at line 52, Remove the redundant comment that merely repeats the function name
for templateWorkspace; either delete it or replace it with a concise comment
describing the non-obvious contract of templateWorkspace (e.g., the expected
shape of dispatch.Context it uses, which template variables must be present,
error behavior, and any side-effects such as escaping or trimming of values) so
readers understand preconditions/outputs rather than restating the name.

15-50: Add doc comments for the exported template types.

VCSRepoTemplate, WorkspaceTemplate, and VariableTemplate are exported, so they should each carry a brief package comment describing how they fit into the Terraform Cloud template model.

As per coding guidelines, "document exported functions/types/methods".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`
around lines 15 - 50, Add Go doc comments above each exported type
(VCSRepoTemplate, WorkspaceTemplate, VariableTemplate) describing their role in
the Terraform Cloud workspace template model and how they are used (e.g.,
VCSRepoTemplate represents VCS connection details for a workspace;
WorkspaceTemplate defines workspace-level settings such as execution mode,
triggers, VCSRepo and Variables; VariableTemplate describes a workspace variable
including category/HCL/sensitive flags). Ensure each comment starts with the
type name (e.g., "VCSRepoTemplate ..." ) and briefly mention important fields or
semantics so godoc users understand intent and usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`:
- Around line 250-252: toUpdateOptions() currently omits setting the
WorkspaceUpdateOptions.Project field so workspace project changes never
propagate; mirror the toCreateOptions() pattern by adding a conditional that
when w.Project is non-empty sets opts.Project = &tfe.Project{ID: w.Project}
(same guard and assignment used in toCreateOptions()), ensuring to reference the
WorkspaceUpdateOptions instance named opts in toUpdateOptions().
- Around line 158-164: Remove the hardcoded auto-apply override: delete the
autoApply variable and do not set the AutoApply field when constructing
tfe.RunCreateOptions in the client.Runs.Create call (leave AutoApply omitted so
the go-tfe SDK will use the workspace's configured behavior); update the
RunCreateOptions invocation in workspace.go where run is created (the call to
client.Runs.Create with Workspace: &tfe.Workspace{ID: workspaceID}, Message:
&message) to exclude AutoApply entirely.
- Around line 188-228: The update path currently only sets URL and Token on
tfe.NotificationConfigurationUpdateOptions, so it can leave Enabled and Triggers
out-of-sync; update the logic around NotificationConfigurationUpdateOptions
(used in the client.NotificationConfigurations.Update call) to also set Enabled
(true) and the full Triggers slice (the same values used in the create opts) and
ensure Token is conditionally set when webhookSecret != ""; use the same
webhookURL, webhookSecret, enabled flag and triggers as in the create branch so
the notification config is fully reconciled to the desired state.

---

Nitpick comments:
In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`:
- Line 52: Remove the redundant comment that merely repeats the function name
for templateWorkspace; either delete it or replace it with a concise comment
describing the non-obvious contract of templateWorkspace (e.g., the expected
shape of dispatch.Context it uses, which template variables must be present,
error behavior, and any side-effects such as escaping or trimming of values) so
readers understand preconditions/outputs rather than restating the name.
- Around line 15-50: Add Go doc comments above each exported type
(VCSRepoTemplate, WorkspaceTemplate, VariableTemplate) describing their role in
the Terraform Cloud workspace template model and how they are used (e.g.,
VCSRepoTemplate represents VCS connection details for a workspace;
WorkspaceTemplate defines workspace-level settings such as execution mode,
triggers, VCSRepo and Variables; VariableTemplate describes a workspace variable
including category/HCL/sensitive flags). Ensure each comment starts with the
type name (e.g., "VCSRepoTemplate ..." ) and briefly mention important fields or
semantics so godoc users understand intent and usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b6737286-a88d-468a-af9c-48acf13a6469

📥 Commits

Reviewing files that changed from the base of the PR and between 75fa9b9 and 80ee5fe.

📒 Files selected for processing (2)
  • apps/api/src/routes/tfe/run_notification.ts
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/src/routes/tfe/run_notification.ts

zacharyblasczyk and others added 3 commits March 20, 2026 11:41
Align runtime validation with the OpenAPI schema — webhookUrl is
required for status updates to flow back via webhooks.
… project to update

- Remove hardcoded autoApply=true so runs defer to the workspace setting
- Reconcile enabled/triggers on notification config update, not just URL/token
- Add missing Project field to toUpdateOptions
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go (1)

74-83: Consider logging when webhook secret is not configured.

When TFE_WEBHOOK_SECRET is empty, the notification config is created without HMAC authentication, meaning incoming webhooks cannot be verified. While the code handles this gracefully (skipping token setup), it might be worth logging a warning so operators are aware their webhook endpoint is unprotected.

💡 Optional: Add a warning when secret is missing
 	webhookSecret := os.Getenv("TFE_WEBHOOK_SECRET")
+	if webhookSecret == "" {
+		log.Warn("TFE_WEBHOOK_SECRET not set, webhook authentication disabled")
+	}
 	if err := ensureNotificationConfig(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go`
around lines 74 - 83, The code reads TFE_WEBHOOK_SECRET into webhookSecret and
calls ensureNotificationConfig without warning when it's empty; add a
conditional check after reading webhookSecret (before calling
ensureNotificationConfig) to detect an empty value and emit a warning via the
existing logger (e.g., log.Warn) that the webhook URL (cfg.webhookUrl) will be
created without HMAC authentication, referencing TFE_WEBHOOK_SECRET and
suggesting operators set it; then proceed to call ensureNotificationConfig as
before.
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go (1)

193-200: Consider extracting duplicate trigger list.

The same trigger list is defined in both the update (lines 193-200) and create (lines 224-231) paths. Extracting it to a package-level variable would reduce duplication and ensure they stay in sync.

♻️ Optional: Extract common trigger list
+var notificationTriggers = []tfe.NotificationTriggerType{
+	tfe.NotificationTriggerCreated,
+	tfe.NotificationTriggerPlanning,
+	tfe.NotificationTriggerNeedsAttention,
+	tfe.NotificationTriggerApplying,
+	tfe.NotificationTriggerCompleted,
+	tfe.NotificationTriggerErrored,
+}
+
 func ensureNotificationConfig(
 	ctx context.Context,
 	client *tfe.Client,
 	workspaceID, webhookURL, webhookSecret string,
 ) error {
 	// ... list configs ...
 
 		updateOpts := tfe.NotificationConfigurationUpdateOptions{
 			URL:      &webhookURL,
 			Enabled:  &enabled,
-			Triggers: []tfe.NotificationTriggerType{
-				tfe.NotificationTriggerCreated,
-				tfe.NotificationTriggerPlanning,
-				tfe.NotificationTriggerNeedsAttention,
-				tfe.NotificationTriggerApplying,
-				tfe.NotificationTriggerCompleted,
-				tfe.NotificationTriggerErrored,
-			},
+			Triggers: notificationTriggers,
 		}
 	// ... and similarly for create opts ...

Also applies to: 224-231

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`
around lines 193 - 200, Extract the duplicated slice of
tfe.NotificationTriggerType values into a single package-level variable (e.g.,
workspaceNotificationTriggers) and replace the inline slices in both the update
and create code paths with that variable; locate the two inline slices that
contain tfe.NotificationTriggerCreated, tfe.NotificationTriggerPlanning,
tfe.NotificationTriggerNeedsAttention, tfe.NotificationTriggerApplying,
tfe.NotificationTriggerCompleted, tfe.NotificationTriggerErrored and assign them
to the new package-level var so both the update and create paths reference the
same slice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go`:
- Around line 74-83: The code reads TFE_WEBHOOK_SECRET into webhookSecret and
calls ensureNotificationConfig without warning when it's empty; add a
conditional check after reading webhookSecret (before calling
ensureNotificationConfig) to detect an empty value and emit a warning via the
existing logger (e.g., log.Warn) that the webhook URL (cfg.webhookUrl) will be
created without HMAC authentication, referencing TFE_WEBHOOK_SECRET and
suggesting operators set it; then proceed to call ensureNotificationConfig as
before.

In
`@apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go`:
- Around line 193-200: Extract the duplicated slice of
tfe.NotificationTriggerType values into a single package-level variable (e.g.,
workspaceNotificationTriggers) and replace the inline slices in both the update
and create code paths with that variable; locate the two inline slices that
contain tfe.NotificationTriggerCreated, tfe.NotificationTriggerPlanning,
tfe.NotificationTriggerNeedsAttention, tfe.NotificationTriggerApplying,
tfe.NotificationTriggerCompleted, tfe.NotificationTriggerErrored and assign them
to the new package-level var so both the update and create paths reference the
same slice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 30886f6a-17a8-40a0-8ac5-25fec11a5111

📥 Commits

Reviewing files that changed from the base of the PR and between 80ee5fe and b6fdc9b.

📒 Files selected for processing (4)
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/workspace.go
✅ Files skipped from review due to trivial changes (2)
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/config.go
  • apps/workspace-engine/svc/controllers/jobdispatch/jobagents/terraformcloud/tfe_test.go

@zacharyblasczyk zacharyblasczyk merged commit 4e1319e into main Mar 20, 2026
7 of 10 checks passed
@zacharyblasczyk zacharyblasczyk deleted the tfc-workspace-job-agent-v2 branch March 20, 2026 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants