Skip to content

Draft: Fix runAsUser error, migration ordering, and details during session errors#1592

Draft
bsquizz wants to merge 2 commits into
ambient-code:mainfrom
bsquizz:fixes
Draft

Draft: Fix runAsUser error, migration ordering, and details during session errors#1592
bsquizz wants to merge 2 commits into
ambient-code:mainfrom
bsquizz:fixes

Conversation

@bsquizz
Copy link
Copy Markdown

@bsquizz bsquizz commented May 18, 2026

Summary

  • Fix runAsNonRoot compliance in ambient-api-server: Added shadow-utils and a dedicated appuser (UID 1000) to the Dockerfile so the image satisfies runAsNonRoot without depending solely on the manifest override. Added runAsUser: 1000 to the pod securityContext to make the UID explicit.
  • Fix migration ID ordering in roleBindings: Corrected typedFKMigration ID from 202505130001 to 202605150001 so it sorts after the 202603100138 migration that creates the role_bindings table. The wrong ID caused fresh-cluster deployments to fail with "relation role_bindings does not exist".
  • Surface auth errors from K8s ListSessions: Instead of masking all K8s errors as 500, the handler now returns 401 for expired/invalid tokens and 403 for insufficient permissions. Applies to both the initial list call and the paginated continue call.

Test plan

  • Deploy to a fresh cluster and verify ambient-api-server starts without migration errors
  • Confirm kubectl describe pod shows the container running as UID 1000 with runAsNonRoot: true
  • Simulate an expired token against ListSessions and verify the response is 401, not 500
  • Simulate a token with no project access and verify the response is 403, not 500

🤖 Generated with Claude Code

bsquizz and others added 2 commits May 18, 2026 14:31
…ering

Add runAsUser: 1000 to pod securityContext and bake a non-root appuser
into the Dockerfile so the image satisfies the runAsNonRoot requirement
without relying solely on the manifest override.

Fix typedFKMigration ID from 202505130001 to 202605150001 so it sorts
after the 202603100138 migration that creates the role_bindings table;
the old ID caused fresh-cluster deployments to fail with
"relation role_bindings does not exist".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Return 401/403 to the caller when the K8s dynamic client gets an
Unauthorized or Forbidden error during session listing, rather than
masking both as 500. Applies to both the initial List call and the
paginated continue call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented May 18, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit aad7bab
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a0b5b4b1d4c9f00072ba0b3

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

The PR hardens the ambient-api-server container to run as non-root user UID 1000 across both the Docker image and Kubernetes deployment manifest. It also improves error response granularity in the session handler by distinguishing between Kubernetes API authorization failures (401/403) and generic failures (500). A migration ID is bumped in roleBindings.

Changes

Non-root user and session error handling

Layer / File(s) Summary
Non-root user setup in Docker and manifest
components/ambient-api-server/Dockerfile, components/manifests/base/core/ambient-api-server-service.yml
Dockerfile installs shadow-utils, creates appuser with UID 1000, and runs container as USER 1000. Kubernetes manifest sets securityContext.runAsUser to 1000 to enforce the same user at runtime.
Session listing authorization error handling
components/backend/handlers/sessions.go
ListSessions distinguishes Kubernetes API errors: IsUnauthorized returns 401 with "Token expired or invalid", IsForbidden returns 403 with "Unauthorized to access project", and others return 500. Applied to both initial list and paginated continue requests.
🚥 Pre-merge checks | ✅ 5 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Kubernetes Resource Safety ⚠️ Warning PostgreSQL container in ambient-api-server-db.yml missing resource limits/requests. Secrets and PVC lack OwnerReferences. Wildcard RBAC verbs exist in ClusterRole. Add resources.limits and resources.requests to postgres container. Add ownerReferences to Secrets/PVC pointing to parent Deployment. Restrict RBAC wildcards to specific verbs.
Title check ⚠️ Warning Title uses imperative form instead of Conventional Commits format; lacks required type(scope) prefix and does not follow the specified format. Reformat to Conventional Commits: 'fix(ambient-api-server): set runAsUser, update migration ID, and improve session error details'
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Performance And Algorithmic Complexity ✅ Passed No meaningful performance regressions. All changes are security/config fixes or proper pagination enhancements with bounded memory and O(n log n) sorting.
Security And Secret Handling ✅ Passed No hardcoded secrets. All API endpoints enforce auth via GetK8sClientsForRequest. No SQL injection. No sensitive data leaks. Non-root container UID 1000. Proper secret handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bsquizz bsquizz marked this pull request as draft May 18, 2026 18:35
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/manifests/base/core/ambient-api-server-service.yml (1)

59-59: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Set readOnlyRootFilesystem: true in both container security contexts.

Both the init container (line 59) and main container (line 160) have readOnlyRootFilesystem: false, which violates the coding guideline. If the containers need writable filesystem areas (e.g., /tmp), add emptyDir volume mounts for those specific paths.

As per coding guidelines, "All containers must have restricted SecurityContext: runAsNonRoot, drop ALL capabilities, readOnlyRootFilesystem".

🔒 Suggested security context updates
         securityContext:
           allowPrivilegeEscalation: false
-          readOnlyRootFilesystem: false
+          readOnlyRootFilesystem: true
           capabilities:
             drop:
               - ALL

If writable paths are needed, add volume mounts:

       initContainers:
         - name: migration
           # ... existing config ...
           volumeMounts:
             - name: db-secrets
               mountPath: /secrets/db
+            - name: tmp
+              mountPath: /tmp
       volumes:
         - name: db-secrets
           secret:
             secretName: ambient-api-server-db
+        - name: tmp
+          emptyDir: {}

Apply the same pattern to the main api-server container.

Also applies to: 160-160

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/manifests/base/core/ambient-api-server-service.yml` at line 59,
Update the containers' securityContext: set readOnlyRootFilesystem: true for the
initContainer and the main api-server container (the securityContext blocks
currently setting readOnlyRootFilesystem: false), and if either container needs
writable paths (e.g., /tmp) add an emptyDir volume and mount it at that path;
also ensure the securityContext includes runAsNonRoot: true and capabilities:
drop: ["ALL"] to match the guideline.
🧹 Nitpick comments (1)
components/ambient-api-server/plugins/roleBindings/migration.go (1)

77-88: ⚡ Quick win

Collect or log rollback errors instead of discarding.

The rollback silently discards all errors with _ = tx.Exec(...).Error. This makes diagnosing partial rollback failures difficult.

♻️ Proposed fix to collect errors
 	Rollback: func(tx *gorm.DB) error {
-		_ = tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_credential_id`).Error
-		_ = tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_session_id`).Error
-		_ = tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_agent_id`).Error
-		_ = tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_project_id`).Error
-		_ = tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS credential_id`).Error
-		_ = tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS session_id`).Error
-		_ = tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS agent_id`).Error
-		_ = tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS project_id`).Error
-		_ = tx.Exec(`ALTER TABLE role_bindings ALTER COLUMN user_id SET NOT NULL`).Error
-		return tx.Exec(`ALTER TABLE role_bindings ADD COLUMN IF NOT EXISTS scope_id TEXT`).Error
+		var errs []error
+		if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_credential_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_session_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_agent_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_project_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS credential_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS session_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS agent_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings DROP COLUMN IF EXISTS project_id`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings ALTER COLUMN user_id SET NOT NULL`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if err := tx.Exec(`ALTER TABLE role_bindings ADD COLUMN IF NOT EXISTS scope_id TEXT`).Error; err != nil {
+			errs = append(errs, err)
+		}
+		if len(errs) > 0 {
+			return fmt.Errorf("rollback encountered %d error(s): %v", len(errs), errs)
+		}
+		return nil
 	}

As per coding guidelines: Never silently swallow partial failures; every error path must propagate or be explicitly collected, never discarded.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/migration.go` around lines
77 - 88, The Rollback function currently discards all tx.Exec(...).Error
results; change it to collect each error from the calls inside Rollback (the
tx.Exec calls that DROP INDEX, DROP COLUMN, ALTER COLUMN, and ADD COLUMN) into
an error accumulator (e.g., a slice or use errors.Join) and at the end return a
single non-nil error if any step failed; ensure the final return does not
overwrite earlier errors but is combined with them so partial rollback failures
are not silently swallowed (update Rollback in migration.go where Rollback:
func(tx *gorm.DB) error { ... } is defined).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/ambient-api-server/plugins/roleBindings/migration.go`:
- Line 35: The Rollback function for the migration with ID "202605150001"
currently discards errors from the tx.Exec calls (the `_` assignments in
Rollback), so update Rollback(func(tx *gorm.DB)) to collect each error into a
slice (e.g., var errs []error) by appending err when a tx.Exec(...).Error != nil
for every DROP/DDL operation, and at the end return a combined error (e.g.,
fmt.Errorf("rollback errors: %v", errs)) if len(errs)>0, otherwise return nil;
ensure every tx.Exec in Rollback is changed to check and append its error
instead of ignoring it.

---

Outside diff comments:
In `@components/manifests/base/core/ambient-api-server-service.yml`:
- Line 59: Update the containers' securityContext: set readOnlyRootFilesystem:
true for the initContainer and the main api-server container (the
securityContext blocks currently setting readOnlyRootFilesystem: false), and if
either container needs writable paths (e.g., /tmp) add an emptyDir volume and
mount it at that path; also ensure the securityContext includes runAsNonRoot:
true and capabilities: drop: ["ALL"] to match the guideline.

---

Nitpick comments:
In `@components/ambient-api-server/plugins/roleBindings/migration.go`:
- Around line 77-88: The Rollback function currently discards all
tx.Exec(...).Error results; change it to collect each error from the calls
inside Rollback (the tx.Exec calls that DROP INDEX, DROP COLUMN, ALTER COLUMN,
and ADD COLUMN) into an error accumulator (e.g., a slice or use errors.Join) and
at the end return a single non-nil error if any step failed; ensure the final
return does not overwrite earlier errors but is combined with them so partial
rollback failures are not silently swallowed (update Rollback in migration.go
where Rollback: func(tx *gorm.DB) error { ... } is defined).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b6dea970-049f-4784-949f-0af55fe6a235

📥 Commits

Reviewing files that changed from the base of the PR and between 63545c7 and aad7bab.

📒 Files selected for processing (4)
  • components/ambient-api-server/Dockerfile
  • components/ambient-api-server/plugins/roleBindings/migration.go
  • components/backend/handlers/sessions.go
  • components/manifests/base/core/ambient-api-server-service.yml

func typedFKMigration() *gormigrate.Migration {
return &gormigrate.Migration{
ID: "202505130001",
ID: "202605150001",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the old migration ID exists in version control history or deployed environments

# Search for the old migration ID in git history
echo "=== Checking git history for old migration ID ==="
git log --all --oneline --grep="202505130001" || echo "No commits mention old ID"

# Search for migration tracking in the codebase
echo -e "\n=== Checking for migration version tracking ==="
rg -n "202505130001" --type=go

# Look for any migration state or version files
echo -e "\n=== Looking for migration state files ==="
fd -e json -e yaml -e yml migration | head -20

Repository: ambient-code/platform

Length of output: 299


🏁 Script executed:

cat -n components/ambient-api-server/plugins/roleBindings/migration.go

Repository: ambient-code/platform

Length of output: 4195


🏁 Script executed:

git log --oneline --all -- components/ambient-api-server/plugins/roleBindings/migration.go | head -20

Repository: ambient-code/platform

Length of output: 328


🏁 Script executed:

git show 5556220:components/ambient-api-server/plugins/roleBindings/migration.go | head -50

Repository: ambient-code/platform

Length of output: 1630


Fix Rollback function to propagate all errors.

The migration ID change from 202505130001 to 202605150001 is confirmed and will trigger a rerun if already deployed. While the Migrate function is idempotent (all DDL uses IF EXISTS/IF NOT EXISTS guards), the Rollback function silently discards all errors except the last one (lines 78–86). Per guidelines, errors must be propagated or explicitly collected, never discarded. If any operation fails during rollback, the failure is invisible and leaves the schema in an inconsistent state.

Collect all rollback errors instead of silencing them with _ assignment:

Suggested fix
Rollback: func(tx *gorm.DB) error {
	var errs []error
	if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_credential_id`).Error; err != nil {
		errs = append(errs, err)
	}
	if err := tx.Exec(`DROP INDEX IF EXISTS idx_role_bindings_session_id`).Error; err != nil {
		errs = append(errs, err)
	}
	// ... continue for all operations
	if len(errs) > 0 {
		return fmt.Errorf("rollback errors: %v", errs)
	}
	return nil
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/ambient-api-server/plugins/roleBindings/migration.go` at line 35,
The Rollback function for the migration with ID "202605150001" currently
discards errors from the tx.Exec calls (the `_` assignments in Rollback), so
update Rollback(func(tx *gorm.DB)) to collect each error into a slice (e.g., var
errs []error) by appending err when a tx.Exec(...).Error != nil for every
DROP/DDL operation, and at the end return a combined error (e.g.,
fmt.Errorf("rollback errors: %v", errs)) if len(errs)>0, otherwise return nil;
ensure every tx.Exec in Rollback is changed to check and append its error
instead of ignoring it.

@bsquizz bsquizz changed the title Draft: Fix runAsUser error and surface more details during session errors Draft: Fix runAsUser error, migration ordering, and details during session errors May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant