
re-arch: Add go webserver with Jobs manager, improve GitHub app token minting #33

Open
swghosh wants to merge 5 commits into main from rearch

Conversation


@swghosh swghosh commented Mar 31, 2026

  • Add a go web server for k8s Job creation/polling, serves web UI
  • The GitHub bot token expires every hour; a token minter holding the bot's private key can generate short-lived JWT tokens
  • A Kubernetes Job with Python script runs the actual agent workflow
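
The minting flow in the second bullet can be sketched as follows. This is a minimal illustration, not the minter's actual code: GitHub App authentication requires an RS256 signature with the app's private key (e.g. via PyJWT), so the HS256 stand-in with a dummy secret here exists only to keep the sketch self-contained and runnable.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def mint_app_jwt(app_id: str, secret: bytes, ttl_seconds: int = 600) -> str:
    """Build a short-lived JWT for GitHub App authentication.

    In the real minter the signature would be RS256 over the app's
    private key; HS256 here is a stand-in so the encoding steps run.
    """
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}  # GitHub expects "RS256"
    claims = {
        "iat": now - 60,           # backdate slightly to allow clock drift
        "exp": now + ttl_seconds,  # GitHub caps app JWTs at 10 minutes
        "iss": app_id,             # the GitHub App ID
    }
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)
```

The real minter would then exchange this JWT for a 1-hour installation access token via the GitHub API and write it into a Kubernetes Secret for the Job pods to mount.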

Summary by CodeRabbit

  • New Features

    • Browser UI to start workflows, pick repos, and stream live status/logs via SSE.
    • HTTP APIs to list repos, submit jobs, query status, and stream job output.
    • Server now loads repo allowlist/config and exposes job submit/status lifecycle.
  • Infrastructure & Services

    • Kustomize-based Kubernetes manifests: deployments, services, RBAC, routes, and secrets/configmaps.
    • Automated GitHub App token minter with secret rotation and accompanying container images for server and jobs.


coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds a v2 deployment: a Go HTTP server (submit/status/SSE logs) that creates Kubernetes Jobs, a Python job worker entrypoint, a GitHub App token minter (script, image, deployment, RBAC), Dockerfiles, Kustomize manifests, and related config/secret manifests and static UI.

Changes

Cohort / File(s) Summary
Worker Entrypoint & Helpers
server/main.py, server/gh-credential-helper-v2.sh, server/requirements.txt
New async Python CLI entrypoint for running workflows with JSON-line progress and exit codes; Git credential helper reads mounted GitHub token; added rich to Python requirements.
Go Server v2 Core
go-server-v2/main.go, go-server-v2/handlers.go, go-server-v2/config.go, go-server-v2/k8s.go, go-server-v2/go.mod
New Go service: config loader (CSV parsing, namespace autodiscovery), K8s client wrapper to create/observe Jobs, job status/pod/log helpers, HTTP handlers for home/repos/submit/status/stream, and module/deps.
Go Server UI & Static
go-server-v2/static/homepage.html
New static HTML UI to list repos, submit workflows, and stream status/logs via SSE.
GitHub Token Minter
gh-token-minter/mint-token.sh, images-v2/gh-token-minter.Dockerfile, deploy-v2/gh-token-minter-deployment.yaml
Bash script (embedded Python) to mint GitHub App installation tokens and update a Kubernetes Secret; image and Deployment with ServiceAccount and RBAC for periodic refresh.
Job Container & Build Images
images-v2/job.Dockerfile, images-v2/go-server.Dockerfile
New Dockerfiles: job image bundling Python worker, credential helper and config; multi-stage Go server build producing runtime image.
Kustomize & Deployment Manifests
deploy-v2/kustomization.yaml, deploy-v2/go-server-deployment.yaml, deploy-v2/secret-configs.yaml, deploy-v2/gh-token-minter-deployment.yaml
Kustomize root for shift-ai-v2; go-server Deployment/Service/Route, RBAC and volumes, token-minter Deployment/Role/RoleBinding/ServiceAccount, and secrets/configmaps for worker config and GitHub App key.
Configs & Static Data
deploy-v2/config/config.json, deploy-v2/config/team-repos.csv
Added config JSON and team repos CSV consumed by the Go server config loader.
Misc: Job Entrypoint Files
server/main.py, server/agent.py*
Adds standalone job entrypoint (server/main.py) that runs run_workflow and prints JSON events and final status/PRs; ties into existing agent logic (see referenced agent changes).

Sequence Diagram(s)

sequenceDiagram
    participant Browser as Browser (User)
    participant GoServer as Go Server (HTTP)
    participant K8sAPI as Kubernetes API
    participant JobPod as Worker Pod (Python)
    participant GitHub as GitHub API

    Browser->>GoServer: POST /submit (ep_url, repo)
    GoServer->>GoServer: validate & generate job_id
    GoServer->>K8sAPI: Create Job (env, secrets, volumes)
    K8sAPI-->>GoServer: Job created
    GoServer-->>Browser: return job_id

    Browser->>GoServer: GET /stream/{job_id} (SSE)
    GoServer->>K8sAPI: poll for pod & phase
    K8sAPI-->>GoServer: pod found / running

    GoServer->>K8sAPI: stream pod logs (follow)
    K8sAPI-->>GoServer: log lines
    GoServer-->>Browser: SSE log/status events

    JobPod->>GitHub: use mounted token to perform workflow API calls
    JobPod->>JobPod: complete workflow work

    K8sAPI-->>GoServer: job final status
    GoServer-->>Browser: SSE complete (result)
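
On the wire, the SSE leg of the diagram could look like the sketch below. The event names mirror the diagram's status/log/complete events; the exact framing used by the Go server is an assumption.

```python
import json


def sse_frame(event: str, data: dict) -> str:
    """Frame one server-sent event as it might appear on the
    GET /stream/{job_id} response body."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


# A status update, a log line, then completion:
stream = (
    sse_frame("status", {"phase": "Running"})
    + sse_frame("log", {"line": "cloning repository..."})
    + sse_frame("complete", {"succeeded": True})
)
```

Each frame ends with a blank line, which is what tells the browser's EventSource that one event is complete and the next may begin.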

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning: docstring coverage is 40.91%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled).
Title Check — ✅ Passed: the pull request title clearly and specifically summarizes the primary changes, adding a Go webserver with Kubernetes Jobs management and improving GitHub App token minting.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch rearch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 17

🧹 Nitpick comments (1)
deploy/deployment.yaml (1)

96-96: Avoid :latest in the Deployment manifests.

CI is already producing sha tags, but these manifests still deploy :latest. That makes rollouts non-reproducible and allows a reschedule or later apply to pick up a different build than the one this PR was validated against. Use the immutable sha tag or image digest here.

Also applies to: 148-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/deployment.yaml` at line 96, Replace the hardcoded image:
ghcr.io/shiftweek/oape-ai-e2e-token-service:latest in the Deployment manifest
with the immutable CI-produced SHA tag or the image digest (e.g.,
ghcr.io/shiftweek/oape-ai-e2e-token-service@sha256:...) so rollouts are
reproducible; update every occurrence of the image field that currently uses
:latest (including the other instance referenced) and, if applicable, wire it to
the CI variable or kustomize/Helm value that injects the sha-tag so the manifest
always deploys the exact build validated by CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/deployment.yaml`:
- Around line 93-115: Add explicit pod- and container-level securityContext
blocks for the new token-service deployment and the oape-orchestrator deployment
to harden pods: at the pod spec level set securityContext with runAsNonRoot:
true, runAsUser: 1000 (or a non-root UID), fsGroup: 65534, and seccompProfile: {
type: RuntimeDefault }; within the token-service container (name: token-service)
and the oape-orchestrator container set securityContext with
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, privileged:
false, capabilities: { drop: ["ALL"] }, and runAsNonRoot: true; ensure volume
mounts remain writable only where needed and that the secret volume mount
retains readOnly: true.

In `@go-server/config.go`:
- Around line 46-53: The config loading code that parses TTL_AFTER_FINISHED
should reject negative values immediately: after parsing v into n in the TTL
parsing block, check if n < 0 and return an error (e.g., fmt.Errorf("invalid
TTL_AFTER_FINISHED: value must be non-negative")) instead of allowing a negative
ttl to be assigned to ttl (int32); keep the existing strconv.ParseInt call and
the ttl = int32(n) assignment but add the non-negative validation before the
conversion/assignment to fail-fast on bad configuration.

In `@go-server/go.mod`:
- Line 35: Update the vulnerable indirect dependency golang.org/x/oauth2 from
v0.21.0 to v0.27.0+ by adding/updating its entry in go.mod (the existing
"golang.org/x/oauth2" require line) or running "go get
golang.org/x/oauth2@v0.27.0" and then "go mod tidy" to regenerate go.sum; ensure
the updated version appears in go.mod/go.sum and re-run tests/build to confirm
no breakage (target symbol: golang.org/x/oauth2 in go.mod).

In `@go-server/handlers.go`:
- Around line 156-159: The handlers currently treat all errors from
a.k8s.GetJobStatus, a.k8s.GetJobPod and the pod Get call as "not found" or
transient; update these to inspect Kubernetes error types (use
k8s.io/apimachinery/pkg/api/errors functions like apierrors.IsNotFound,
apierrors.IsForbidden, apierrors.IsUnauthorized, etc.) and branch accordingly:
return a 404 via writeError only when apierrors.IsNotFound(err) is true, return
a 4xx (e.g., 403/401) when IsForbidden/IsUnauthorized match, and return a 5xx
(and stop retrying/waiting) for API server/network errors; ensure GetJobStatus,
GetJobPod and the pod Get error handling uses these checks and surfaces
non-transient errors immediately instead of mapping them to "waiting".
- Around line 67-86: The fetchToken method currently calls http.Get without a
context or timeout; change fetchToken to accept a context (e.g., func (a *App)
fetchToken(ctx context.Context) (string, error)) and create the request with
http.NewRequestWithContext(ctx, "GET", a.cfg.TokenServiceURL+"/token", nil) or
use a custom http.Client with a timeout (e.g., &http.Client{Timeout: X}) to
perform the request, propagate context from the caller (r.Context()) into the
call, and keep the existing response handling and error wrapping in fetchToken.
- Around line 264-289: The scanner loop using bufio.NewScanner currently treats
EOF and read errors the same and immediately emits a "complete" event; after the
loop returns, check scanner.Err() and if non-nil emit an "error" or
"stream_error" SSE event (use w and flusher) describing the streaming failure
instead of querying a.k8s.GetJobStatus/jobID for completion, and only fall back
to calling a.k8s.GetJobStatus and sending the "complete" event when
scanner.Err() is nil; ensure you still flush via flusher after sending either
the error or complete event.

In `@go-server/k8s.go`:
- Around line 78-139: The PodSpec currently defined in the corev1.PodSpec
literal (the block with RestartPolicy, Containers, Volumes) should explicitly
disable automatic mounting of the service-account token by adding
automountServiceAccountToken: false to that PodSpec; update the PodSpec
construction (the corev1.PodSpec where RestartPolicy: corev1.RestartPolicyNever
and Containers: []corev1.Container{...}) to include
AutomountServiceAccountToken: pointer to false (i.e., &false) so worker pods do
not receive the namespace service-account token.
- Around line 85-95: The GH_TOKEN is being inlined in the Env slice
(corev1.EnvVar with Name "GH_TOKEN" and Value params.GHToken); instead create a
Kubernetes Secret holding the token and change the EnvVar for "GH_TOKEN" to use
ValueFrom.SecretKeyRef referencing that Secret and its key instead of Value;
update the Job creation code that builds the Env: []corev1.EnvVar to remove
Value usage for GH_TOKEN and use ValueFrom.SecretKeyRef (pointing to the new
secret name and key), and ensure code that provisions the Job also creates or
expects that Secret to exist.

In `@go-server/main.go`:
- Around line 31-32: Replace the direct call to
http.ListenAndServe(cfg.ListenAddr, mux) with an explicit http.Server instance
(set Addr to cfg.ListenAddr and Handler to mux) and configure server timeouts:
set ReadHeaderTimeout and ReadTimeout to reasonable values and set IdleTimeout
to limit idle connections; do not set WriteTimeout so SSE handler
/stream/{job_id} can stream uninterrupted; then call server.ListenAndServe() and
log.Fatal on its error (keeping the existing log.Printf about cfg.ListenAddr).
Use the existing symbols cfg.ListenAddr and mux to locate where to create the
server.

In `@go-server/static/homepage.html`:
- Around line 89-114: The page currently logs /repos failures only to console;
update the loadRepos() flow to write an error message into the DOM element with
id "status" and keep the form submit control disabled until repos load
successfully: in loadRepos(), locate the repoSelect and submit button elements
(e.g., repoSelect and the submit button control used elsewhere), set the submit
button disabled at the start, and on successful fetch populate repoSelect and
enable the submit button; in the catch block set
document.getElementById('status').textContent to a user-friendly error message
(including minimal detail like "Failed to load repositories") and ensure the
submit button remains disabled so users cannot submit until repos load. Ensure
loadRepos() is the function modified and only manipulate `#status`, repoSelect,
and the submit control.
- Around line 79-80: The status and logOutput divs are not announced by
assistive tech; change the elements with id="status" and id="logOutput" to be
ARIA live regions (e.g., add attributes like role="status" or role="log",
aria-live="polite" or "assertive" as appropriate, and aria-atomic="true") so
streamed progress and final results are announced; ensure updates replace or
append plain text (not only child nodes that screen readers ignore) so screen
readers receive each update.

In `@server/main.py`:
- Around line 27-40: Replace the plain string prints with JSON-line events so
the orchestrator always receives structured messages: emit a JSON object for the
start event (e.g.,
{"event":"workflow_start","ep_url":ep_url,"repo":repo,"cwd":working_dir}), for
success emit {"event":"workflow_success","cost_usd":result.cost_usd} and for
each PR emit {"event":"pr_created","pr_url":pr.pr_url}, and for failure emit
{"event":"workflow_failed","error":str(result.error)}; use json.dumps(...,
default=str) and print(..., flush=True) for stdout events and print(...,
file=sys.stderr, flush=True) for failure so on_message and run_workflow remain
unchanged but all lifecycle prints are uniform JSON lines.
- Around line 24-25: The code currently defaults WORKING_DIR to "/workspace" and
calls os.makedirs(working_dir, exist_ok=True) which fails for non-root image
users; change the default to a user-writable path such as "/tmp/workspace" (i.e.
update os.environ.get("WORKING_DIR", "/tmp/workspace")) and/or add a fallback:
attempt to create the directory and if a PermissionError is raised, fall back to
a writable location (e.g. tempfile.mkdtemp or "/tmp/workspace") and log the
switch; refer to the working_dir variable, the os.environ.get call, and the
os.makedirs(working_dir, exist_ok=True) line when making this change.

In `@token-service/github_app.go`:
- Around line 97-132: The code currently picks installations[0] and posts a nil
body to tokenURL (constructed from instID) which can select the wrong
installation or mint an overly-broad token; update the handler that calls this
logic (the function around installations, instID, tokenURL, and
accessTokenResponse) to accept either a specific installation ID or the target
repository/owner (owner/repo) as request parameters, filter the decoded
installations slice by the installation account login or repository list to find
the correct instID (or require the caller to provide instID), and when creating
the POST to tokenURL include a JSON body with the repositories or repository_ids
field to scope the token to only the required repo; ensure headers still include
Authorization/Accept and handle encoding errors and response decoding from
accessTokenResponse as before.

In `@token-service/go.mod`:
- Line 5: The project currently pins github.com/golang-jwt/jwt/v5 at v5.2.1
which is vulnerable; update the dependency reference for
"github.com/golang-jwt/jwt/v5" to v5.2.2 in the go.mod require statement, then
run the normal Go module maintenance commands (go get
github.com/golang-jwt/jwt/v5@v5.2.2 and/or go mod tidy) to update go.sum and
vendor as needed, and run tests/build to verify nothing breaks.

In `@token-service/main.go`:
- Line 48: The endpoint registration using mux.HandleFunc("GET /token",
handleToken(appID, privateKey)) exposes fresh installation tokens to
unauthenticated callers; change this to a non-idempotent verb and add
authentication/authorization and no-store caching: register as POST (e.g.,
mux.HandleFunc("POST /token", ...)), enforce an authz middleware or explicit
caller validation inside handleToken (validate incoming JWT/API key/client
identity before minting), and set response headers to Cache-Control: no-store
(and other security headers) before returning tokens; update related handlers
mentioned around lines 58-74 similarly.
- Around line 54-55: Replace the direct call to http.ListenAndServe (used after
log.Printf("Token service listening on %s", listenAddr)) with an instantiated
http.Server that sets explicit timeouts (e.g., ReadTimeout, WriteTimeout,
IdleTimeout) and assigns Addr = listenAddr and Handler = mux, then call
server.ListenAndServe() (and log.Fatal on error) so the token service is
protected against slow/long-lived connections.

---

Nitpick comments:
In `@deploy/deployment.yaml`:
- Line 96: Replace the hardcoded image:
ghcr.io/shiftweek/oape-ai-e2e-token-service:latest in the Deployment manifest
with the immutable CI-produced SHA tag or the image digest (e.g.,
ghcr.io/shiftweek/oape-ai-e2e-token-service@sha256:...) so rollouts are
reproducible; update every occurrence of the image field that currently uses
:latest (including the other instance referenced) and, if applicable, wire it to
the CI variable or kustomize/Helm value that injects the sha-tag so the manifest
always deploys the exact build validated by CI.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c08708e7-a820-4c89-9d46-cbd9f6d512d1

📥 Commits

Reviewing files that changed from the base of the PR and between 5fdf463 and 192ce44.

⛔ Files ignored due to path filters (2)
  • go-server/go.sum is excluded by !**/*.sum
  • token-service/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • .github/workflows/build-image.yaml
  • Dockerfile
  • deploy/deployment.yaml
  • go-server/Dockerfile
  • go-server/config.go
  • go-server/go-server
  • go-server/go.mod
  • go-server/handlers.go
  • go-server/k8s.go
  • go-server/main.go
  • go-server/static/homepage.html
  • server/homepage.html
  • server/main.py
  • server/requirements.txt
  • server/server.py
  • token-service/Dockerfile
  • token-service/github_app.go
  • token-service/go.mod
  • token-service/main.go
  • token-service/token-service
💤 Files with no reviewable changes (3)
  • server/requirements.txt
  • server/homepage.html
  • server/server.py

Comment on lines +93 to +115
spec:
  containers:
  - name: oape-server
    image: ghcr.io/shiftweek/oape-ai-e2e:latest
  - name: token-service
    image: ghcr.io/shiftweek/oape-ai-e2e-token-service:latest
    ports:
    - containerPort: 8000
    - containerPort: 8081
    env:
    - name: GITHUB_APP_ID
      valueFrom:
        configMapKeyRef:
          name: oape-orchestrator-config
          key: GITHUB_APP_ID
    - name: GITHUB_APP_PRIVATE_KEY_PATH
      value: /secrets/github-app/private-key.pem
    volumeMounts:
    - name: github-app-key
      mountPath: /secrets/github-app
      readOnly: true
  volumes:
  - name: github-app-key
    secret:
      secretName: github-app-private-key
---


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: shiftweek/oape-ai-e2e):

cd /tmp && find . -name "deploy*" -o -name "deployment*" 2>/dev/null | head -20
git ls-files | grep -E "(deploy|deployment)" | head -20
wc -l deploy/deployment.yaml
cat -n deploy/deployment.yaml | head -170


Harden the new service pods explicitly.

Both Deployments currently rely on platform defaults, leaving root execution, privilege escalation, and writable root filesystems available. These are HTTP services and should run with restricted security profiles.

Apply pod-level and container-level securityContext to both deployments:

🔒 Suggested hardening pattern
spec:
  template:
    spec:
+     securityContext:
+       runAsNonRoot: true
+       seccompProfile:
+         type: RuntimeDefault
      containers:
        - name: token-service
+         securityContext:
+           allowPrivilegeEscalation: false
+           readOnlyRootFilesystem: true
+           capabilities:
+             drop: ["ALL"]

Apply the same settings to the oape-orchestrator deployment at lines 144-166.

🧰 Tools
🪛 Checkov (3.2.510)

[medium] 78-115: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 78-115: Minimize the admission of root containers

(CKV_K8S_23)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/deployment.yaml` around lines 93 - 115, Add explicit pod- and
container-level securityContext blocks for the new token-service deployment and
the oape-orchestrator deployment to harden pods: at the pod spec level set
securityContext with runAsNonRoot: true, runAsUser: 1000 (or a non-root UID),
fsGroup: 65534, and seccompProfile: { type: RuntimeDefault }; within the
token-service container (name: token-service) and the oape-orchestrator
container set securityContext with allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, privileged: false, capabilities: { drop: ["ALL"]
}, and runAsNonRoot: true; ensure volume mounts remain writable only where
needed and that the secret volume mount retains readOnly: true.

Comment on lines +46 to +53
ttl := int32(3600)
if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
n, err := strconv.ParseInt(v, 10, 32)
if err != nil {
return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
}
ttl = int32(n)
}


⚠️ Potential issue | 🟠 Major

Validate TTL_AFTER_FINISHED is non-negative.

Negative values parse successfully here and only fail later when the Job is created. Reject them during config loading so the service fails fast on bad configuration.

🛡️ Suggested validation
 	if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 		n, err := strconv.ParseInt(v, 10, 32)
 		if err != nil {
 			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 		}
+		if n < 0 {
+			return nil, fmt.Errorf("TTL_AFTER_FINISHED must be >= 0")
+		}
 		ttl = int32(n)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ttl := int32(3600)
if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
	}
	if n < 0 {
		return nil, fmt.Errorf("TTL_AFTER_FINISHED must be >= 0")
	}
	ttl = int32(n)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server/config.go` around lines 46 - 53, The config loading code that
parses TTL_AFTER_FINISHED should reject negative values immediately: after
parsing v into n in the TTL parsing block, check if n < 0 and return an error
(e.g., fmt.Errorf("invalid TTL_AFTER_FINISHED: value must be non-negative"))
instead of allowing a negative ttl to be assigned to ttl (int32); keep the
existing strconv.ParseInt call and the ttl = int32(n) assignment but add the
non-negative validation before the conversion/assignment to fail-fast on bad
configuration.

github.com/spf13/pflag v1.0.5 // indirect
github.com/x448/float16 v0.8.4 // indirect
golang.org/x/net v0.26.0 // indirect
golang.org/x/oauth2 v0.21.0 // indirect


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

# First, let's examine the go-server/go.mod file to confirm the current dependency state
find . -name "go.mod" -type f | head -20

# Check if go-server directory exists and examine its go.mod
if [ -f "go-server/go.mod" ]; then
  echo "=== go-server/go.mod content ==="
  cat go-server/go.mod
fi


🌐 Web query:

GO-2025-3488 GHSA-6v2p-p543-phr9 golang.org/x/oauth2 vulnerability

💡 Result:

GO-2025-3488 (GHSA-6v2p-p543-phr9, CVE-2025-22868) is a vulnerability in golang.org/x/oauth2/jws causing unexpected memory consumption during parsing of malicious malformed tokens, enabling a denial-of-service attack. It affects versions before v0.27.0 of golang.org/x/oauth2. Fixed in v0.27.0. Published February 26, 2025. Credits to jub0bs. References include Go issue 71490 and CL 652155. Severity: High (CVSS 7.5, impact on availability).


Upgrade golang.org/x/oauth2 to v0.27.0 or later.

The module graph includes golang.org/x/oauth2 v0.21.0 (via indirect dependency), which is vulnerable to GO-2025-3488 / GHSA-6v2p-p543-phr9. This DoS vulnerability in the jws package affects all versions before v0.27.0. Since k8s.io/client-go can use oauth2-backed kubeconfig authentication flows, upgrade to v0.27.0+ before release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server/go.mod` at line 35, Update the vulnerable indirect dependency
golang.org/x/oauth2 from v0.21.0 to v0.27.0+ by adding/updating its entry in
go.mod (the existing "golang.org/x/oauth2" require line) or running "go get
golang.org/x/oauth2@v0.27.0" and then "go mod tidy" to regenerate go.sum; ensure
the updated version appears in go.mod/go.sum and re-run tests/build to confirm
no breakage (target symbol: golang.org/x/oauth2 in go.mod).

Comment on lines +67 to +86
// fetchToken calls the Token Service to get an ephemeral GitHub token.
func (a *App) fetchToken() (string, error) {
resp, err := http.Get(a.cfg.TokenServiceURL + "/token")
if err != nil {
return "", fmt.Errorf("calling token service: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return "", fmt.Errorf("token service returned %d: %s", resp.StatusCode, body)
}

var result struct {
Token string `json:"token"`
}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return "", fmt.Errorf("decoding token response: %w", err)
}
return result.Token, nil


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full handlers.go file to understand the context
cat -n go-server/handlers.go


🌐 Web query:

Effective Go HTTP timeout best practices site:go.dev

💡 Result:

Effective Go HTTP timeout best practices focus on configuring timeouts for both clients and servers to prevent hanging connections, resource exhaustion, and slow responses. Always create custom http.Client and http.Server instances instead of using the defaults, as DefaultClient has no timeout (a zero value means infinite).

Client timeouts (http.Client.Timeout): sets an overall limit including connect, redirects, and body read. Recommended: 5-30 seconds depending on service SLA. The timer continues after Do returns, canceling body reads. Use context.WithTimeout for per-request control.

Transport timeouts (http.Transport):
  • DialContext.Timeout (default 30s): connection establishment.
  • TLSHandshakeTimeout (default 10s): TLS negotiation.
  • ResponseHeaderTimeout: time after the request is written to receive headers.
  • ExpectContinueTimeout (default 1s): for Expect: 100-continue.
  • IdleConnTimeout (default 90s): close idle connections.
  • Client.Timeout covers the full request if set.

Example client:

	client := &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			DialContext:           (&net.Dialer{Timeout: 30 * time.Second}).DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 30 * time.Second,
			ExpectContinueTimeout: 1 * time.Second,
			IdleConnTimeout:       90 * time.Second,
		},
	}

Server timeouts (http.Server):
  • ReadTimeout (e.g. 10s): max time to read the request (headers + body).
  • WriteTimeout (e.g. 10s): max time to write the response.
  • ReadHeaderTimeout: time to read headers only.

Example server:

	server := &http.Server{
		Addr:              ":8080",
		ReadTimeout:       10 * time.Second,
		ReadHeaderTimeout: 5 * time.Second,
		WriteTimeout:      10 * time.Second,
	}

Additional practices: reuse Clients/Transports for connection pooling (do not create them per request); use contexts (context.WithTimeout) for fine-grained cancellation, propagated to requests; wrap handlers with http.TimeoutHandler; tune based on p99 latencies plus a buffer (e.g., ReadTimeout > p95 request time); avoid zero (infinite) timeouts; and monitor for goroutine leaks from slow upstreams (issues like #31657 fixed in recent Go). These prevent DoS from slow clients/servers and ensure graceful degradation.


Add context and timeout to the token-service call.

http.Get() uses the default client with no deadline. If the token service hangs or the network is slow, the /submit endpoint will block indefinitely. Pass r.Context() to fetchToken() and use http.NewRequestWithContext() or a custom client with an explicit timeout, as recommended by Effective Go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server/handlers.go` around lines 67 - 86, The fetchToken method currently
calls http.Get without a context or timeout; change fetchToken to accept a
context (e.g., func (a *App) fetchToken(ctx context.Context) (string, error))
and create the request with http.NewRequestWithContext(ctx, "GET",
a.cfg.TokenServiceURL+"/token", nil) or use a custom http.Client with a timeout
(e.g., &http.Client{Timeout: X}) to perform the request, propagate context from
the caller (r.Context()) into the call, and keep the existing response handling
and error wrapping in fetchToken.

Comment on lines +156 to +159
status, err := a.k8s.GetJobStatus(r.Context(), jobID)
if err != nil {
writeError(w, http.StatusNotFound, fmt.Sprintf("job not found: %s", jobID))
return

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd go-server && find . -name "handlers.go" | head -5

Repository: shiftweek/oape-ai-e2e

Length of output: 78


🏁 Script executed:

wc -l go-server/handlers.go

Repository: shiftweek/oape-ai-e2e

Length of output: 90


🏁 Script executed:

cat -n go-server/handlers.go | sed -n '150,165p'

Repository: shiftweek/oape-ai-e2e

Length of output: 551


🏁 Script executed:

cat -n go-server/handlers.go | sed -n '200,220p'

Repository: shiftweek/oape-ai-e2e

Length of output: 556


🏁 Script executed:

cat -n go-server/handlers.go | sed -n '225,240p'

Repository: shiftweek/oape-ai-e2e

Length of output: 598


🏁 Script executed:

# Search for GetJobStatus and GetJobPod implementations
rg "func.*GetJobStatus|func.*GetJobPod" go-server/

Repository: shiftweek/oape-ai-e2e

Length of output: 271


🏁 Script executed:

# Check for error type checking patterns (apierrors.IsNotFound, etc.)
rg "IsNotFound|apierrors|StatusReason" go-server/handlers.go

Repository: shiftweek/oape-ai-e2e

Length of output: 47


🏁 Script executed:

cat -n go-server/k8s.go | grep -A 20 "func (c \*K8sClient) GetJobStatus"

Repository: shiftweek/oape-ai-e2e

Length of output: 959


🏁 Script executed:

cat -n go-server/k8s.go | grep -A 20 "func (c \*K8sClient) GetJobPod"

Repository: shiftweek/oape-ai-e2e

Length of output: 974


🏁 Script executed:

# Check imports in handlers.go and k8s.go for Kubernetes error handling
rg "^import|apierrors" go-server/handlers.go go-server/k8s.go

Repository: shiftweek/oape-ai-e2e

Length of output: 121


🏁 Script executed:

# Check the full k8s.go file to understand structure
wc -l go-server/k8s.go && head -50 go-server/k8s.go

Repository: shiftweek/oape-ai-e2e

Length of output: 1410


Distinguish Kubernetes error types instead of collapsing all errors into "not found" or "waiting".

The code treats all GetJobStatus failures as 404 responses and all GetJobPod/Pod Get failures as temporary states. This masks real failures like RBAC denials, API server outages, and network errors. Use the error types from the Kubernetes client library (e.g., apierrors.IsNotFound()) to distinguish genuine "not found" cases from other errors, and surface non-transient failures immediately instead of retrying indefinitely.

Applies to lines 156–159, 203–217, and 228–235.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server/handlers.go` around lines 156 - 159, The handlers currently treat
all errors from a.k8s.GetJobStatus, a.k8s.GetJobPod and the pod Get call as "not
found" or transient; update these to inspect Kubernetes error types (use
k8s.io/apimachinery/pkg/api/errors functions like apierrors.IsNotFound,
apierrors.IsForbidden, apierrors.IsUnauthorized, etc.) and branch accordingly:
return a 404 via writeError only when apierrors.IsNotFound(err) is true, return
a 4xx (e.g., 403/401) when IsForbidden/IsUnauthorized match, and return a 5xx
(and stop retrying/waiting) for API server/network errors; ensure GetJobStatus,
GetJobPod and the pod Get error handling uses these checks and surfaces
non-transient errors immediately instead of mapping them to "waiting".

server/main.py Outdated
Comment on lines +27 to +40
print(f"Starting workflow: ep_url={ep_url} repo={repo} cwd={working_dir}", flush=True)

def on_message(msg):
print(json.dumps(msg, default=str), flush=True)

result = await run_workflow(ep_url, repo, working_dir, on_message=on_message)

if result.success:
print(f"WORKFLOW_SUCCESS cost=${result.cost_usd:.4f}", flush=True)
for pr in result.prs:
print(f"PR_CREATED: {pr.pr_url}", flush=True)
sys.exit(0)
else:
print(f"WORKFLOW_FAILED: {result.error}", file=sys.stderr, flush=True)

⚠️ Potential issue | 🟡 Minor

Keep the worker log stream uniformly structured.

Lines 27, 35, 37, and 40 emit plain strings even though this module’s contract says the orchestrator consumes JSON lines. That makes the stream format non-uniform and forces downstream consumers to special-case lifecycle events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/main.py` around lines 27 - 40, Replace the plain string prints with
JSON-line events so the orchestrator always receives structured messages: emit a
JSON object for the start event (e.g.,
{"event":"workflow_start","ep_url":ep_url,"repo":repo,"cwd":working_dir}), for
success emit {"event":"workflow_success","cost_usd":result.cost_usd} and for
each PR emit {"event":"pr_created","pr_url":pr.pr_url}, and for failure emit
{"event":"workflow_failed","error":str(result.error)}; use json.dumps(...,
default=str) and print(..., flush=True) for stdout events and print(...,
file=sys.stderr, flush=True) for failure so on_message and run_workflow remain
unchanged but all lifecycle prints are uniform JSON lines.

Comment on lines +97 to +132
var installations []installation
if err := json.NewDecoder(resp.Body).Decode(&installations); err != nil {
return "", "", fmt.Errorf("decoding installations: %w", err)
}
if len(installations) == 0 {
return "", "", fmt.Errorf("no installations found for app %d", appID)
}

instID := installations[0].ID

// Create access token.
tokenURL := fmt.Sprintf("https://api.github.com/app/installations/%d/access_tokens", instID)
req, err = http.NewRequest("POST", tokenURL, nil)
if err != nil {
return "", "", err
}
req.Header.Set("Authorization", "Bearer "+jwtToken)
req.Header.Set("Accept", "application/vnd.github+json")

resp, err = client.Do(req)
if err != nil {
return "", "", fmt.Errorf("creating access token: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusCreated {
body, _ := io.ReadAll(resp.Body)
return "", "", fmt.Errorf("create access token returned %d: %s", resp.StatusCode, body)
}

var tokenResp accessTokenResponse
if err := json.NewDecoder(resp.Body).Decode(&tokenResp); err != nil {
return "", "", fmt.Errorf("decoding access token: %w", err)
}

return tokenResp.Token, tokenResp.ExpiresAt, nil

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n token-service/github_app.go | head -150

Repository: shiftweek/oape-ai-e2e

Length of output: 4565


🏁 Script executed:

git ls-files token-service/ | head -20

Repository: shiftweek/oape-ai-e2e

Length of output: 209


🏁 Script executed:

rg "getInstallationToken" --type go -B 2 -A 2

Repository: shiftweek/oape-ai-e2e

Length of output: 822


🏁 Script executed:

cat -n token-service/main.go

Repository: shiftweek/oape-ai-e2e

Length of output: 2422


🌐 Web query:

GitHub App access token repository scoping API documentation

💡 Result:

GitHub Apps support repository scoping for their access tokens via the REST API endpoint POST /app/installations/{installation_id}/access_tokens. This generates an installation access token (IAT), which by default has access to all repositories the app installation can access. To scope it to specific repositories, include 'repositories' (an array of repository names) or 'repository_ids' (an array of repository IDs) in the request body. Up to 500 repositories can be specified. The token cannot access repositories outside the installation's granted access. You must authenticate the request with a JWT generated for the GitHub App. The IAT expires in 1 hour.

Example request (note that 'repositories' takes bare repository names, without the owner prefix):

curl -X POST \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2026-03-10" \
  https://api.github.com/app/installations/{installation_id}/access_tokens \
  -d '{ "repositories": ["repoA", "repoB"], "permissions": {"contents": "read"} }'

User access tokens (generated via the OAuth flow for GitHub Apps) inherit scoping from user and app permissions but do not use traditional scopes; they use fine-grained permissions and are limited to repositories both the user and the app can access. They can also be further restricted with the 'repository_id' parameter in the OAuth request. Official documentation is in GitHub Docs under "Generating an installation access token for a GitHub App" and the REST API endpoints for Apps.

Resolve the installation explicitly and scope the token.

installations[0] makes token selection depend on GitHub's installation ordering, and the nil body in the access-token request grants the worker access to all repositories the app installation can reach. GitHub's API returns installations in no guaranteed order, and once the app is installed in multiple places, this approach risks selecting the wrong installation or minting an overly broad token. The GitHub API supports repository scoping via the request body (using repositories or repository_ids fields)—this should be used to restrict tokens to the specific repository needed for each workflow job. Update the handler to accept the target repository/owner as request parameters, filter installations appropriately (or require the installation ID), and include repository scoping in the access-token request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@token-service/github_app.go` around lines 97 - 132, The code currently picks
installations[0] and posts a nil body to tokenURL (constructed from instID)
which can select the wrong installation or mint an overly-broad token; update
the handler that calls this logic (the function around installations, instID,
tokenURL, and accessTokenResponse) to accept either a specific installation ID
or the target repository/owner (owner/repo) as request parameters, filter the
decoded installations slice by the installation account login or repository list
to find the correct instID (or require the caller to provide instID), and when
creating the POST to tokenURL include a JSON body with the repositories or
repository_ids field to scope the token to only the required repo; ensure
headers still include Authorization/Accept and handle encoding errors and
response decoding from accessTokenResponse as before.


go 1.23

require github.com/golang-jwt/jwt/v5 v5.2.1

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the go.mod file exists and examine its content
find . -name "go.mod" -path "*/token-service/*" -type f

Repository: shiftweek/oape-ai-e2e

Length of output: 87


🌐 Web query:

GO-2025-3553 GHSA-mh63-6h87-95cp github.com/golang-jwt/jwt vulnerability

💡 Result:

GO-2025-3553 (GHSA-mh63-6h87-95cp, CVE-2025-30204) is a high-severity vulnerability in github.com/golang-jwt/jwt causing excessive memory allocation during header parsing in Parser.ParseUnverified. A malicious Authorization header with many periods (e.g., "Bearer ........") triggers O(n) allocations via strings.Split on untrusted input (CWE-405).

Affected versions:
- github.com/golang-jwt/jwt/v4: before v4.5.2
- github.com/golang-jwt/jwt/v5: from v5.0.0-rc.1 before v5.2.2
- github.com/golang-jwt/jwt (v3 branch): >=3.2.0 <=3.2.2, no patch available

Patched in:
- v4.5.2
- v5.2.2

CVSS v3.1: 7.5 High (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H). CVSS v4.0: 8.7 High. Published March 2025. Fix commits: golang-jwt/jwt@0951d18 (v4), similar for v5.

Mitigation: upgrade to the patched versions. Avoid ParseUnverified on untrusted input, or limit header size.


🏁 Script executed:

# Check the exact content of token-service/go.mod to confirm line 5
cat -n ./token-service/go.mod | head -10

Repository: shiftweek/oape-ai-e2e

Length of output: 207


Bump github.com/golang-jwt/jwt/v5 to v5.2.2 before release.

v5.2.1 is vulnerable to GO-2025-3553 (GHSA-mh63-6h87-95cp, CVE-2025-30204), a high-severity vulnerability (CVSS 7.5) causing excessive memory allocation during JWT header parsing on untrusted input. Patched in v5.2.2. This is a direct dependency and must not ship with a known-vulnerable version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@token-service/go.mod` at line 5, The project currently pins
github.com/golang-jwt/jwt/v5 at v5.2.1 which is vulnerable; update the
dependency reference for "github.com/golang-jwt/jwt/v5" to v5.2.2 in the go.mod
require statement, then run the normal Go module maintenance commands (go get
github.com/golang-jwt/jwt/v5@v5.2.2 and/or go mod tidy) to update go.sum and
vendor as needed, and run tests/build to verify nothing breaks.

}

mux := http.NewServeMux()
mux.HandleFunc("GET /token", handleToken(appID, privateKey))

⚠️ Potential issue | 🟠 Major

Do not mint GitHub credentials from a cacheable unauthenticated GET.

GET /token currently returns a fresh installation token to any caller that can reach this service, and the response is sent without Cache-Control: no-store. For a credential-minting endpoint, please add caller authz or strict network isolation at minimum; switching this to POST would also be safer.

Also applies to: 58-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@token-service/main.go` at line 48, The endpoint registration using
mux.HandleFunc("GET /token", handleToken(appID, privateKey)) exposes fresh
installation tokens to unauthenticated callers; change this to a non-idempotent
verb and add authentication/authorization and no-store caching: register as POST
(e.g., mux.HandleFunc("POST /token", ...)), enforce an authz middleware or
explicit caller validation inside handleToken (validate incoming JWT/API
key/client identity before minting), and set response headers to Cache-Control:
no-store (and other security headers) before returning tokens; update related
handlers mentioned around lines 58-74 similarly.

Comment on lines +54 to +55
log.Printf("Token service listening on %s", listenAddr)
log.Fatal(http.ListenAndServe(listenAddr, mux))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "main.go" -path "*/token-service/*" | head -5

Repository: shiftweek/oape-ai-e2e

Length of output: 88


🏁 Script executed:

cat -n token-service/main.go | head -70

Repository: shiftweek/oape-ai-e2e

Length of output: 2331


🏁 Script executed:

rg -i "github" token-service/main.go -A 2 -B 2

Repository: shiftweek/oape-ai-e2e

Length of output: 611


🏁 Script executed:

rg -i "private.?key\|privatekey" token-service/main.go -A 2 -B 2

Repository: shiftweek/oape-ai-e2e

Length of output: 47


🌐 Web query:

Effective Go server timeouts http.Server security best practices

💡 Result:

Effective Go http.Server timeouts and security best practices focus on preventing resource exhaustion from slow or malicious clients (e.g., Slowloris attacks). Always use an explicit http.Server instance instead of http.ListenAndServe, which has zero timeouts by default.

Key http.Server timeout fields:
- ReadTimeout: max duration for reading the entire request (headers + body). Default 0 (no limit). Recommended: 5-10s.
- ReadHeaderTimeout: max time to read request headers. Default 0. Recommended: 2-5s. Guards against slow-header attacks.
- WriteTimeout: max duration before timing out response writes, reset after headers are read. Default 0. Recommended: 10-30s. Includes TLS handshake time for HTTPS.
- IdleTimeout: max time to wait for the next request on keep-alive connections. Default 0. Recommended: 60-120s.

Example configuration:

srv := &http.Server{
    Addr:              ":8080",
    Handler:           router,
    ReadTimeout:       10 * time.Second,
    ReadHeaderTimeout: 5 * time.Second,
    WriteTimeout:      30 * time.Second,
    IdleTimeout:       120 * time.Second,
    MaxHeaderBytes:    1 << 20, // 1MB default
}
log.Fatal(srv.ListenAndServe())

Additional security practices:
- Wrap handlers with http.TimeoutHandler for per-handler timeouts (returns 503 on exceed).
- Use the request context (r.Context()) in handlers to propagate cancellations.
- Set MaxHeaderBytes to limit header size.
- Enable HTTPS with a proper TLS config.
- For production, implement graceful shutdown with srv.Shutdown(ctx).

These settings prevent goroutine and file-descriptor leaks. Tune based on p99 request latency plus a buffer (e.g., handler timeout T, then Read/Write ~1.5T). Test with tools like slowloris.


Add server timeouts to the token service.

Line 55 uses http.ListenAndServe, which provides zero read/write/idle timeouts. This exposes the service to resource exhaustion attacks (e.g., Slowloris) on a system handling the GitHub App private key. Replace with an explicit http.Server struct with configured timeouts:

Suggested fix
 import (
 	"crypto/rsa"
 	"encoding/json"
 	"log"
 	"net/http"
 	"os"
 	"strconv"
+	"time"
 )
@@
-	log.Printf("Token service listening on %s", listenAddr)
-	log.Fatal(http.ListenAndServe(listenAddr, mux))
+	srv := &http.Server{
+		Addr:              listenAddr,
+		Handler:           mux,
+		ReadHeaderTimeout: 5 * time.Second,
+		ReadTimeout:       15 * time.Second,
+		WriteTimeout:      30 * time.Second,
+		IdleTimeout:       60 * time.Second,
+	}
+
+	log.Printf("Token service listening on %s", listenAddr)
+	log.Fatal(srv.ListenAndServe())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
log.Printf("Token service listening on %s", listenAddr)
log.Fatal(http.ListenAndServe(listenAddr, mux))
import (
"crypto/rsa"
"encoding/json"
"log"
"net/http"
"os"
"strconv"
"time"
)
// ... (other code)
srv := &http.Server{
Addr: listenAddr,
Handler: mux,
ReadHeaderTimeout: 5 * time.Second,
ReadTimeout: 15 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 60 * time.Second,
}
log.Printf("Token service listening on %s", listenAddr)
log.Fatal(srv.ListenAndServe())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@token-service/main.go` around lines 54 - 55, Replace the direct call to
http.ListenAndServe (used after log.Printf("Token service listening on %s",
listenAddr)) with an instantiated http.Server that sets explicit timeouts (e.g.,
ReadTimeout, WriteTimeout, IdleTimeout) and assigns Addr = listenAddr and
Handler = mux, then call server.ListenAndServe() (and log.Fatal on error) so the
token service is protected against slow/long-lived connections.

@swghosh swghosh force-pushed the rearch branch 2 times, most recently from d341719 to 41bdbff Compare April 6, 2026 15:28
@swghosh swghosh marked this pull request as ready for review April 6, 2026 15:29
@swghosh swghosh changed the title from "re-arch: Add a go webserver + GH_API token minter, convert Python server agent to script" to "re-arch: Add go webserver with Jobs manager, improve GitHub app token minting security" Apr 6, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (3)
go-server-v2/handlers.go (2)

238-258: ⚠️ Potential issue | 🟠 Major

Check scanner.Err() before emitting complete.

Scan() returns false for both EOF and read failures. Without an Err() check, a broken log stream falls through to the final status lookup and can close the EventSource as if the job completed normally.

Possible fix
 	for scanner.Scan() {
 		select {
 		case <-ctx.Done():
 			return
 		default:
 		}

 		line := scanner.Text()
 		fmt.Fprintf(w, "event: log\ndata: %s\n\n", line)
 		flusher.Flush()
 	}
+
+	if err := scanner.Err(); err != nil {
+		log.Printf("ERROR: scanning logs for pod %s: %v", podName, err)
+		fmt.Fprintf(w, "event: error\ndata: %s\n\n",
+			`{"status":"unknown","message":"log stream interrupted"}`)
+		flusher.Flush()
+		return
+	}
 
 	// Log stream ended — get final job status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server-v2/handlers.go` around lines 238 - 258, After the scanner loop,
check scanner.Err() and if non-nil treat the stream as failed: write an SSE
error event to w (e.g., "event: error" or "event: complete" with status unknown
and include scanner.Err().Error() in the message), flush via flusher.Flush(),
and return immediately so you do not call a.k8s.GetJobStatus(ctx, jobID); this
ensures broken/read-failure streams are not mistaken for normal completion.

126-129: ⚠️ Potential issue | 🟠 Major

Preserve real Kubernetes failures instead of masking them as missing resources.

These branches still collapse every GetJobStatus, GetJobPod, and pod Get failure into either 404 or "waiting_for_pod". RBAC denials, expired credentials, and apiserver outages will therefore look identical to ordinary lifecycle states, and the stream can spin until timeout instead of surfacing the real failure. Please branch on Kubernetes error classes here (or have K8sClient return typed/sentinel errors for the transient “pod not created yet” case) so only genuine not-found/pending states are retried.

Also applies to: 173-177, 198-204

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server-v2/handlers.go` around lines 126 - 129, The current handlers
collapse all a.k8s.GetJobStatus/GetJobPod/pod Get errors into
404/"waiting_for_pod"; change them to inspect Kubernetes error classes (e.g.,
use k8s.io/apimachinery/pkg/api/errors helpers or have K8sClient return sentinel
typed errors) so that only apierrors.IsNotFound (or your explicit "pod not
created yet" sentinel) is treated as not-found/pending and retried; for
RBAC/credential/apiserver errors (IsUnauthorized, IsForbidden, IsInvalid,
IsServerTimeout, etc.) call writeError with an appropriate HTTP status and the
real error message instead of masking it. Locate the logic around
a.k8s.GetJobStatus, a.k8s.GetJobPod, and the pod Get branches and replace the
unconditional writeError/return with branching that surfaces
non-NotFound/critical k8s errors while still allowing NotFound/pending to be
handled as before.
go-server-v2/k8s.go (1)

78-147: ⚠️ Potential issue | 🟠 Major

Do not hand worker pods a service-account token they never use.

Nothing in this PodSpec or WorkflowParams relies on in-cluster API access, but the Job will still get the namespace service-account token by default. That unnecessarily exposes cluster credentials to the agent container alongside the other mounted secrets.

Possible fix
 func (c *K8sClient) CreateWorkflowJob(ctx context.Context, jobID string, params WorkflowParams) error {
 	jobName := "oape-workflow-" + jobID
 	backoffLimit := int32(0)
 	ttl := params.TTLAfterFinished
+	automountSAToken := false

 	job := &batchv1.Job{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: jobName,
@@
 				},
 				Spec: corev1.PodSpec{
+					AutomountServiceAccountToken: &automountSAToken,
 					RestartPolicy: corev1.RestartPolicyNever,
 					Containers: []corev1.Container{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server-v2/k8s.go` around lines 78 - 147, PodSpec is currently allowing the
default automounting of the namespace service-account token; update the
corev1.PodSpec for the Job pod (the PodSpec that contains the "worker"
container) to set AutomountServiceAccountToken: pointer to false
(AutomountServiceAccountToken: ptr.Bool(false) or &[]bool{false}[0] per your
helpers) so the service-account token is not mounted into the container; modify
the PodSpec in the same block that defines RestartPolicy, Containers and Volumes
to include this field and ensure no code relies on in-cluster credentials.
🧹 Nitpick comments (4)
images-v2/job.Dockerfile (1)

48-50: Use exec form for CMD and verify EXPOSE is needed.

The shell form CMD python3.11 main.py won't propagate signals (SIGTERM) to the Python process, which can cause graceful shutdown issues. Also, EXPOSE 8000 seems unnecessary for a batch job container.

🔧 Proposed fix
-EXPOSE 8000
-
-CMD python3.11 main.py
+CMD ["python3.11", "main.py"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@images-v2/job.Dockerfile` around lines 48 - 50, Replace the shell-form CMD
with the exec JSON array form so signals propagate to the Python process (change
CMD python3.11 main.py to CMD ["python3.11","main.py"] in the Dockerfile), and
review whether EXPOSE 8000 is necessary for this batch job container — if the
process does not listen on port 8000 remove the EXPOSE line; if it does, keep it
but ensure it’s documented why the port is exposed.
images-v2/go-server.Dockerfile (1)

10-13: Add a non-root user in the runtime stage.

The runtime image runs as root by default. For better security posture, run as a non-privileged user.

🔒 Proposed fix
 FROM registry.access.redhat.com/ubi9-minimal:latest
+RUN microdnf install -y shadow-utils && \
+    useradd -r -u 1001 -g 0 appuser && \
+    microdnf clean all
 COPY --from=builder /tmp/app /usr/local/bin/app-server
 EXPOSE 8080
+USER 1001
 ENTRYPOINT ["/usr/local/bin/app-server"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@images-v2/go-server.Dockerfile` around lines 10 - 13, The runtime stage of
the Dockerfile runs the container as root; create a non-privileged user and
switch to it in the runtime stage: add a user/group (e.g., appuser), chown the
copied binary (/usr/local/bin/app-server) and any needed dirs to that user, and
set USER to that non-root account before ENTRYPOINT so the container runs as the
non-root user instead of root (reference the COPY --from=builder /tmp/app
/usr/local/bin/app-server, EXPOSE 8080, and ENTRYPOINT
["/usr/local/bin/app-server"] lines).
server/gh-credential-helper-v2.sh (1)

6-11: Consider validating that the token is non-empty.

If the token file is missing or empty, the script silently outputs an empty password, leading to confusing authentication failures. Adding validation would improve debuggability.

🛡️ Suggested improvement
 TOKEN=$(cat /etc/github-app-token/token 2>/dev/null)
+if [ -z "$TOKEN" ]; then
+  echo "error: GitHub token not found or empty" >&2
+  exit 1
+fi
 
 echo "protocol=https"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/gh-credential-helper-v2.sh` around lines 6 - 11, The script currently
reads the GitHub token into TOKEN and echoes it as the password without
validating it; update the logic around the TOKEN variable to check that TOKEN is
non-empty (after reading from /etc/github-app-token/token) and if empty print a
clear error message to stderr (e.g., "Missing GitHub app token in
/etc/github-app-token/token") and exit non-zero instead of emitting an empty
password; keep the existing echo lines (protocol, host, username, password) but
only print the password when TOKEN is present so callers don’t receive an empty
credential.
deploy-v2/gh-token-minter-deployment.yaml (1)

59-74: Add container security context to harden the deployment.

The container lacks security context settings, which static analysis flagged. Add settings to prevent privilege escalation and enforce read-only root filesystem where possible.

🔒 Proposed security hardening
       containers:
         - name: ghpat
           image: github-token-minter:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            capabilities:
+              drop:
+                - ALL
           env:

Note: readOnlyRootFilesystem: true may require additional emptyDir mounts for any directories the script needs to write to.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 59 - 74, The
container "ghpat" in the Deployment spec currently has no securityContext;
update the "ghpat" container spec for image "github-token-minter:latest" to add
a securityContext that disables privilege escalation (allowPrivilegeEscalation:
false), drops capabilities (capabilities.drop: ["ALL"]), runs as a non-root user
(runAsNonRoot: true and/or runAsUser: a non-root UID), enforces a read-only root
filesystem (readOnlyRootFilesystem: true) and optionally sets a seccompProfile
(type: RuntimeDefault); ensure any writable paths required by the pod are
mounted as writable volumes (e.g., emptyDir) so readOnlyRootFilesystem does not
break startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 80-82: The volume is referencing secretName:
github-app-private-key but the Secret resource in this manifest is named
gh-app-secret (metadata.name), causing pod startup failure; fix by making these
names consistent—either rename the Secret's metadata.name to
github-app-private-key or update the volume's secretName to gh-app-secret (the
volume entry with name: private-key and its secret.secretName field).

In `@deploy-v2/go-server-deployment.yaml`:
- Around line 82-85: The Deployment mounts the oape-configs ConfigMap but that
ConfigMap is not being generated, causing LoadConfig() in go-server-v2/config.go
to fail when reading /config/team-repos.csv; fix by uncommenting and enabling
the configMapGenerator block for oape-configs in deploy-v2/kustomization.yaml so
the ConfigMap (oape-configs) is created and contains team-repos.csv prior to
deployment.
- Around line 68-81: Add a restrictive securityContext for the oape-go-server
container and a pod-level security context: set podSecurityContext with
runAsNonRoot:true and an fsGroup (e.g. 1000) to ensure ConfigMap files are
readable, and in the container securityContext for the container named
"oape-go-server" set runAsUser to a non-root UID (e.g. 1000),
readOnlyRootFilesystem:true, allowPrivilegeEscalation:false, drop all
capabilities (e.g. drop: ["ALL"]), and set seccompProfile to runtime/default;
apply these changes in the Deployment spec near the existing container
definition so the server cannot run as root, escalate privileges, or write to
the image filesystem.

In `@gh-token-minter/mint-token.sh`:
- Around line 29-31: The code assumes at least one installation when assigning
inst_id from resp.json()[0]["id"] which will raise IndexError if none exist;
after the requests.get(...) call and resp.raise_for_status(), parse the body
into a variable (e.g., installs = resp.json()), verify it's a list and that
len(installs) > 0 (or that installs[0] exists), and if empty raise a clear error
or exit with a descriptive message like "No GitHub App installations found"
instead of letting an IndexError surface; update the inst_id assignment to read
from the validated installs[0]["id"].

In `@go-server-v2/config.go`:
- Around line 46-53: The parsed TTL_AFTER_FINISHED value can be negative
(strconv.ParseInt allows negatives) which Kubernetes rejects; in the config
initialization where ttl is set and TTL_AFTER_FINISHED is parsed (the ttl
variable and the TTL_AFTER_FINISHED parse block), add a non-negative validation
after parsing: if n < 0 return an error (e.g., fmt.Errorf("invalid
TTL_AFTER_FINISHED: must be non-negative")) so startup fails fast with a clear
message instead of accepting negative values.

In `@go-server-v2/k8s.go`:
- Around line 173-186: The code returns "pending" if no terminal Condition or
active pods are found, but it must also check job.Status.Succeeded and
job.Status.Failed to avoid the race where counters are updated before the
Failed/Succeeded Condition is set; update the logic in the function that
inspects the job (the loop over job.Status.Conditions and subsequent checks
using job.Status.Active) to first check if job.Status.Succeeded > 0 and return
&JobStatus{Status: "succeeded"}, nil, and then check if job.Status.Failed > 0
and return &JobStatus{Status: "failed", Message: ""}, nil (or include
cond.Message if available), before falling back to returning &JobStatus{Status:
"pending"}, nil so the JobStatus struct and the job variable are used to detect
terminal state via counters as well as conditions.

In `@go-server-v2/static/homepage.html`:
- Around line 147-148: The response body is consumed twice by calling res.json()
in the error check and again when extracting job_id; instead, call await
res.json() once, store the parsed JSON in a variable (e.g., const data), check
res.ok and throw using data.detail or res.statusText if not ok, then destructure
job_id from that same data; update the code paths around the res variable where
this occurs to use that single parsed object.

In `@images-v2/go-server.Dockerfile`:
- Line 4: The COPY instruction with multiple sources uses a destination that
must be a directory ending with a slash; update the COPY in the Dockerfile (the
line containing COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .) to use a
directory destination with a trailing slash (e.g., change the final token from .
to ./ or to a specific directory like ./go-server-v2/) so the Docker build
treats it as a directory and succeeds.

---

Duplicate comments:
In `@go-server-v2/handlers.go`:
- Around line 238-258: After the scanner loop, check scanner.Err() and if
non-nil treat the stream as failed: write an SSE error event to w (e.g., "event:
error" or "event: complete" with status unknown and include
scanner.Err().Error() in the message), flush via flusher.Flush(), and return
immediately so you do not call a.k8s.GetJobStatus(ctx, jobID); this ensures
broken/read-failure streams are not mistaken for normal completion.
- Around line 126-129: The current handlers collapse all
a.k8s.GetJobStatus/GetJobPod/pod Get errors into 404/"waiting_for_pod"; change
them to inspect Kubernetes error classes (e.g., use
k8s.io/apimachinery/pkg/api/errors helpers or have K8sClient return sentinel
typed errors) so that only apierrors.IsNotFound (or your explicit "pod not
created yet" sentinel) is treated as not-found/pending and retried; for
RBAC/credential/apiserver errors (IsUnauthorized, IsForbidden, IsInvalid,
IsServerTimeout, etc.) call writeError with an appropriate HTTP status and the
real error message instead of masking it. Locate the logic around
a.k8s.GetJobStatus, a.k8s.GetJobPod, and the pod Get branches and replace the
unconditional writeError/return with branching that surfaces
non-NotFound/critical k8s errors while still allowing NotFound/pending to be
handled as before.

In `@go-server-v2/k8s.go`:
- Around line 78-147: PodSpec is currently allowing the default automounting of
the namespace service-account token; update the corev1.PodSpec for the Job pod
(the PodSpec that contains the "worker" container) to set
AutomountServiceAccountToken: pointer to false (AutomountServiceAccountToken:
ptr.Bool(false) or &[]bool{false}[0] per your helpers) so the service-account
token is not mounted into the container; modify the PodSpec in the same block
that defines RestartPolicy, Containers and Volumes to include this field and
ensure no code relies on in-cluster credentials.
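
Kubernetes API structs declare optional booleans as *bool, which is why the prompt suggests ptr.Bool(false)-style helpers. A stdlib-only illustration of the idiom (podSpecSketch is a stand-in so the snippet compiles without client-go):

```go
package main

import "fmt"

// boolPtr mirrors helpers such as k8s.io/utils/ptr.To: optional fields on
// Kubernetes API structs are *bool, so the literal false must be taken by
// address before it can be assigned.
func boolPtr(b bool) *bool { return &b }

// podSpecSketch stands in for corev1.PodSpec; only the relevant field is
// reproduced here.
type podSpecSketch struct {
	AutomountServiceAccountToken *bool
}

func main() {
	spec := podSpecSketch{AutomountServiceAccountToken: boolPtr(false)}
	fmt.Println(*spec.AutomountServiceAccountToken)
}
```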

---

Nitpick comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 59-74: The container "ghpat" in the Deployment spec currently has
no securityContext; update the "ghpat" container spec for image
"github-token-minter:latest" to add a securityContext that disables privilege
escalation (allowPrivilegeEscalation: false), drops capabilities
(capabilities.drop: ["ALL"]), runs as a non-root user (runAsNonRoot: true and/or
runAsUser: a non-root UID), enforces a read-only root filesystem
(readOnlyRootFilesystem: true) and optionally sets a seccompProfile (type:
RuntimeDefault); ensure any writable paths required by the pod are mounted as
writable volumes (e.g., emptyDir) so readOnlyRootFilesystem does not break
startup.

In `@images-v2/go-server.Dockerfile`:
- Around line 10-13: The runtime stage of the Dockerfile runs the container as
root; create a non-privileged user and switch to it in the runtime stage: add a
user/group (e.g., appuser), chown the copied binary (/usr/local/bin/app-server)
and any needed dirs to that user, and set USER to that non-root account before
ENTRYPOINT so the container runs as the non-root user instead of root (reference
the COPY --from=builder /tmp/app /usr/local/bin/app-server, EXPOSE 8080, and
ENTRYPOINT ["/usr/local/bin/app-server"] lines).

In `@images-v2/job.Dockerfile`:
- Around line 48-50: Replace the shell-form CMD with the exec JSON array form so
signals propagate to the Python process (change CMD python3.11 main.py to CMD
["python3.11","main.py"] in the Dockerfile), and review whether EXPOSE 8000 is
necessary for this batch job container — if the process does not listen on port
8000 remove the EXPOSE line; if it does, keep it but ensure it’s documented why
the port is exposed.

In `@server/gh-credential-helper-v2.sh`:
- Around line 6-11: The script currently reads the GitHub token into TOKEN and
echoes it as the password without validating it; update the logic around the
TOKEN variable to check that TOKEN is non-empty (after reading from
/etc/github-app-token/token) and if empty print a clear error message to stderr
(e.g., "Missing GitHub app token in /etc/github-app-token/token") and exit
non-zero instead of emitting an empty password; keep the existing echo lines
(protocol, host, username, password) but only print the password when TOKEN is
present so callers don’t receive an empty credential.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86e2c326-baa9-4694-a7bb-2d62a31a3032

📥 Commits

Reviewing files that changed from the base of the PR and between 192ce44 and 41bdbff.

⛔ Files ignored due to path filters (1)
  • go-server-v2/go.sum is excluded by !**/*.sum
📒 Files selected for processing (15)
  • deploy-v2/gh-token-minter-deployment.yaml
  • deploy-v2/go-server-deployment.yaml
  • deploy-v2/kustomization.yaml
  • gh-token-minter/mint-token.sh
  • go-server-v2/config.go
  • go-server-v2/go.mod
  • go-server-v2/handlers.go
  • go-server-v2/k8s.go
  • go-server-v2/main.go
  • go-server-v2/static/homepage.html
  • images-v2/gh-token-minter.Dockerfile
  • images-v2/go-server.Dockerfile
  • images-v2/job.Dockerfile
  • server/gh-credential-helper-v2.sh
  • server/main.py
✅ Files skipped from review due to trivial changes (3)
  • deploy-v2/kustomization.yaml
  • images-v2/gh-token-minter.Dockerfile
  • server/main.py

Comment on lines +80 to +82
        - name: private-key
          secret:
            secretName: github-app-private-key
⚠️ Potential issue | 🔴 Critical

Secret name mismatch will cause pod startup failure.

The volume references secretName: github-app-private-key, but the Secret defined in this file is named gh-app-secret (line 35). The pod will fail to start with a "secret not found" error.

🔧 Proposed fix

Either rename the Secret:

 apiVersion: v1
 kind: Secret
 metadata:
-  name: gh-app-secret
+  name: github-app-private-key

Or update the volume reference:

       volumes:
         - name: private-key
           secret:
-            secretName: github-app-private-key
+            secretName: gh-app-secret
🧰 Tools
🪛 Checkov (3.2.513)

[medium] 44-82: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 44-82: Minimize the admission of root containers

(CKV_K8S_23)


Comment on lines +68 to +81
    spec:
      serviceAccountName: oape-go-server
      containers:
        - name: oape-go-server
          image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
          ports:
            - containerPort: 8080
          env:
            - name: WORKER_IMAGE
              value: ghcr.io/shiftweek/oape-ai-e2e:latest
          volumeMounts:
            - name: config
              mountPath: /config
              readOnly: true
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "go-server*" -o -name "*go-server*" | head -20

Repository: shiftweek/oape-ai-e2e

Length of output: 150


🏁 Script executed:

ls -la ./go-server-v2/

Repository: shiftweek/oape-ai-e2e

Length of output: 595


🏁 Script executed:

cat -n ./go-server-v2/main.go

Repository: shiftweek/oape-ai-e2e

Length of output: 1005


🏁 Script executed:

cat -n ./go-server-v2/handlers.go

Repository: shiftweek/oape-ai-e2e

Length of output: 8285


🏁 Script executed:

cat -n ./go-server-v2/config.go

Repository: shiftweek/oape-ai-e2e

Length of output: 5132


🏁 Script executed:

cat -n ./go-server-v2/k8s.go

Repository: shiftweek/oape-ai-e2e

Length of output: 7494


Run the Go server with a restricted security context.

The Deployment currently uses the default pod/container security context, allowing the process to run as root, escalate privileges, and write to the image filesystem. This server only reads configuration from a ConfigMap and serves HTTP requests, so none of these capabilities are needed.

Suggested hardening
     spec:
       serviceAccountName: oape-go-server
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: oape-go-server
           image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           ports:
             - containerPort: 8080

Comment on lines +29 to +31
resp = requests.get("https://api.github.com/app/installations", headers=headers)
resp.raise_for_status()
inst_id = resp.json()[0]["id"]
⚠️ Potential issue | 🟡 Minor

Handle case where no installations exist.

If the GitHub App has no installations, resp.json()[0] will raise an IndexError. Add validation to provide a clearer error message.

🛡️ Proposed fix
 resp = requests.get("https://api.github.com/app/installations", headers=headers)
 resp.raise_for_status()
-inst_id = resp.json()[0]["id"]
+installations = resp.json()
+if not installations:
+    raise SystemExit("No installations found for this GitHub App")
+inst_id = installations[0]["id"]

Comment on lines +46 to +53
	ttl := int32(5400) // 1h30m
	if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
		n, err := strconv.ParseInt(v, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
		}
		ttl = int32(n)
	}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check the file exists and read the relevant section with context
cd go-server-v2 && wc -l config.go && cat -n config.go | head -70

Repository: shiftweek/oape-ai-e2e

Length of output: 2273


🏁 Script executed:

# Search for where ttl is used after being set
cd go-server-v2 && rg "ttl" -i -A 5 -B 5 config.go

Repository: shiftweek/oape-ai-e2e

Length of output: 1401


🏁 Script executed:

# Search the entire codebase for where this TTL value is passed to Kubernetes
cd go-server-v2 && rg "ttlSecondsAfterFinished|ttl" --type go -A 3 -B 3 | head -100

Repository: shiftweek/oape-ai-e2e

Length of output: 1553


🌐 Web query:

Kubernetes Job spec ttlSecondsAfterFinished negative value requirement

💡 Result:

In a Kubernetes Job (batch/v1), .spec.ttlSecondsAfterFinished must be a non-negative integer (≥ 0). If you set a negative value, the API server rejects the object with a validation error like: spec.ttlSecondsAfterFinished: must be greater than or equal to 0.[1]

Semantics:

  • Omitted / null: TTL cleanup is not configured.
  • 0: the finished Job becomes eligible for deletion immediately after it completes.[2]



Validate TTL_AFTER_FINISHED is non-negative at startup.

strconv.ParseInt accepts negative values, but Kubernetes rejects negative ttlSecondsAfterFinished. Letting that through means the server boots successfully and only fails when users submit jobs.

Possible fix
 	ttl := int32(5400) // 1h30m
 	if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 		n, err := strconv.ParseInt(v, 10, 32)
 		if err != nil {
 			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 		}
+		if n < 0 {
+			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: must be >= 0")
+		}
 		ttl = int32(n)
 	}

Comment on lines +173 to +186
	for _, cond := range job.Status.Conditions {
		if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
			return &JobStatus{Status: "succeeded"}, nil
		}
		if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
			return &JobStatus{Status: "failed", Message: cond.Message}, nil
		}
	}

	if job.Status.Active > 0 {
		return &JobStatus{Status: "running"}, nil
	}

	return &JobStatus{Status: "pending"}, nil
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "k8s.go" | head -20

Repository: shiftweek/oape-ai-e2e

Length of output: 86


🏁 Script executed:

cat -n ./go-server-v2/k8s.go | sed -n '160,200p'

Repository: shiftweek/oape-ai-e2e

Length of output: 1757


🏁 Script executed:

grep -n "CreateWorkflowJob" ./go-server-v2/k8s.go | head -5

Repository: shiftweek/oape-ai-e2e

Length of output: 240


🏁 Script executed:

cat -n ./go-server-v2/k8s.go | sed -n '55,120p'

Repository: shiftweek/oape-ai-e2e

Length of output: 2408


🏁 Script executed:

cat -n ./go-server-v2/k8s.go | head -30

Repository: shiftweek/oape-ai-e2e

Length of output: 1108


🌐 Web query:

Kubernetes Job controller BackoffLimit 0 conditions counters timing

💡 Result:

What backoffLimit: 0 does (Job controller behavior)

  • spec.backoffLimit is the maximum number of failures the Job controller will tolerate before it marks the Job as failed. (kubernetes.io)
  • With backoffLimit: 0, the Job gets zero retries: the first counted failure immediately causes the Job to be considered failed. (Practically: you’ll get at most one “attempt” that is allowed to fail.) (kubernetes.io)

Counters you’ll see (.status.*)

  • .status.failed increments as the controller observes failed Pods for the Job; with backoffLimit: 0, it typically reaches 1 on the first failed Pod and the Job becomes failed. (v1-33.docs.kubernetes.io)
  • .status.succeeded stays 0 (unless you have another Pod that succeeded before the failure condition was triggered, e.g., parallelism > 1). (kubernetes.io)

Conditions you’ll see (.status.conditions)

  • When the backoff limit is exceeded, the Job is marked failed and you commonly see a terminal condition of type Failed=True with reason BackoffLimitExceeded. (github.com)
  • On newer Kubernetes, the controller may first add an intermediate FailureTarget condition when the failure criteria are met, and only later add the terminal Failed condition after cleanup/termination of Pods finishes. (kubernetes.io)

Timing / “why didn’t it fail instantly?”

  • Even after the controller decides the Job has failed (backoff exceeded), there can be delay before everything looks “done” because Pods may still be terminating (for example due to terminationGracePeriodSeconds). The docs explicitly note timing gaps between setting intermediate conditions (like FailureTarget) and the final terminal condition (Failed / Complete). (kubernetes.io)
  • Exponential backoff between retries is only relevant when retries are allowed; with backoffLimit: 0, there’s no second retry to wait for. (kubernetes.io)



Check Job counters before falling back to pending.

With BackoffLimit: 0 in CreateWorkflowJob, the Kubernetes controller marks a Job as failed immediately upon the first Pod failure by incrementing .status.failed. However, there is a timing gap before the terminal Failed condition is written. During this window, GetJobStatus can report a finished job as pending because it only checks conditions and the Active counter.

Include checks for .status.Succeeded > 0 and .status.Failed > 0 before the fallback to pending status to eliminate this race condition.

Suggested fix
 	for _, cond := range job.Status.Conditions {
 		if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
 			return &JobStatus{Status: "succeeded"}, nil
 		}
 		if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
 			return &JobStatus{Status: "failed", Message: cond.Message}, nil
 		}
 	}
+
+	if job.Status.Succeeded > 0 {
+		return &JobStatus{Status: "succeeded"}, nil
+	}
+	if job.Status.Failed > 0 {
+		return &JobStatus{Status: "failed"}, nil
+	}
 
 	if job.Status.Active > 0 {
 		return &JobStatus{Status: "running"}, nil
 	}

FROM registry.access.redhat.com/ubi9/go-toolset AS builder
WORKDIR /go/src/app

COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .
⚠️ Potential issue | 🔴 Critical

Fix COPY destination syntax.

When COPY has more than two arguments, the destination must be a directory ending with /. The current syntax will fail or produce unexpected results.

🔧 Proposed fix
-COPY ./go-server-v2/go.mod ./go-server-v2/go.sum . 
+COPY ./go-server-v2/go.mod ./go-server-v2/go.sum ./
🧰 Tools
🪛 Hadolint (2.14.0)

[error] 4-4: COPY with more than 2 arguments requires the last argument to end with /

(DL3021)

🪛 Trivy (0.69.3)

[error] 4-4: COPY with more than two arguments not ending with slash

Slash is expected at the end of COPY command argument '.'

Rule: DS-0011

Learn more

(IaC/Dockerfile)


@swghosh swghosh changed the title re-arch: Add go webserver with Jobs manager, improve GitHub app token minting security re-arch: Add go webserver with Jobs manager, improve GitHub app token minting Apr 6, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
deploy-v2/gh-token-minter-deployment.yaml (1)

79-82: ⚠️ Potential issue | 🔴 Critical

Fix secret reference mismatch to avoid pod startup failure.

Line 82 references github-app-private-key, but the Secret defined at Line 35 is gh-app-secret.

Suggested fix
       volumes:
         - name: private-key
           secret:
-            secretName: github-app-private-key
+            secretName: gh-app-secret
go-server-v2/config.go (1)

47-53: ⚠️ Potential issue | 🟡 Minor

Reject negative TTL_AFTER_FINISHED values during config load.

Line 48-Line 53 accepts negative TTL values, but Kubernetes Job TTL must be non-negative. Validate here so startup fails fast with a clear error.

Suggested fix
 	if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 		n, err := strconv.ParseInt(v, 10, 32)
 		if err != nil {
 			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 		}
+		if n < 0 {
+			return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: must be >= 0")
+		}
 		ttl = int32(n)
 	}
images-v2/go-server.Dockerfile (1)

4-4: ⚠️ Potential issue | 🔴 Critical

Fix invalid multi-source COPY destination.

Line 4 uses multiple sources with destination .; this must be a directory token ending with /.

Suggested fix
-COPY ./go-server-v2/go.mod ./go-server-v2/go.sum . 
+COPY ./go-server-v2/go.mod ./go-server-v2/go.sum ./
#!/bin/bash
# Verify multi-source COPY destinations in this Dockerfile end with '/'
python - <<'PY'
from pathlib import Path
p = Path("images-v2/go-server.Dockerfile")
for i, line in enumerate(p.read_text().splitlines(), 1):
    s = line.strip()
    if not s.upper().startswith("COPY "):
        continue
    toks = s.split()
    if len(toks) > 3:
        dest = toks[-1]
        if not dest.endswith("/"):
            print(f"Line {i}: invalid multi-source COPY destination -> {dest}")
PY
deploy-v2/go-server-deployment.yaml (1)

68-81: ⚠️ Potential issue | 🟠 Major

Add restrictive pod/container security context.

The workload runs with default security settings. Please explicitly enforce non-root execution and disable privilege escalation.

Suggested fix
     spec:
       serviceAccountName: oape-go-server
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: oape-go-server
           image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           ports:
             - containerPort: 8080
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/go-server-deployment.yaml` around lines 68 - 81, The pod currently
uses default security settings; update the manifest around
spec.serviceAccountName and the container with name "oape-go-server" to enforce
non-root and disable privilege escalation by adding a podSecurityContext and
container.securityContext: set runAsNonRoot: true and a non-root runAsUser (e.g.
1000) at pod or container level, set allowPrivilegeEscalation: false in the
container securityContext, drop all capabilities (capabilities.drop: ["ALL"]),
enable readOnlyRootFilesystem: true for the container, and add a seccompProfile
(type: RuntimeDefault) to harden the workload; apply these fields next to the
existing containers/volumeMounts entries for "oape-go-server".
go-server-v2/go.mod (1)

35-35: ⚠️ Potential issue | 🟠 Major

Upgrade vulnerable golang.org/x/oauth2 version.

Line 35 uses v0.21.0, which is flagged for GO-2025-3488/GHSA-6v2p-p543-phr9. Please bump to a patched version (>= v0.27.0).

Suggested fix
-	golang.org/x/oauth2 v0.21.0 // indirect
+	golang.org/x/oauth2 v0.27.0 // indirect
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go-server-v2/go.mod` at line 35, The dependency golang.org/x/oauth2 is pinned
at v0.21.0 which is vulnerable; update the module requirement to a patched
release (>= v0.27.0) by running go get golang.org/x/oauth2@v0.27.0 (or a newer
v0.27.x+), then run go mod tidy to update go.mod/go.sum and rebuild/tests to
verify no breakage and that the upgraded golang.org/x/oauth2 version is
reflected in the module list.
server/main.py (1)

26-37: ⚠️ Potential issue | 🟠 Major

Emit lifecycle events as JSON lines, not plain strings.

Lines 26 and 31-37 currently output plain text, which breaks this module's stated JSONL contract and makes stream parsing inconsistent.

Suggested fix
-    print(f"Starting workflow: ep_url={ep_url} repo={repo}", flush=True)
+    print(json.dumps({
+        "event": "workflow_start",
+        "ep_url": ep_url,
+        "repo": repo,
+    }, default=str), flush=True)
@@
-        print(f"WORKFLOW_SUCCESS cost=${result.cost_usd:.4f}", flush=True)
+        print(json.dumps({
+            "event": "workflow_success",
+            "cost_usd": result.cost_usd,
+        }, default=str), flush=True)
         for pr in result.prs:
-            print(f"PR_CREATED: {pr.pr_url}", flush=True)
+            print(json.dumps({
+                "event": "pr_created",
+                "pr_url": pr.pr_url,
+            }, default=str), flush=True)
@@
-        print(f"WORKFLOW_FAILED: {result.error}", file=sys.stderr, flush=True)
+        print(json.dumps({
+            "event": "workflow_failed",
+            "error": str(result.error),
+        }, default=str), file=sys.stderr, flush=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/main.py` around lines 26 - 37, The scripts currently print lifecycle
messages as plain text; change them to emit JSONL via the existing print_json
helper so downstream parsers get structured events: replace the initial
print("Starting workflow...") with
print_json(data={"event":"WORKFLOW_STARTED","ep_url":ep_url,"repo":repo}),
replace the WORKFLOW_SUCCESS/WORKFLOW_FAILED prints with
print_json(data={"event":"WORKFLOW_SUCCESS","cost_usd":result.cost_usd}) and
print_json(data={"event":"WORKFLOW_FAILED","error":result.error}) respectively,
and when iterating result.prs call
print_json(data={"event":"PR_CREATED","pr_url":pr.pr_url}) for each PR; keep the
same exit codes and use the run_workflow, result, pr, and print_json symbols to
locate the changes.
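The prompt above relies on an existing print_json helper in server/main.py; as a sketch, a minimal JSONL emitter of this shape would satisfy the contract (the real helper's signature may differ):

```python
import json
import sys

def print_json(data, stream=sys.stdout):
    # Emit one JSON object per line (JSONL) and flush immediately so the
    # SSE log streamer sees events as they happen. Illustrative sketch;
    # the actual print_json in server/main.py may differ.
    stream.write(json.dumps(data, default=str) + "\n")
    stream.flush()

print_json({"event": "WORKFLOW_STARTED", "repo": "org/repo"})
```

One object per line keeps the Go server's line-by-line pod-log streaming trivially parseable on the browser side.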
🧹 Nitpick comments (2)
deploy-v2/go-server-deployment.yaml (1)

72-72: Prefer immutable image pinning over :latest.

Using :latest makes deployments non-reproducible and can cause drift between rollouts. Pin to a version tag or digest.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/go-server-deployment.yaml` at line 72, The container image
currently uses an immutable-unsafe tag
"ghcr.io/shiftweek/oape-ai-e2e/go-server:latest"; change the image reference in
the go-server container spec (the line containing image:
ghcr.io/shiftweek/oape-ai-e2e/go-server:latest) to a pinned version (e.g., a
release tag like :v1.2.3) or an immutable digest (sha256:...), and optionally
set imagePullPolicy to IfNotPresent to avoid unexpected updates during rollout.
deploy-v2/gh-token-minter-deployment.yaml (1)

15-17: Reduce RBAC scope by removing broad secrets/create if possible.

Lines 15-17 allow creating any Secret in the namespace. Prefer pre-creating github-app-token and limiting this role to get/update/patch only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 15 - 17, The current
RBAC rule grants the broad "create" verb on resources: ["secrets"], allowing
creation of any Secret; change the rule to restrict verbs to
["get","update","patch"] and add resourceNames: ["github-app-token"] so the role
only operates on the pre-created secret named github-app-token; update the block
that contains resources: ["secrets"] and verbs: ["create"] accordingly (replace
"create" with the new verbs and add resourceNames) and ensure the
github-app-token secret is pre-created in the namespace.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 57-78: The Deployment runs with default privileges; add explicit
pod- and container-level securityContext entries to enforce non-root and no
privilege escalation: under spec (pod level) add securityContext: runAsNonRoot:
true and a non-zero runAsUser (e.g., 1000) and optionally fsGroup if needed, and
for the container named "ghpat" add container-level securityContext with
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: {
drop: ["ALL"] } (and optionally seccompProfile: { type: RuntimeDefault }); keep
the existing volumeMount (private-key) readOnly and ensure any needed file perms
match the chosen runAsUser.

In `@images-v2/go-server.Dockerfile`:
- Around line 10-13: The Dockerfile's runtime stage runs the server as root
because no USER is set; modify the runtime stage to create a non-root user/group
(e.g., appuser), chown the binary at /usr/local/bin/app-server to that user, and
add a USER directive before ENTRYPOINT so the container runs as the non-root
user; ensure any required directories or ports are accessible to that user and
reference the existing ENTRYPOINT ["/usr/local/bin/app-server"] when making
these changes.

---

Duplicate comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 79-82: The pod volume named "private-key" references secretName
"github-app-private-key" but the actual Secret created is "gh-app-secret";
update the secretName value referenced by the volume (volume: name
"private-key") to "gh-app-secret" so the volume mounts the correct Secret (or
alternatively rename the Secret resource to "github-app-private-key" to match
the volume), ensuring the names for the Secret and the secretName in the
deployment match.

In `@deploy-v2/go-server-deployment.yaml`:
- Around line 68-81: The pod currently uses default security settings; update
the manifest around spec.serviceAccountName and the container with name
"oape-go-server" to enforce non-root and disable privilege escalation by adding
a podSecurityContext and container.securityContext: set runAsNonRoot: true and a
non-root runAsUser (e.g. 1000) at pod or container level, set
allowPrivilegeEscalation: false in the container securityContext, drop all
capabilities (capabilities.drop: ["ALL"]), enable readOnlyRootFilesystem: true
for the container, and add a seccompProfile (type: RuntimeDefault) to harden the
workload; apply these fields next to the existing containers/volumeMounts
entries for "oape-go-server".

In `@go-server-v2/config.go`:
- Around line 47-53: When parsing TTL_AFTER_FINISHED in the config load, reject
negative values: after strconv.ParseInt(v, 10, 32) and before assigning ttl =
int32(n), check if n < 0 and return a descriptive error (e.g. "invalid
TTL_AFTER_FINISHED: must be non-negative") so startup fails fast; update the
block that reads TTL_AFTER_FINISHED (the strconv.ParseInt call and the ttl
assignment) to enforce this non-negative validation.

In `@go-server-v2/go.mod`:
- Line 35: The dependency golang.org/x/oauth2 is pinned at v0.21.0 which is
vulnerable; update the module requirement to a patched release (>= v0.27.0) by
running go get golang.org/x/oauth2@v0.27.0 (or a newer v0.27.x+), then run go
mod tidy to update go.mod/go.sum and rebuild/tests to verify no breakage and
that the upgraded golang.org/x/oauth2 version is reflected in the module list.

In `@images-v2/go-server.Dockerfile`:
- Line 4: The Dockerfile's multi-source COPY instruction "COPY
./go-server-v2/go.mod ./go-server-v2/go.sum ." uses a destination token '.'
which is invalid for multiple sources; update the COPY line (the COPY
instruction that lists ./go-server-v2/go.mod and ./go-server-v2/go.sum) so the
destination is a directory token that ends with '/' (for example replace '.'
with './' or a named directory like 'app/'), preserving the two source paths and
ensuring the destination ends with a trailing slash.

In `@server/main.py`:
- Around line 26-37: The scripts currently print lifecycle messages as plain
text; change them to emit JSONL via the existing print_json helper so downstream
parsers get structured events: replace the initial print("Starting workflow...")
with print_json(data={"event":"WORKFLOW_STARTED","ep_url":ep_url,"repo":repo}),
replace the WORKFLOW_SUCCESS/WORKFLOW_FAILED prints with
print_json(data={"event":"WORKFLOW_SUCCESS","cost_usd":result.cost_usd}) and
print_json(data={"event":"WORKFLOW_FAILED","error":result.error}) respectively,
and when iterating result.prs call
print_json(data={"event":"PR_CREATED","pr_url":pr.pr_url}) for each PR; keep the
same exit codes and use the run_workflow, result, pr, and print_json symbols to
locate the changes.

---

Nitpick comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 15-17: The current RBAC rule grants the broad "create" verb on
resources: ["secrets"], allowing creation of any Secret; change the rule to
restrict verbs to ["get","update","patch"] and add resourceNames:
["github-app-token"] so the role only operates on the pre-created secret named
github-app-token; update the block that contains resources: ["secrets"] and
verbs: ["create"] accordingly (replace "create" with the new verbs and add
resourceNames) and ensure the github-app-token secret is pre-created in the
namespace.

In `@deploy-v2/go-server-deployment.yaml`:
- Line 72: The container image currently uses an immutable-unsafe tag
"ghcr.io/shiftweek/oape-ai-e2e/go-server:latest"; change the image reference in
the go-server container spec (the line containing image:
ghcr.io/shiftweek/oape-ai-e2e/go-server:latest) to a pinned version (e.g., a
release tag like :v1.2.3) or an immutable digest (sha256:...), and optionally
set imagePullPolicy to IfNotPresent to avoid unexpected updates during rollout.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: adafdb36-9228-4202-991d-6013ac239c7f

📥 Commits

Reviewing files that changed from the base of the PR and between 41bdbff and 490288e.

⛔ Files ignored due to path filters (2)
  • deploy-v2/config/team-repos.csv is excluded by !**/*.csv
  • go-server-v2/go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • deploy-v2/config/config.json
  • deploy-v2/gh-token-minter-deployment.yaml
  • deploy-v2/go-server-deployment.yaml
  • deploy-v2/kustomization.yaml
  • gh-token-minter/mint-token.sh
  • go-server-v2/config.go
  • go-server-v2/go.mod
  • go-server-v2/handlers.go
  • go-server-v2/k8s.go
  • go-server-v2/main.go
  • go-server-v2/static/homepage.html
  • images-v2/gh-token-minter.Dockerfile
  • images-v2/go-server.Dockerfile
  • images-v2/job.Dockerfile
  • server/gh-credential-helper-v2.sh
  • server/main.py
  • server/requirements.txt
✅ Files skipped from review due to trivial changes (4)
  • server/requirements.txt
  • deploy-v2/config/config.json
  • go-server-v2/static/homepage.html
  • gh-token-minter/mint-token.sh
🚧 Files skipped from review as they are similar to previous changes (7)
  • server/gh-credential-helper-v2.sh
  • images-v2/gh-token-minter.Dockerfile
  • images-v2/job.Dockerfile
  • go-server-v2/main.go
  • deploy-v2/kustomization.yaml
  • go-server-v2/handlers.go
  • go-server-v2/k8s.go

Comment on lines +57 to +78
spec:
  serviceAccountName: github-token-minter
  containers:
    - name: ghpat
      image: github-token-minter:latest
      env:
        - name: APP_ID
          value: "3065249"
        - name: PRIVATE_KEY_PATH
          value: /etc/github-app/private-key.pem
        - name: SECRET_NAME
          value: github-app-token
        - name: SECRET_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: REFRESH_INTERVAL
          value: "1000"
      volumeMounts:
        - name: private-key
          mountPath: /etc/github-app
          readOnly: true
⚠️ Potential issue | 🟠 Major

Harden pod/container security context.

The Deployment currently runs with default security settings. Add explicit non-root, no-privilege-escalation, and read-only root FS controls.

Suggested fix
     spec:
       serviceAccountName: github-token-minter
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: ghpat
           image: github-token-minter:latest
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     spec:
       serviceAccountName: github-token-minter
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
         - name: ghpat
           image: github-token-minter:latest
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
           env:
             - name: APP_ID
               value: "3065249"
             - name: PRIVATE_KEY_PATH
               value: /etc/github-app/private-key.pem
             - name: SECRET_NAME
               value: github-app-token
             - name: SECRET_NAMESPACE
               valueFrom:
                 fieldRef:
                   fieldPath: metadata.namespace
             - name: REFRESH_INTERVAL
               value: "1000"
           volumeMounts:
             - name: private-key
               mountPath: /etc/github-app
               readOnly: true
🧰 Tools
🪛 Trivy (0.69.3)

[error] 53-78: Default security context configured

deployment github-token-minter in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 57 - 78, The
Deployment runs with default privileges; add explicit pod- and container-level
securityContext entries to enforce non-root and no privilege escalation: under
spec (pod level) add securityContext: runAsNonRoot: true and a non-zero
runAsUser (e.g., 1000) and optionally fsGroup if needed, and for the container
named "ghpat" add container-level securityContext with allowPrivilegeEscalation:
false, readOnlyRootFilesystem: true, capabilities: { drop: ["ALL"] } (and
optionally seccompProfile: { type: RuntimeDefault }); keep the existing
volumeMount (private-key) readOnly and ensure any needed file perms match the
chosen runAsUser.

Comment on lines +10 to +13
FROM registry.access.redhat.com/ubi9-minimal:latest
COPY --from=builder /tmp/app /usr/local/bin/app-server
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/app-server"]
⚠️ Potential issue | 🟠 Major

Run the runtime container as non-root.

The runtime stage does not set a non-root user, so the server runs as root by default.

Suggested fix
 FROM registry.access.redhat.com/ubi9-minimal:latest
 COPY --from=builder /tmp/app /usr/local/bin/app-server
+USER 1001
 EXPOSE 8080
 ENTRYPOINT ["/usr/local/bin/app-server"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@images-v2/go-server.Dockerfile` around lines 10 - 13, The Dockerfile's
runtime stage runs the server as root because no USER is set; modify the runtime
stage to create a non-root user/group (e.g., appuser), chown the binary at
/usr/local/bin/app-server to that user, and add a USER directive before
ENTRYPOINT so the container runs as the non-root user; ensure any required
directories or ports are accessible to that user and reference the existing
ENTRYPOINT ["/usr/local/bin/app-server"] when making these changes.

@swghosh
Collaborator Author

swghosh commented Apr 6, 2026

@coderabbitai Please draw a diagram for the new architecture, use Mermaid.

@coderabbitai

coderabbitai bot commented Apr 6, 2026

Tip

For best results, initiate chat on the files or code changes.

@swghosh Sure! Here's a Mermaid architecture diagram for the new system introduced in this PR:

graph TB
    subgraph Browser["User Browser"]
        UI["Web UI\n(homepage.html)"]
    end

    subgraph K8s["Kubernetes Cluster (namespace: shift-ai-v2)"]
        subgraph GoServer["Deployment: oape-go-server"]
            GSvc["Go HTTP Server\n:8080\n\nGET /\nGET /repos\nPOST /submit\nGET /status/jobid\nGET /stream/jobid (SSE)"]
        end

        subgraph TokenMinter["Deployment: github-token-minter"]
            Minter["Token Minter\nmint-token.sh\n(loop every ~3500s)"]
        end

        subgraph WorkflowJob["batch Job: oape-workflow"]
            PyScript["Python Worker\nmain.py\n\nrun_workflow(ep_url, repo)"]
        end

        subgraph Secrets["Secrets and ConfigMaps"]
            GHTokenSecret["Secret\ngithub-app-token\n(token refreshed by minter)"]
            GHAppSecret["Secret\ngh-app-secret\n(private-key.pem)"]
            GCloudSecret["Secret\ngcloud-adc-secret\n(ADC JSON)"]
            WorkerCM["ConfigMap\noape-worker-config\n(Vertex AI / Anthropic env)"]
            OapeCM["ConfigMap\noape-configs\n(config.json, team-repos.csv)"]
        end

        subgraph RBAC["RBAC"]
            SA_Go["ServiceAccount\noape-go-server\n(create/list/watch Jobs, Pods, logs)"]
            SA_Minter["ServiceAccount\ngithub-token-minter\n(get/update/patch/create Secret)"]
        end
    end

    subgraph ExternalServices["External Services"]
        GitHubAPI["GitHub API\n(App Installations\n+ Access Tokens)"]
        GitHubRepo["GitHub\n(git clone / push PR)"]
        VertexAI["Vertex AI / Anthropic\n(Claude)"]
    end

    UI -->|"POST /submit (ep_url, repo)"| GSvc
    UI -->|"GET /stream/jobid SSE"| GSvc
    GSvc -->|"HTML"| UI

    GSvc -->|"Create Job\n(via K8s API)"| WorkflowJob
    GSvc -->|"Poll status / stream pod logs"| WorkflowJob
    GSvc -->|"Reads team-repos.csv"| OapeCM
    GSvc --- SA_Go

    Minter -->|"Read PEM key"| GHAppSecret
    Minter -->|"JWT to Installation Token"| GitHubAPI
    Minter -->|"kubectl apply secret"| GHTokenSecret
    Minter --- SA_Minter

    PyScript -->|"EnvFrom"| WorkerCM
    PyScript -->|"Mount /etc/github-app-token"| GHTokenSecret
    PyScript -->|"Mount /config"| OapeCM
    PyScript -->|"Mount ADC JSON"| GCloudSecret
    PyScript -->|"git credential helper - clone/push"| GitHubRepo
    PyScript -->|"LLM calls"| VertexAI

    classDef k8sbox fill:steelblue,stroke:darkblue,color:black
    classDef extbox fill:mediumseagreen,stroke:seagreen,color:black
    classDef secretbox fill:sandybrown,stroke:darkgoldenrod,color:black
    classDef rbacbox fill:mediumpurple,stroke:rebeccapurple,color:black
    class GoServer,TokenMinter,WorkflowJob k8sbox
    class ExternalServices,GitHubAPI,GitHubRepo,VertexAI extbox
    class Secrets,GHTokenSecret,GHAppSecret,GCloudSecret,WorkerCM,OapeCM secretbox
    class RBAC,SA_Go,SA_Minter rbacbox


Key flows summarised

  • Submit: browser POSTs ep_url + repo → Go server validates & creates a K8s batch/Job → returns job_id
  • Streaming: browser opens SSE on /stream/{job_id} → Go server waits for the pod, streams pod logs line-by-line, emits a complete event
  • Token refresh: Token Minter wakes every ~3500 s, mints a new GitHub App installation token via JWT, writes it into the github-app-token Secret
  • Worker execution: Python main.py runs inside the Job pod, calls run_workflow(), uses mounted GCloud ADC for Vertex/Anthropic, and the Git credential helper reads the refreshed token for GitHub access
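The token-refresh flow mints a short-lived GitHub App JWT from the bot's private key. A minimal sketch of the claims payload (the function name is illustrative; signing the claims with the app's RSA key via RS256, e.g. with PyJWT, is omitted):

```python
import time

def build_app_jwt_claims(app_id, now=None):
    # Claims for a GitHub App JWT: iss is the App ID, iat is backdated
    # 60s to absorb clock drift, and exp must be at most 10 minutes
    # after issuance. Sign the result with the app's private key (RS256).
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,
        "exp": now + 540,  # 9 minutes ahead, within GitHub's 10-minute cap
        "iss": app_id,
    }
```

The signed JWT is then exchanged for an installation access token, which is what the minter writes into the github-app-token Secret.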

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 67-70: Update the volume's Secret reference: in the deployment's
volumes entry for name "private-key" (the block where secret.secretName is set),
change secretName from "github-app-private-key" to "gh-app-secret" so the volume
mounts the actual Secret defined in secret-configs.yaml.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb6897e7-f4c3-41fa-b1df-2acaaa0ad132

📥 Commits

Reviewing files that changed from the base of the PR and between 490288e and 04514a8.

📒 Files selected for processing (4)
  • deploy-v2/gh-token-minter-deployment.yaml
  • deploy-v2/go-server-deployment.yaml
  • deploy-v2/kustomization.yaml
  • deploy-v2/secret-configs.yaml
✅ Files skipped from review due to trivial changes (1)
  • deploy-v2/secret-configs.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • deploy-v2/kustomization.yaml

Comment on lines +67 to +70
volumes:
  - name: private-key
    secret:
      secretName: github-app-private-key
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for github-app-private-key Secret definition in the deploy-v2 directory

# Check for Secret named github-app-private-key in yaml files
rg -n "name:\s*github-app-private-key" --type yaml

# Also check for any Secret definitions that might use this name
rg -nB5 "kind:\s*Secret" --type yaml | rg -A5 "github-app-private-key"

Repository: shiftweek/oape-ai-e2e

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Explore repository structure and search for Secret definitions and github-app references

# List files in deploy-v2 directory
echo "=== Files in deploy-v2 directory ==="
fd . deploy-v2 --type f | head -20

# Search for all Secret definitions (any kind)
echo -e "\n=== All Secret definitions ==="
rg "kind:\s*Secret" --type yaml -A 3

# Search for sealed-secret definitions
echo -e "\n=== Sealed Secret definitions ==="
rg "kind:\s*SealedSecret" --type yaml -A 3

# Broader search for github-app references
echo -e "\n=== All github-app references ==="
rg "github-app" --type yaml -n

# Check for secret-configs file mentioned in review
echo -e "\n=== Files matching secret-configs pattern ==="
fd "secret" deploy-v2 --type f

# Search for any file that might define secrets
echo -e "\n=== All yaml files in deploy-v2 ==="
fd "\.ya?ml$" deploy-v2 --type f

Repository: shiftweek/oape-ai-e2e

Length of output: 1769


🏁 Script executed:

# View the secret-configs.yaml file to see complete Secret definitions
cat -n deploy-v2/secret-configs.yaml

Repository: shiftweek/oape-ai-e2e

Length of output: 959


🏁 Script executed:

# View the gh-token-minter-deployment.yaml file to see the full context of the volume mounting
cat -n deploy-v2/gh-token-minter-deployment.yaml

Repository: shiftweek/oape-ai-e2e

Length of output: 2296


Change secretName from github-app-private-key to gh-app-secret.

The Secret named github-app-private-key does not exist in the deployment. The actual Secret containing the GitHub App private key is named gh-app-secret (defined in secret-configs.yaml). Update line 70 to use the correct Secret name.

🧰 Tools
🪛 Checkov (3.2.513)

[medium] 32-70: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 32-70: Minimize the admission of root containers

(CKV_K8S_23)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 67 - 70, Update the
volume's Secret reference: in the deployment's volumes entry for name
"private-key" (the block where secret.secretName is set), change secretName from
"github-app-private-key" to "gh-app-secret" so the volume mounts the actual
Secret defined in secret-configs.yaml.

@swghosh swghosh force-pushed the rearch branch 4 times, most recently from 54de20f to 3892d58 on April 8, 2026 20:46
swghosh and others added 5 commits April 9, 2026 02:18
- mv server agent

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
