Note: Reviews paused — it looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable, and reviews can be managed with CodeRabbit's review commands or the quick-action checkboxes.
📝 Walkthrough

Adds a v2 deployment: a Go HTTP server (submit/status/SSE logs) that creates Kubernetes Jobs, a Python job worker entrypoint, a GitHub App token minter (script, image, deployment, RBAC), Dockerfiles, Kustomize manifests, and related config/secret manifests and static UI.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Browser as Browser (User)
    participant GoServer as Go Server (HTTP)
    participant K8sAPI as Kubernetes API
    participant JobPod as Worker Pod (Python)
    participant GitHub as GitHub API
    Browser->>GoServer: POST /submit (ep_url, repo)
    GoServer->>GoServer: validate & generate job_id
    GoServer->>K8sAPI: Create Job (env, secrets, volumes)
    K8sAPI-->>GoServer: Job created
    GoServer-->>Browser: return job_id
    Browser->>GoServer: GET /stream/{job_id} (SSE)
    GoServer->>K8sAPI: poll for pod & phase
    K8sAPI-->>GoServer: pod found / running
    GoServer->>K8sAPI: stream pod logs (follow)
    K8sAPI-->>GoServer: log lines
    GoServer-->>Browser: SSE log/status events
    JobPod->>GitHub: use mounted token to perform workflow API calls
    JobPod->>JobPod: complete workflow work
    K8sAPI-->>GoServer: job final status
    GoServer-->>Browser: SSE complete (result)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 17
🧹 Nitpick comments (1)
deploy/deployment.yaml (1)
96-96: Avoid `:latest` in the Deployment manifests.

CI is already producing sha tags, but these manifests still deploy `:latest`. That makes rollouts non-reproducible and allows a reschedule or later apply to pick up a different build than the one this PR was validated against. Use the immutable sha tag or image digest here.

Also applies to: 148-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/deployment.yaml` at line 96, Replace the hardcoded image: ghcr.io/shiftweek/oape-ai-e2e-token-service:latest in the Deployment manifest with the immutable CI-produced SHA tag or the image digest (e.g., ghcr.io/shiftweek/oape-ai-e2e-token-service@sha256:...) so rollouts are reproducible; update every occurrence of the image field that currently uses :latest (including the other instance referenced) and, if applicable, wire it to the CI variable or kustomize/Helm value that injects the sha-tag so the manifest always deploys the exact build validated by CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/deployment.yaml`:
- Around line 93-115: Add explicit pod- and container-level securityContext
blocks for the new token-service deployment and the oape-orchestrator deployment
to harden pods: at the pod spec level set securityContext with runAsNonRoot:
true, runAsUser: 1000 (or a non-root UID), fsGroup: 65534, and seccompProfile: {
type: RuntimeDefault }; within the token-service container (name: token-service)
and the oape-orchestrator container set securityContext with
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, privileged:
false, capabilities: { drop: ["ALL"] }, and runAsNonRoot: true; ensure volume
mounts remain writable only where needed and that the secret volume mount
retains readOnly: true.
In `@go-server/config.go`:
- Around line 46-53: The config loading code that parses TTL_AFTER_FINISHED
should reject negative values immediately: after parsing v into n in the TTL
parsing block, check if n < 0 and return an error (e.g., fmt.Errorf("invalid
TTL_AFTER_FINISHED: value must be non-negative")) instead of allowing a negative
ttl to be assigned to ttl (int32); keep the existing strconv.ParseInt call and
the ttl = int32(n) assignment but add the non-negative validation before the
conversion/assignment to fail-fast on bad configuration.
In `@go-server/go.mod`:
- Line 35: Update the vulnerable indirect dependency golang.org/x/oauth2 from
v0.21.0 to v0.27.0+ by adding/updating its entry in go.mod (the existing
"golang.org/x/oauth2" require line) or running "go get
golang.org/x/oauth2@v0.27.0" and then "go mod tidy" to regenerate go.sum; ensure
the updated version appears in go.mod/go.sum and re-run tests/build to confirm
no breakage (target symbol: golang.org/x/oauth2 in go.mod).
In `@go-server/handlers.go`:
- Around line 156-159: The handlers currently treat all errors from
a.k8s.GetJobStatus, a.k8s.GetJobPod and the pod Get call as "not found" or
transient; update these to inspect Kubernetes error types (use
k8s.io/apimachinery/pkg/api/errors functions like apierrors.IsNotFound,
apierrors.IsForbidden, apierrors.IsUnauthorized, etc.) and branch accordingly:
return a 404 via writeError only when apierrors.IsNotFound(err) is true, return
a 4xx (e.g., 403/401) when IsForbidden/IsUnauthorized match, and return a 5xx
(and stop retrying/waiting) for API server/network errors; ensure GetJobStatus,
GetJobPod and the pod Get error handling uses these checks and surfaces
non-transient errors immediately instead of mapping them to "waiting".
- Around line 67-86: The fetchToken method currently calls http.Get without a
context or timeout; change fetchToken to accept a context (e.g., func (a *App)
fetchToken(ctx context.Context) (string, error)) and create the request with
http.NewRequestWithContext(ctx, "GET", a.cfg.TokenServiceURL+"/token", nil) or
use a custom http.Client with a timeout (e.g., &http.Client{Timeout: X}) to
perform the request, propagate context from the caller (r.Context()) into the
call, and keep the existing response handling and error wrapping in fetchToken.
- Around line 264-289: The scanner loop using bufio.NewScanner currently treats
EOF and read errors the same and immediately emits a "complete" event; after the
loop returns, check scanner.Err() and if non-nil emit an "error" or
"stream_error" SSE event (use w and flusher) describing the streaming failure
instead of querying a.k8s.GetJobStatus/jobID for completion, and only fall back
to calling a.k8s.GetJobStatus and sending the "complete" event when
scanner.Err() is nil; ensure you still flush via flusher after sending either
the error or complete event.
In `@go-server/k8s.go`:
- Around line 78-139: The PodSpec currently defined in the corev1.PodSpec
literal (the block with RestartPolicy, Containers, Volumes) should explicitly
disable automatic mounting of the service-account token by adding
automountServiceAccountToken: false to that PodSpec; update the PodSpec
construction (the corev1.PodSpec where RestartPolicy: corev1.RestartPolicyNever
and Containers: []corev1.Container{...}) to include
AutomountServiceAccountToken: pointer to false (i.e., &false) so worker pods do
not receive the namespace service-account token.
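Rendered as a manifest, the suggested PodSpec field looks like this (container name and image are placeholders; in the Go builder it corresponds to setting `AutomountServiceAccountToken` to a pointer to `false`, e.g. via `k8s.io/utils/ptr`'s `ptr.To(false)`):

```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      automountServiceAccountToken: false  # worker pods get no SA token
      restartPolicy: Never
      containers:
      - name: worker          # placeholder container
        image: worker:example # placeholder image
```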
- Around line 85-95: The GH_TOKEN is being inlined in the Env slice
(corev1.EnvVar with Name "GH_TOKEN" and Value params.GHToken); instead create a
Kubernetes Secret holding the token and change the EnvVar for "GH_TOKEN" to use
ValueFrom.SecretKeyRef referencing that Secret and its key instead of Value;
update the Job creation code that builds the Env: []corev1.EnvVar to remove
Value usage for GH_TOKEN and use ValueFrom.SecretKeyRef (pointing to the new
secret name and key), and ensure code that provisions the Job also creates or
expects that Secret to exist.
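The resulting env entry would look roughly like this (the Secret name and key are hypothetical; the Job provisioner would have to create that Secret alongside the Job):

```yaml
env:
- name: GH_TOKEN
  valueFrom:
    secretKeyRef:
      name: gh-token-per-job   # hypothetical Secret created with the Job
      key: token
```

Unlike an inline `value:`, a `secretKeyRef` keeps the token out of the Job object itself, which anyone with Job read access can otherwise dump.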
In `@go-server/main.go`:
- Around line 31-32: Replace the direct call to
http.ListenAndServe(cfg.ListenAddr, mux) with an explicit http.Server instance
(set Addr to cfg.ListenAddr and Handler to mux) and configure server timeouts:
set ReadHeaderTimeout and ReadTimeout to reasonable values and set IdleTimeout
to limit idle connections; do not set WriteTimeout so SSE handler
/stream/{job_id} can stream uninterrupted; then call server.ListenAndServe() and
log.Fatal on its error (keeping the existing log.Printf about cfg.ListenAddr).
Use the existing symbols cfg.ListenAddr and mux to locate where to create the
server.
In `@go-server/static/homepage.html`:
- Around line 89-114: The page currently logs /repos failures only to console;
update the loadRepos() flow to write an error message into the DOM element with
id "status" and keep the form submit control disabled until repos load
successfully: in loadRepos(), locate the repoSelect and submit button elements
(e.g., repoSelect and the submit button control used elsewhere), set the submit
button disabled at the start, and on successful fetch populate repoSelect and
enable the submit button; in the catch block set
document.getElementById('status').textContent to a user-friendly error message
(including minimal detail like "Failed to load repositories") and ensure the
submit button remains disabled so users cannot submit until repos load. Ensure
loadRepos() is the function modified and only manipulate `#status`, repoSelect,
and the submit control.
- Around line 79-80: The status and logOutput divs are not announced by
assistive tech; change the elements with id="status" and id="logOutput" to be
ARIA live regions (e.g., add attributes like role="status" or role="log",
aria-live="polite" or "assertive" as appropriate, and aria-atomic="true") so
streamed progress and final results are announced; ensure updates replace or
append plain text (not only child nodes that screen readers ignore) so screen
readers receive each update.
In `@server/main.py`:
- Around line 27-40: Replace the plain string prints with JSON-line events so
the orchestrator always receives structured messages: emit a JSON object for the
start event (e.g.,
{"event":"workflow_start","ep_url":ep_url,"repo":repo,"cwd":working_dir}), for
success emit {"event":"workflow_success","cost_usd":result.cost_usd} and for
each PR emit {"event":"pr_created","pr_url":pr.pr_url}, and for failure emit
{"event":"workflow_failed","error":str(result.error)}; use json.dumps(...,
default=str) and print(..., flush=True) for stdout events and print(...,
file=sys.stderr, flush=True) for failure so on_message and run_workflow remain
unchanged but all lifecycle prints are uniform JSON lines.
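A small Python sketch of the uniform JSON-line emitter (the `emit` helper and the example payload values are illustrative):

```python
import json
import sys

# Every lifecycle message becomes one JSON line, matching the
# json.dumps(msg) lines already emitted by on_message.

def emit(event, stream=sys.stdout):
    """Print one structured event as a single JSON line."""
    print(json.dumps(event, default=str), file=stream, flush=True)

emit({"event": "workflow_start", "ep_url": "https://example.test", "repo": "org/repo", "cwd": "/tmp/workspace"})
emit({"event": "workflow_success", "cost_usd": 0.1234})
emit({"event": "pr_created", "pr_url": "https://github.com/org/repo/pull/1"})
emit({"event": "workflow_failed", "error": "boom"}, stream=sys.stderr)
```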
- Around line 24-25: The code currently defaults WORKING_DIR to "/workspace" and
calls os.makedirs(working_dir, exist_ok=True) which fails for non-root image
users; change the default to a user-writable path such as "/tmp/workspace" (i.e.
update os.environ.get("WORKING_DIR", "/tmp/workspace")) and/or add a fallback:
attempt to create the directory and if a PermissionError is raised, fall back to
a writable location (e.g. tempfile.mkdtemp or "/tmp/workspace") and log the
switch; refer to the working_dir variable, the os.environ.get call, and the
os.makedirs(working_dir, exist_ok=True) line when making this change.
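A sketch of the fallback logic (the `resolve_working_dir` helper name and the tempdir prefix are illustrative; `/tmp/workspace` is the default the review suggests):

```python
import os
import tempfile

# Try the configured directory first; if the (non-root) image user cannot
# create it, fall back to a writable temp location and log the switch.

def resolve_working_dir(preferred=None):
    working_dir = preferred or os.environ.get("WORKING_DIR", "/tmp/workspace")
    try:
        os.makedirs(working_dir, exist_ok=True)
    except PermissionError:
        fallback = tempfile.mkdtemp(prefix="workspace-")
        print(f"WORKING_DIR {working_dir!r} not writable, falling back to {fallback}", flush=True)
        working_dir = fallback
    return working_dir

print(resolve_working_dir())
```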
In `@token-service/github_app.go`:
- Around line 97-132: The code currently picks installations[0] and posts a nil
body to tokenURL (constructed from instID) which can select the wrong
installation or mint an overly-broad token; update the handler that calls this
logic (the function around installations, instID, tokenURL, and
accessTokenResponse) to accept either a specific installation ID or the target
repository/owner (owner/repo) as request parameters, filter the decoded
installations slice by the installation account login or repository list to find
the correct instID (or require the caller to provide instID), and when creating
the POST to tokenURL include a JSON body with the repositories or repository_ids
field to scope the token to only the required repo; ensure headers still include
Authorization/Accept and handle encoding errors and response decoding from
accessTokenResponse as before.
In `@token-service/go.mod`:
- Line 5: The project currently pins github.com/golang-jwt/jwt/v5 at v5.2.1
which is vulnerable; update the dependency reference for
"github.com/golang-jwt/jwt/v5" to v5.2.2 in the go.mod require statement, then
run the normal Go module maintenance commands (go get
github.com/golang-jwt/jwt/v5@v5.2.2 and/or go mod tidy) to update go.sum and
vendor as needed, and run tests/build to verify nothing breaks.
In `@token-service/main.go`:
- Line 48: The endpoint registration using mux.HandleFunc("GET /token",
handleToken(appID, privateKey)) exposes fresh installation tokens to
unauthenticated callers; change this to a non-idempotent verb and add
authentication/authorization and no-store caching: register as POST (e.g.,
mux.HandleFunc("POST /token", ...)), enforce an authz middleware or explicit
caller validation inside handleToken (validate incoming JWT/API key/client
identity before minting), and set response headers to Cache-Control: no-store
(and other security headers) before returning tokens; update related handlers
mentioned around lines 58-74 similarly.
- Around line 54-55: Replace the direct call to http.ListenAndServe (used after
log.Printf("Token service listening on %s", listenAddr)) with an instantiated
http.Server that sets explicit timeouts (e.g., ReadTimeout, WriteTimeout,
IdleTimeout) and assigns Addr = listenAddr and Handler = mux, then call
server.ListenAndServe() (and log.Fatal on error) so the token service is
protected against slow/long-lived connections.
---
Nitpick comments:
In `@deploy/deployment.yaml`:
- Line 96: Replace the hardcoded image:
ghcr.io/shiftweek/oape-ai-e2e-token-service:latest in the Deployment manifest
with the immutable CI-produced SHA tag or the image digest (e.g.,
ghcr.io/shiftweek/oape-ai-e2e-token-service@sha256:...) so rollouts are
reproducible; update every occurrence of the image field that currently uses
:latest (including the other instance referenced) and, if applicable, wire it to
the CI variable or kustomize/Helm value that injects the sha-tag so the manifest
always deploys the exact build validated by CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: `.coderabbit.yaml`
Review profile: CHILL
Plan: Pro
Run ID: c08708e7-a820-4c89-9d46-cbd9f6d512d1
⛔ Files ignored due to path filters (2)
- `go-server/go.sum` is excluded by `!**/*.sum`
- `token-service/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (20)
- .github/workflows/build-image.yaml
- Dockerfile
- deploy/deployment.yaml
- go-server/Dockerfile
- go-server/config.go
- go-server/go-server
- go-server/go.mod
- go-server/handlers.go
- go-server/k8s.go
- go-server/main.go
- go-server/static/homepage.html
- server/homepage.html
- server/main.py
- server/requirements.txt
- server/server.py
- token-service/Dockerfile
- token-service/github_app.go
- token-service/go.mod
- token-service/main.go
- token-service/token-service
💤 Files with no reviewable changes (3)
- server/requirements.txt
- server/homepage.html
- server/server.py
deploy/deployment.yaml
Outdated
```yaml
spec:
  containers:
  - name: oape-server
    image: ghcr.io/shiftweek/oape-ai-e2e:latest
  - name: token-service
    image: ghcr.io/shiftweek/oape-ai-e2e-token-service:latest
    ports:
    - containerPort: 8000
    - containerPort: 8081
    env:
    - name: GITHUB_APP_ID
      valueFrom:
        configMapKeyRef:
          name: oape-orchestrator-config
          key: GITHUB_APP_ID
    - name: GITHUB_APP_PRIVATE_KEY_PATH
      value: /secrets/github-app/private-key.pem
    volumeMounts:
    - name: github-app-key
      mountPath: /secrets/github-app
      readOnly: true
  volumes:
  - name: github-app-key
    secret:
      secretName: github-app-private-key
---
```
🧩 Analysis chain
🏁 Script executed:

```shell
cd /tmp && find . -name "deploy*" -o -name "deployment*" 2>/dev/null | head -20
```

Repository: shiftweek/oape-ai-e2e
Length of output: 47

🏁 Script executed:

```shell
git ls-files | grep -E "(deploy|deployment)" | head -20
```

Repository: shiftweek/oape-ai-e2e
Length of output: 113

🏁 Script executed:

```shell
wc -l deploy/deployment.yaml
```

Repository: shiftweek/oape-ai-e2e
Length of output: 91

🏁 Script executed:

```shell
cat -n deploy/deployment.yaml | head -170
```

Repository: shiftweek/oape-ai-e2e
Length of output: 5207
Harden the new service pods explicitly.
Both Deployments currently rely on platform defaults, leaving root execution, privilege escalation, and writable root filesystems available. These are HTTP services and should run with restricted security profiles.
Apply pod-level and container-level securityContext to both deployments:
🔒 Suggested hardening pattern
```diff
 spec:
   template:
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
       - name: token-service
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          capabilities:
+            drop: ["ALL"]
```

Apply the same settings to the oape-orchestrator deployment at lines 144-166.
🧰 Tools
🪛 Checkov (3.2.510)
[medium] 78-115: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 78-115: Minimize the admission of root containers
(CKV_K8S_23)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/deployment.yaml` around lines 93 - 115, Add explicit pod- and
container-level securityContext blocks for the new token-service deployment and
the oape-orchestrator deployment to harden pods: at the pod spec level set
securityContext with runAsNonRoot: true, runAsUser: 1000 (or a non-root UID),
fsGroup: 65534, and seccompProfile: { type: RuntimeDefault }; within the
token-service container (name: token-service) and the oape-orchestrator
container set securityContext with allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, privileged: false, capabilities: { drop: ["ALL"]
}, and runAsNonRoot: true; ensure volume mounts remain writable only where
needed and that the secret volume mount retains readOnly: true.
go-server/config.go
Outdated
```go
ttl := int32(3600)
if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
	}
	ttl = int32(n)
}
```
Validate TTL_AFTER_FINISHED is non-negative.
Negative values parse successfully here and only fail later when the Job is created. Reject them during config loading so the service fails fast on bad configuration.
🛡️ Suggested validation
```diff
 if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 	n, err := strconv.ParseInt(v, 10, 32)
 	if err != nil {
 		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 	}
+	if n < 0 {
+		return nil, fmt.Errorf("TTL_AFTER_FINISHED must be >= 0")
+	}
 	ttl = int32(n)
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
ttl := int32(3600)
if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
	}
	if n < 0 {
		return nil, fmt.Errorf("TTL_AFTER_FINISHED must be >= 0")
	}
	ttl = int32(n)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server/config.go` around lines 46 - 53, The config loading code that
parses TTL_AFTER_FINISHED should reject negative values immediately: after
parsing v into n in the TTL parsing block, check if n < 0 and return an error
(e.g., fmt.Errorf("invalid TTL_AFTER_FINISHED: value must be non-negative"))
instead of allowing a negative ttl to be assigned to ttl (int32); keep the
existing strconv.ParseInt call and the ttl = int32(n) assignment but add the
non-negative validation before the conversion/assignment to fail-fast on bad
configuration.
go-server/go.mod

```
github.com/spf13/pflag v1.0.5 // indirect
github.com/x448/float16 v0.8.4 // indirect
golang.org/x/net v0.26.0 // indirect
golang.org/x/oauth2 v0.21.0 // indirect
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's examine the go-server/go.mod file to confirm the current dependency state
find . -name "go.mod" -type f | head -20
```

Repository: shiftweek/oape-ai-e2e
Length of output: 106

🏁 Script executed:

```shell
# Check if go-server directory exists and examine its go.mod
if [ -f "go-server/go.mod" ]; then
  echo "=== go-server/go.mod content ==="
  cat go-server/go.mod
fi
```

Repository: shiftweek/oape-ai-e2e
Length of output: 2119
🌐 Web query:
GO-2025-3488 GHSA-6v2p-p543-phr9 golang.org/x/oauth2 vulnerability
💡 Result:
GO-2025-3488 (GHSA-6v2p-p543-phr9, CVE-2025-22868) is a vulnerability in golang.org/x/oauth2/jws causing unexpected memory consumption during parsing of malicious malformed tokens, enabling a denial-of-service attack. It affects versions before v0.27.0 of golang.org/x/oauth2. Fixed in v0.27.0. Published February 26, 2025. Credits to jub0bs. References include Go issue 71490 and CL 652155. Severity: High (CVSS 7.5, impact on availability).
Citations:
- 1: https://pkg.go.dev/vuln/GO-2025-3488
- 2: https://osv.dev/GO-2025-3488
- 3: https://osv.dev/vulnerability/GO-2025-3488
- 4: x/vulndb: potential Go vuln in golang.org/x/oauth2: GHSA-6v2p-p543-phr9 golang/vulndb#3813
- 5: https://osv.dev/vulnerability/GHSA-6v2p-p543-phr9
Upgrade golang.org/x/oauth2 to v0.27.0 or later.
The module graph includes golang.org/x/oauth2 v0.21.0 (via indirect dependency), which is vulnerable to GO-2025-3488 / GHSA-6v2p-p543-phr9. This DoS vulnerability in the jws package affects all versions before v0.27.0. Since k8s.io/client-go can use oauth2-backed kubeconfig authentication flows, upgrade to v0.27.0+ before release.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server/go.mod` at line 35, Update the vulnerable indirect dependency
golang.org/x/oauth2 from v0.21.0 to v0.27.0+ by adding/updating its entry in
go.mod (the existing "golang.org/x/oauth2" require line) or running "go get
golang.org/x/oauth2@v0.27.0" and then "go mod tidy" to regenerate go.sum; ensure
the updated version appears in go.mod/go.sum and re-run tests/build to confirm
no breakage (target symbol: golang.org/x/oauth2 in go.mod).
go-server/handlers.go
Outdated
```go
// fetchToken calls the Token Service to get an ephemeral GitHub token.
func (a *App) fetchToken() (string, error) {
	resp, err := http.Get(a.cfg.TokenServiceURL + "/token")
	if err != nil {
		return "", fmt.Errorf("calling token service: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("token service returned %d: %s", resp.StatusCode, body)
	}

	var result struct {
		Token string `json:"token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", fmt.Errorf("decoding token response: %w", err)
	}
	return result.Token, nil
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's look at the full handlers.go file to understand the context
cat -n go-server/handlers.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 9323
🌐 Web query:
Effective Go HTTP timeout best practices site:go.dev
💡 Result:
Effective Go HTTP timeout best practices focus on configuring timeouts for both clients and servers to prevent hanging connections, resource exhaustion, and slow responses. Always create custom http.Client and http.Server instances instead of using the defaults, as DefaultClient has no timeout (the zero value means infinite).

Client timeouts (via http.Client.Timeout): sets an overall limit including connect, redirects, and body read. Recommended: 5-30 seconds depending on service SLA. The timer continues after Do returns, canceling body reads. Use context.WithTimeout for per-request control.

Transport timeouts (via http.Transport):
- DialContext.Timeout (default 30s): connection establishment.
- TLSHandshakeTimeout (default 10s): TLS negotiation.
- ResponseHeaderTimeout: time after the request is written to receive headers.
- ExpectContinueTimeout (default 1s): for Expect: 100-continue.
- IdleConnTimeout (default 90s): close idle connections.
- Client.Timeout covers the full request if set.

Example client:

```go
client := &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		DialContext:           (&net.Dialer{Timeout: 30 * time.Second}).DialContext,
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 30 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		IdleConnTimeout:       90 * time.Second,
	},
}
```

Server timeouts (via http.Server):
- ReadTimeout (e.g. 10s): max time to read the request (headers + body).
- WriteTimeout (e.g. 10s): max time to write the response.
- ReadHeaderTimeout: time to read headers only.

Example server:

```go
server := &http.Server{
	Addr:              ":8080",
	ReadTimeout:       10 * time.Second,
	ReadHeaderTimeout: 5 * time.Second,
	WriteTimeout:      10 * time.Second,
}
```

Additional practices:
- Reuse Clients/Transports for connection pooling; do not create them per-request.
- Use contexts (context.WithTimeout) for fine-grained cancellation, propagated to requests.
- For handlers, wrap with http.TimeoutHandler.
- Tune based on p99 latencies plus a buffer (e.g., ReadTimeout > p95 request time).
- Avoid zero timeouts (infinite); always set explicit values.
- Monitor for goroutine leaks from slow upstreams (issues like #31657, fixed in recent Go).

These prevent DoS from slow clients/servers and ensure graceful degradation.
Citations:
- 1: https://pkg.go.dev/net/http@go1.25.6
- 2: https://go.dev/src/net/http/transport.go
- 3: https://pkg.go.dev/net/http@go1.22.4
- 4: https://pkg.go.dev/net/http@go1.24.2
- 5: https://go.dev/blog/context/
Add context and timeout to the token-service call.
http.Get() uses the default client with no deadline. If the token service hangs or the network is slow, the /submit endpoint will block indefinitely. Pass r.Context() to fetchToken() and use http.NewRequestWithContext() or a custom client with an explicit timeout, as recommended by Effective Go.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server/handlers.go` around lines 67 - 86, The fetchToken method currently
calls http.Get without a context or timeout; change fetchToken to accept a
context (e.g., func (a *App) fetchToken(ctx context.Context) (string, error))
and create the request with http.NewRequestWithContext(ctx, "GET",
a.cfg.TokenServiceURL+"/token", nil) or use a custom http.Client with a timeout
(e.g., &http.Client{Timeout: X}) to perform the request, propagate context from
the caller (r.Context()) into the call, and keep the existing response handling
and error wrapping in fetchToken.
| status, err := a.k8s.GetJobStatus(r.Context(), jobID) | ||
| if err != nil { | ||
| writeError(w, http.StatusNotFound, fmt.Sprintf("job not found: %s", jobID)) | ||
| return |
🧩 Analysis chain
🏁 Script executed:

```shell
cd go-server && find . -name "handlers.go" | head -5
```

Repository: shiftweek/oape-ai-e2e
Length of output: 78

🏁 Script executed:

```shell
wc -l go-server/handlers.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 90

🏁 Script executed:

```shell
cat -n go-server/handlers.go | sed -n '150,165p'
```

Repository: shiftweek/oape-ai-e2e
Length of output: 551

🏁 Script executed:

```shell
cat -n go-server/handlers.go | sed -n '200,220p'
```

Repository: shiftweek/oape-ai-e2e
Length of output: 556

🏁 Script executed:

```shell
cat -n go-server/handlers.go | sed -n '225,240p'
```

Repository: shiftweek/oape-ai-e2e
Length of output: 598

🏁 Script executed:

```shell
# Search for GetJobStatus and GetJobPod implementations
rg "func.*GetJobStatus|func.*GetJobPod" go-server/
```

Repository: shiftweek/oape-ai-e2e
Length of output: 271

🏁 Script executed:

```shell
# Check for error type checking patterns (apierrors.IsNotFound, etc.)
rg "IsNotFound|apierrors|StatusReason" go-server/handlers.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 47

🏁 Script executed:

```shell
cat -n go-server/k8s.go | grep -A 20 "func (c \*K8sClient) GetJobStatus"
```

Repository: shiftweek/oape-ai-e2e
Length of output: 959

🏁 Script executed:

```shell
cat -n go-server/k8s.go | grep -A 20 "func (c \*K8sClient) GetJobPod"
```

Repository: shiftweek/oape-ai-e2e
Length of output: 974

🏁 Script executed:

```shell
# Check imports in handlers.go and k8s.go for Kubernetes error handling
rg "^import|apierrors" go-server/handlers.go go-server/k8s.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 121

🏁 Script executed:

```shell
# Check the full k8s.go file to understand structure
wc -l go-server/k8s.go && head -50 go-server/k8s.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 1410
Distinguish Kubernetes error types instead of collapsing all errors into "not found" or "waiting".
The code treats all GetJobStatus failures as 404 responses and all GetJobPod/Pod Get failures as temporary states. This masks real failures like RBAC denials, API server outages, and network errors. Use the error types from the Kubernetes client library (e.g., apierrors.IsNotFound()) to distinguish genuine "not found" cases from other errors, and surface non-transient failures immediately instead of retrying indefinitely.
Applies to lines 156–159, 203–217, and 228–235.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server/handlers.go` around lines 156 - 159, The handlers currently treat
all errors from a.k8s.GetJobStatus, a.k8s.GetJobPod and the pod Get call as "not
found" or transient; update these to inspect Kubernetes error types (use
k8s.io/apimachinery/pkg/api/errors functions like apierrors.IsNotFound,
apierrors.IsForbidden, apierrors.IsUnauthorized, etc.) and branch accordingly:
return a 404 via writeError only when apierrors.IsNotFound(err) is true, return
a 4xx (e.g., 403/401) when IsForbidden/IsUnauthorized match, and return a 5xx
(and stop retrying/waiting) for API server/network errors; ensure GetJobStatus,
GetJobPod and the pod Get error handling uses these checks and surfaces
non-transient errors immediately instead of mapping them to "waiting".
server/main.py
Outdated
```python
print(f"Starting workflow: ep_url={ep_url} repo={repo} cwd={working_dir}", flush=True)

def on_message(msg):
    print(json.dumps(msg, default=str), flush=True)

result = await run_workflow(ep_url, repo, working_dir, on_message=on_message)

if result.success:
    print(f"WORKFLOW_SUCCESS cost=${result.cost_usd:.4f}", flush=True)
    for pr in result.prs:
        print(f"PR_CREATED: {pr.pr_url}", flush=True)
    sys.exit(0)
else:
    print(f"WORKFLOW_FAILED: {result.error}", file=sys.stderr, flush=True)
```
Keep the worker log stream uniformly structured.
Lines 27, 35, 37, and 40 emit plain strings even though this module’s contract says the orchestrator consumes JSON lines. That makes the stream format non-uniform and forces downstream consumers to special-case lifecycle events.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/main.py` around lines 27 - 40, Replace the plain string prints with
JSON-line events so the orchestrator always receives structured messages: emit a
JSON object for the start event (e.g.,
{"event":"workflow_start","ep_url":ep_url,"repo":repo,"cwd":working_dir}), for
success emit {"event":"workflow_success","cost_usd":result.cost_usd} and for
each PR emit {"event":"pr_created","pr_url":pr.pr_url}, and for failure emit
{"event":"workflow_failed","error":str(result.error)}; use json.dumps(...,
default=str) and print(..., flush=True) for stdout events and print(...,
file=sys.stderr, flush=True) for failure so on_message and run_workflow remain
unchanged but all lifecycle prints are uniform JSON lines.
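The JSON-lines contract the prompt proposes can be sketched as follows (illustrative Go, though the worker itself is Python; the event names mirror the ones suggested in the prompt):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// emit writes exactly one JSON object per line to stdout, so lifecycle
// events and workflow messages share a single parseable stream format.
// It returns the encoded line to make the helper easy to test.
func emit(event string, fields map[string]any) string {
	obj := map[string]any{"event": event}
	for k, v := range fields {
		obj[k] = v
	}
	b, _ := json.Marshal(obj) // map keys are emitted in sorted order
	fmt.Fprintln(os.Stdout, string(b))
	return string(b)
}

func main() {
	emit("workflow_start", map[string]any{"repo": "org/repo"})
	emit("pr_created", map[string]any{"pr_url": "https://example.invalid/pr/1"})
	emit("workflow_success", map[string]any{"cost_usd": 0.1234})
}
```

With every line shaped this way, a consumer can unconditionally `json.Unmarshal` each line and switch on the `event` field instead of special-casing plain-text lifecycle messages.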
token-service/github_app.go
Outdated

```go
var installations []installation
if err := json.NewDecoder(resp.Body).Decode(&installations); err != nil {
	return "", "", fmt.Errorf("decoding installations: %w", err)
}
if len(installations) == 0 {
	return "", "", fmt.Errorf("no installations found for app %d", appID)
}

instID := installations[0].ID

// Create access token.
tokenURL := fmt.Sprintf("https://api.github.com/app/installations/%d/access_tokens", instID)
req, err = http.NewRequest("POST", tokenURL, nil)
if err != nil {
	return "", "", err
}
req.Header.Set("Authorization", "Bearer "+jwtToken)
req.Header.Set("Accept", "application/vnd.github+json")

resp, err = client.Do(req)
if err != nil {
	return "", "", fmt.Errorf("creating access token: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusCreated {
	body, _ := io.ReadAll(resp.Body)
	return "", "", fmt.Errorf("create access token returned %d: %s", resp.StatusCode, body)
}

var tokenResp accessTokenResponse
if err := json.NewDecoder(resp.Body).Decode(&tokenResp); err != nil {
	return "", "", fmt.Errorf("decoding access token: %w", err)
}

return tokenResp.Token, tokenResp.ExpiresAt, nil
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n token-service/github_app.go | head -150
```

Repository: shiftweek/oape-ai-e2e
Length of output: 4565

🏁 Script executed:

```shell
git ls-files token-service/ | head -20
```

Repository: shiftweek/oape-ai-e2e
Length of output: 209

🏁 Script executed:

```shell
rg "getInstallationToken" --type go -B 2 -A 2
```

Repository: shiftweek/oape-ai-e2e
Length of output: 822

🏁 Script executed:

```shell
cat -n token-service/main.go
```

Repository: shiftweek/oape-ai-e2e
Length of output: 2422
🌐 Web query:
GitHub App access token repository scoping API documentation
💡 Result:
GitHub Apps support repository scoping for their access tokens via the REST API endpoint `POST /app/installations/{installation_id}/access_tokens`. This generates an installation access token (IAT), which by default has access to all repositories the app installation can access. To scope it to specific repositories, include `repositories` (array of repository names) or `repository_ids` (array of repository IDs) in the request body. Up to 500 repositories can be specified. The token cannot access repositories outside the installation's granted access. You must authenticate the request with a JWT generated for the GitHub App. The IAT expires in 1 hour. Example request:

```shell
curl -X POST \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/app/installations/{installation_id}/access_tokens \
  -d '{ "repositories": ["repoA", "repoB"], "permissions": {"contents": "read"} }'
```

Note the `repositories` field takes bare repository names, not `owner/repo`. User access tokens (generated via the OAuth flow for GitHub Apps) inherit scoping from user and app permissions but do not use traditional scopes; they use fine-grained permissions and are limited to repositories both the user and app can access. They can also be further restricted with the `repository_id` parameter in the OAuth request. Official documentation is in GitHub Docs under "Generating an installation access token for a GitHub App" and the REST API endpoints for Apps.
Citations:
- 1: https://docs.github.com/rest/apps/apps
- 2: https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/generating-an-installation-access-token-for-a-github-app
- 3: https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/authenticating-as-a-github-app-installation
- 4: https://github.apidog.io/api-3488868
Resolve the installation explicitly and scope the token.
`installations[0]` makes token selection depend on GitHub's installation ordering, and the nil body in the access-token request grants the worker access to all repositories the app installation can reach. GitHub's API returns installations in no guaranteed order, and once the app is installed in multiple places, this approach risks selecting the wrong installation or minting an overly broad token. The GitHub API supports repository scoping via the request body (the `repositories` or `repository_ids` fields); this should be used to restrict tokens to the specific repository needed for each workflow job. Update the handler to accept the target repository/owner as request parameters, filter installations appropriately (or require the installation ID), and include repository scoping in the access-token request.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@token-service/github_app.go` around lines 97 - 132, The code currently picks
installations[0] and posts a nil body to tokenURL (constructed from instID)
which can select the wrong installation or mint an overly-broad token; update
the handler that calls this logic (the function around installations, instID,
tokenURL, and accessTokenResponse) to accept either a specific installation ID
or the target repository/owner (owner/repo) as request parameters, filter the
decoded installations slice by the installation account login or repository list
to find the correct instID (or require the caller to provide instID), and when
creating the POST to tokenURL include a JSON body with the repositories or
repository_ids field to scope the token to only the required repo; ensure
headers still include Authorization/Accept and handle encoding errors and
response decoding from accessTokenResponse as before.
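The scoped request can be sketched as below; the `repositories` body field is GitHub's documented scoping mechanism, while `instID` and `repo` are assumed to be resolved by the caller instead of taken from `installations[0]`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// scopedTokenRequest builds the POST for a repository-scoped installation
// token. Sending a nil body would mint a token for every repository the
// installation can reach; scoping it narrows the blast radius per job.
func scopedTokenRequest(jwtToken string, instID int64, repo string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"repositories": []string{repo}, // bare repo name, not owner/repo
	})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("https://api.github.com/app/installations/%d/access_tokens", instID)
	req, err := http.NewRequest("POST", url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+jwtToken)
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := scopedTokenRequest("jwt", 42, "my-repo")
	fmt.Println(req.Method, req.URL)
}
```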
token-service/go.mod
Outdated

```
go 1.23

require github.com/golang-jwt/jwt/v5 v5.2.1
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, check if the go.mod file exists and examine its content
find . -name "go.mod" -path "*/token-service/*" -type f
```

Repository: shiftweek/oape-ai-e2e
Length of output: 87
🌐 Web query:
GO-2025-3553 GHSA-mh63-6h87-95cp github.com/golang-jwt/jwt vulnerability
💡 Result:
GO-2025-3553 (GHSA-mh63-6h87-95cp, CVE-2025-30204) is a high-severity vulnerability in github.com/golang-jwt/jwt causing excessive memory allocation during header parsing in Parser.ParseUnverified. A malicious Authorization header with many periods (e.g., "Bearer ........") triggers O(n) allocations via strings.Split on untrusted input (CWE-405).
Affected versions:
- github.com/golang-jwt/jwt/v4: before v4.5.2
- github.com/golang-jwt/jwt/v5: from v5.0.0-rc.1 before v5.2.2
- github.com/golang-jwt/jwt (v3 branch): >=3.2.0 <=3.2.2, no patch available
Patched in:
- v4.5.2
- v5.2.2
CVSS v3.1: 7.5 High (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H). CVSS v4.0: 8.7 High. Published March 2025. Fix commits: golang-jwt/jwt@0951d18 (v4), similar for v5. Mitigation: upgrade to patched versions; avoid ParseUnverified on untrusted input or limit header size.
Citations:
- 1: https://pkg.go.dev/vuln/GO-2025-3553
- 2: https://osv.dev/vulnerability/GHSA-mh63-6h87-95cp
- 3: GHSA-mh63-6h87-95cp
- 4: GHSA-mh63-6h87-95cp
🏁 Script executed:

```shell
# Check the exact content of token-service/go.mod to confirm line 5
cat -n ./token-service/go.mod | head -10
```

Repository: shiftweek/oape-ai-e2e
Length of output: 207
Bump github.com/golang-jwt/jwt/v5 to v5.2.2 before release.
v5.2.1 is vulnerable to GO-2025-3553 (GHSA-mh63-6h87-95cp, CVE-2025-30204), a high-severity vulnerability (CVSS 7.5) causing excessive memory allocation during JWT header parsing on untrusted input. Patched in v5.2.2. This is a direct dependency and must not ship with a known-vulnerable version.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@token-service/go.mod` at line 5, The project currently pins
github.com/golang-jwt/jwt/v5 at v5.2.1 which is vulnerable; update the
dependency reference for "github.com/golang-jwt/jwt/v5" to v5.2.2 in the go.mod
require statement, then run the normal Go module maintenance commands (go get
github.com/golang-jwt/jwt/v5@v5.2.2 and/or go mod tidy) to update go.sum and
vendor as needed, and run tests/build to verify nothing breaks.
token-service/main.go
Outdated

```go
}

mux := http.NewServeMux()
mux.HandleFunc("GET /token", handleToken(appID, privateKey))
```
Do not mint GitHub credentials from a cacheable unauthenticated GET.
`GET /token` currently returns a fresh installation token to any caller that can reach this service, and the response is sent without `Cache-Control: no-store`. For a credential-minting endpoint, please add caller authorization or strict network isolation at minimum; switching this to POST would also be safer.
Also applies to: 58-74
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@token-service/main.go` at line 48, The endpoint registration using
mux.HandleFunc("GET /token", handleToken(appID, privateKey)) exposes fresh
installation tokens to unauthenticated callers; change this to a non-idempotent
verb and add authentication/authorization and no-store caching: register as POST
(e.g., mux.HandleFunc("POST /token", ...)), enforce an authz middleware or
explicit caller validation inside handleToken (validate incoming JWT/API
key/client identity before minting), and set response headers to Cache-Control:
no-store (and other security headers) before returning tokens; update related
handlers mentioned around lines 58-74 similarly.
token-service/main.go
Outdated

```go
log.Printf("Token service listening on %s", listenAddr)
log.Fatal(http.ListenAndServe(listenAddr, mux))
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "main.go" -path "*/token-service/*" | head -5
```

Repository: shiftweek/oape-ai-e2e
Length of output: 88

🏁 Script executed:

```shell
cat -n token-service/main.go | head -70
```

Repository: shiftweek/oape-ai-e2e
Length of output: 2331

🏁 Script executed:

```shell
rg -i "github" token-service/main.go -A 2 -B 2
```

Repository: shiftweek/oape-ai-e2e
Length of output: 611

🏁 Script executed:

```shell
rg -i "private.?key\|privatekey" token-service/main.go -A 2 -B 2
```

Repository: shiftweek/oape-ai-e2e
Length of output: 47
🌐 Web query:
Effective Go server timeouts http.Server security best practices
💡 Result:
Effective Go http.Server timeouts and security best practices focus on preventing resource exhaustion from slow or malicious clients (e.g., Slowloris attacks). Always use an explicit http.Server instance instead of http.ListenAndServe, which has zero timeouts by default. Key http.Server timeout fields:
- ReadTimeout: max duration for reading the entire request (headers + body). Default 0 (no limit). Recommended: 5–10s.
- ReadHeaderTimeout: max time to read request headers. Default 0. Recommended: 2–5s. Guards against slow-header attacks.
- WriteTimeout: max duration before timing out response writes. Reset after headers. Default 0. Recommended: 10–30s. Includes TLS handshake time for HTTPS.
- IdleTimeout: max time to wait for the next request on keep-alive connections. Default 0. Recommended: 60–120s.

Example configuration:

```go
srv := &http.Server{
	Addr:              ":8080",
	Handler:           router,
	ReadTimeout:       10 * time.Second,
	ReadHeaderTimeout: 5 * time.Second,
	WriteTimeout:      30 * time.Second,
	IdleTimeout:       120 * time.Second,
	MaxHeaderBytes:    1 << 20, // 1MB default
}
log.Fatal(srv.ListenAndServe())
```

Additional security practices:
- Wrap handlers with http.TimeoutHandler for per-handler timeouts (returns 503 when exceeded).
- Use the request context (r.Context()) in handlers to propagate cancellations.
- Set MaxHeaderBytes to limit header size.
- Enable HTTPS with a proper TLS config.
- For production, implement graceful shutdown with srv.Shutdown(ctx).

These prevent goroutine and file-descriptor leaks. Tune based on p99 request latency plus a buffer (e.g., handler timeout T, then Read/Write ≈ 1.5T). Test with tools like slowloris.
Citations:
- 1: https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
- 2: https://adam-p.ca/blog/2022/01/golang-http-server-timeouts/
- 3: https://pkg.go.dev/net/http@go1.19
- 4: https://docs.securesauce.dev/rules/GO007
- 5: https://pkg.go.dev/net/http@go1.22.0
- 6: https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts
- 7: https://ieftimov.com/posts/make-resilient-golang-net-http-servers-using-timeouts-deadlines-context-cancellation/
Add server timeouts to the token service.
Line 55 uses http.ListenAndServe, which provides zero read/write/idle timeouts. This exposes the service to resource exhaustion attacks (e.g., Slowloris) on a system handling the GitHub App private key. Replace with an explicit http.Server struct with configured timeouts:
Suggested fix
```diff
 import (
 	"crypto/rsa"
 	"encoding/json"
 	"log"
 	"net/http"
 	"os"
 	"strconv"
+	"time"
 )
@@
-	log.Printf("Token service listening on %s", listenAddr)
-	log.Fatal(http.ListenAndServe(listenAddr, mux))
+	srv := &http.Server{
+		Addr:              listenAddr,
+		Handler:           mux,
+		ReadHeaderTimeout: 5 * time.Second,
+		ReadTimeout:       15 * time.Second,
+		WriteTimeout:      30 * time.Second,
+		IdleTimeout:       60 * time.Second,
+	}
+
+	log.Printf("Token service listening on %s", listenAddr)
+	log.Fatal(srv.ListenAndServe())
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
import (
	"crypto/rsa"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"strconv"
	"time"
)

// ... (other code)

srv := &http.Server{
	Addr:              listenAddr,
	Handler:           mux,
	ReadHeaderTimeout: 5 * time.Second,
	ReadTimeout:       15 * time.Second,
	WriteTimeout:      30 * time.Second,
	IdleTimeout:       60 * time.Second,
}

log.Printf("Token service listening on %s", listenAddr)
log.Fatal(srv.ListenAndServe())
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@token-service/main.go` around lines 54 - 55, Replace the direct call to
http.ListenAndServe (used after log.Printf("Token service listening on %s",
listenAddr)) with an instantiated http.Server that sets explicit timeouts (e.g.,
ReadTimeout, WriteTimeout, IdleTimeout) and assigns Addr = listenAddr and
Handler = mux, then call server.ListenAndServe() (and log.Fatal on error) so the
token service is protected against slow/long-lived connections.
d341719 to 41bdbff
Actionable comments posted: 8
♻️ Duplicate comments (3)
go-server-v2/handlers.go (2)
238-258: ⚠️ Potential issue | 🟠 Major

Check `scanner.Err()` before emitting `complete`.
`Scan()` returns `false` for both EOF and read failures. Without an `Err()` check, a broken log stream falls through to the final status lookup and can close the EventSource as if the job completed normally.

Possible fix

```diff
 	for scanner.Scan() {
 		select {
 		case <-ctx.Done():
 			return
 		default:
 		}
 		line := scanner.Text()
 		fmt.Fprintf(w, "event: log\ndata: %s\n\n", line)
 		flusher.Flush()
 	}
+
+	if err := scanner.Err(); err != nil {
+		log.Printf("ERROR: scanning logs for pod %s: %v", podName, err)
+		fmt.Fprintf(w, "event: error\ndata: %s\n\n",
+			`{"status":"unknown","message":"log stream interrupted"}`)
+		flusher.Flush()
+		return
+	}

 	// Log stream ended — get final job status.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-server-v2/handlers.go` around lines 238 - 258, After the scanner loop, check scanner.Err() and if non-nil treat the stream as failed: write an SSE error event to w (e.g., "event: error" or "event: complete" with status unknown and include scanner.Err().Error() in the message), flush via flusher.Flush(), and return immediately so you do not call a.k8s.GetJobStatus(ctx, jobID); this ensures broken/read-failure streams are not mistaken for normal completion.
126-129: ⚠️ Potential issue | 🟠 Major

Preserve real Kubernetes failures instead of masking them as missing resources.
These branches still collapse every `GetJobStatus`, `GetJobPod`, and pod `Get` failure into either `404` or `"waiting_for_pod"`. RBAC denials, expired credentials, and apiserver outages will therefore look identical to ordinary lifecycle states, and the stream can spin until timeout instead of surfacing the real failure. Please branch on Kubernetes error classes here (or have `K8sClient` return typed/sentinel errors for the transient "pod not created yet" case) so only genuine not-found/pending states are retried.

Also applies to: 173-177, 198-204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-server-v2/handlers.go` around lines 126 - 129, The current handlers collapse all a.k8s.GetJobStatus/GetJobPod/pod Get errors into 404/"waiting_for_pod"; change them to inspect Kubernetes error classes (e.g., use k8s.io/apimachinery/pkg/api/errors helpers or have K8sClient return sentinel typed errors) so that only apierrors.IsNotFound (or your explicit "pod not created yet" sentinel) is treated as not-found/pending and retried; for RBAC/credential/apiserver errors (IsUnauthorized, IsForbidden, IsInvalid, IsServerTimeout, etc.) call writeError with an appropriate HTTP status and the real error message instead of masking it. Locate the logic around a.k8s.GetJobStatus, a.k8s.GetJobPod, and the pod Get branches and replace the unconditional writeError/return with branching that surfaces non-NotFound/critical k8s errors while still allowing NotFound/pending to be handled as before.

go-server-v2/k8s.go (1)
78-147: ⚠️ Potential issue | 🟠 Major

Do not hand worker pods a service-account token they never use.
Nothing in this PodSpec or `WorkflowParams` relies on in-cluster API access, but the Job will still get the namespace service-account token by default. That unnecessarily exposes cluster credentials to the agent container alongside the other mounted secrets.

Possible fix

```diff
 func (c *K8sClient) CreateWorkflowJob(ctx context.Context, jobID string, params WorkflowParams) error {
 	jobName := "oape-workflow-" + jobID
 	backoffLimit := int32(0)
 	ttl := params.TTLAfterFinished
+	automountSAToken := false

 	job := &batchv1.Job{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: jobName,
@@
 			},
 			Spec: corev1.PodSpec{
+				AutomountServiceAccountToken: &automountSAToken,
 				RestartPolicy: corev1.RestartPolicyNever,
 				Containers: []corev1.Container{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-server-v2/k8s.go` around lines 78 - 147, PodSpec is currently allowing the default automounting of the namespace service-account token; update the corev1.PodSpec for the Job pod (the PodSpec that contains the "worker" container) to set AutomountServiceAccountToken: pointer to false (AutomountServiceAccountToken: ptr.Bool(false) or &[]bool{false}[0] per your helpers) so the service-account token is not mounted into the container; modify the PodSpec in the same block that defines RestartPolicy, Containers and Volumes to include this field and ensure no code relies on in-cluster credentials.
🧹 Nitpick comments (4)
images-v2/job.Dockerfile (1)
48-50: Use exec form for `CMD` and verify `EXPOSE` is needed.
The shell form `CMD python3.11 main.py` won't propagate signals (SIGTERM) to the Python process, which can cause graceful shutdown issues. Also, `EXPOSE 8000` seems unnecessary for a batch job container.

🔧 Proposed fix

```diff
-EXPOSE 8000
-
-CMD python3.11 main.py
+CMD ["python3.11", "main.py"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images-v2/job.Dockerfile` around lines 48 - 50, Replace the shell-form CMD with the exec JSON array form so signals propagate to the Python process (change CMD python3.11 main.py to CMD ["python3.11","main.py"] in the Dockerfile), and review whether EXPOSE 8000 is necessary for this batch job container — if the process does not listen on port 8000 remove the EXPOSE line; if it does, keep it but ensure it’s documented why the port is exposed.

images-v2/go-server.Dockerfile (1)
10-13: Add a non-root user in the runtime stage.
The runtime image runs as root by default. For better security posture, run as a non-privileged user.

🔒 Proposed fix

```diff
 FROM registry.access.redhat.com/ubi9-minimal:latest
+RUN microdnf install -y shadow-utils && \
+    useradd -r -u 1001 -g 0 appuser && \
+    microdnf clean all
 COPY --from=builder /tmp/app /usr/local/bin/app-server
 EXPOSE 8080
+USER 1001
 ENTRYPOINT ["/usr/local/bin/app-server"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images-v2/go-server.Dockerfile` around lines 10 - 13, The runtime stage of the Dockerfile runs the container as root; create a non-privileged user and switch to it in the runtime stage: add a user/group (e.g., appuser), chown the copied binary (/usr/local/bin/app-server) and any needed dirs to that user, and set USER to that non-root account before ENTRYPOINT so the container runs as the non-root user instead of root (reference the COPY --from=builder /tmp/app /usr/local/bin/app-server, EXPOSE 8080, and ENTRYPOINT ["/usr/local/bin/app-server"] lines).

server/gh-credential-helper-v2.sh (1)
6-11: Consider validating that the token is non-empty.
If the token file is missing or empty, the script silently outputs an empty password, leading to confusing authentication failures. Adding validation would improve debuggability.

🛡️ Suggested improvement

```diff
 TOKEN=$(cat /etc/github-app-token/token 2>/dev/null)
+if [ -z "$TOKEN" ]; then
+  echo "error: GitHub token not found or empty" >&2
+  exit 1
+fi
 echo "protocol=https"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/gh-credential-helper-v2.sh` around lines 6 - 11, The script currently reads the GitHub token into TOKEN and echoes it as the password without validating it; update the logic around the TOKEN variable to check that TOKEN is non-empty (after reading from /etc/github-app-token/token) and if empty print a clear error message to stderr (e.g., "Missing GitHub app token in /etc/github-app-token/token") and exit non-zero instead of emitting an empty password; keep the existing echo lines (protocol, host, username, password) but only print the password when TOKEN is present so callers don’t receive an empty credential.

deploy-v2/gh-token-minter-deployment.yaml (1)
59-74: Add container security context to harden the deployment.
The container lacks security context settings, which static analysis flagged. Add settings to prevent privilege escalation and enforce read-only root filesystem where possible.

🔒 Proposed security hardening

```diff
       containers:
         - name: ghpat
           image: github-token-minter:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            capabilities:
+              drop:
+                - ALL
           env:
```

Note: `readOnlyRootFilesystem: true` may require additional `emptyDir` mounts for any directories the script needs to write to.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 59 - 74, The container "ghpat" in the Deployment spec currently has no securityContext; update the "ghpat" container spec for image "github-token-minter:latest" to add a securityContext that disables privilege escalation (allowPrivilegeEscalation: false), drops capabilities (capabilities.drop: ["ALL"]), runs as a non-root user (runAsNonRoot: true and/or runAsUser: a non-root UID), enforces a read-only root filesystem (readOnlyRootFilesystem: true) and optionally sets a seccompProfile (type: RuntimeDefault); ensure any writable paths required by the pod are mounted as writable volumes (e.g., emptyDir) so readOnlyRootFilesystem does not break startup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 80-82: The volume is referencing secretName:
github-app-private-key but the Secret resource in this manifest is named
gh-app-secret (metadata.name), causing pod startup failure; fix by making these
names consistent—either rename the Secret's metadata.name to
github-app-private-key or update the volume's secretName to gh-app-secret (the
volume entry with name: private-key and its secret.secretName field).
In `@deploy-v2/go-server-deployment.yaml`:
- Around line 82-85: The Deployment mounts the oape-configs ConfigMap but that
ConfigMap is not being generated, causing LoadConfig() in go-server-v2/config.go
to fail when reading /config/team-repos.csv; fix by uncommenting and enabling
the configMapGenerator block for oape-configs in deploy-v2/kustomization.yaml so
the ConfigMap (oape-configs) is created and contains team-repos.csv prior to
deployment.
- Around line 68-81: Add a restrictive securityContext for the oape-go-server
container and a pod-level security context: set podSecurityContext with
runAsNonRoot:true and an fsGroup (e.g. 1000) to ensure ConfigMap files are
readable, and in the container securityContext for the container named
"oape-go-server" set runAsUser to a non-root UID (e.g. 1000),
readOnlyRootFilesystem:true, allowPrivilegeEscalation:false, drop all
capabilities (e.g. drop: ["ALL"]), and set seccompProfile to runtime/default;
apply these changes in the Deployment spec near the existing container
definition so the server cannot run as root, escalate privileges, or write to
the image filesystem.
In `@gh-token-minter/mint-token.sh`:
- Around line 29-31: The code assumes at least one installation when assigning
inst_id from resp.json()[0]["id"] which will raise IndexError if none exist;
after the requests.get(...) call and resp.raise_for_status(), parse the body
into a variable (e.g., installs = resp.json()), verify it's a list and that
len(installs) > 0 (or that installs[0] exists), and if empty raise a clear error
or exit with a descriptive message like "No GitHub App installations found"
instead of letting an IndexError surface; update the inst_id assignment to read
from the validated installs[0]["id"].
In `@go-server-v2/config.go`:
- Around line 46-53: The parsed TTL_AFTER_FINISHED value can be negative
(strconv.ParseInt allows negatives) which Kubernetes rejects; in the config
initialization where ttl is set and TTL_AFTER_FINISHED is parsed (the ttl
variable and the TTL_AFTER_FINISHED parse block), add a non-negative validation
after parsing: if n < 0 return an error (e.g., fmt.Errorf("invalid
TTL_AFTER_FINISHED: must be non-negative")) so startup fails fast with a clear
message instead of accepting negative values.
In `@go-server-v2/k8s.go`:
- Around line 173-186: The code returns "pending" if no terminal Condition or
active pods are found, but it must also check job.Status.Succeeded and
job.Status.Failed to avoid the race where counters are updated before the
Failed/Succeeded Condition is set; update the logic in the function that
inspects the job (the loop over job.Status.Conditions and subsequent checks
using job.Status.Active) to first check if job.Status.Succeeded > 0 and return
&JobStatus{Status: "succeeded"}, nil, and then check if job.Status.Failed > 0
and return &JobStatus{Status: "failed", Message: ""}, nil (or include
cond.Message if available), before falling back to returning &JobStatus{Status:
"pending"}, nil so the JobStatus struct and the job variable are used to detect
terminal state via counters as well as conditions.
In `@go-server-v2/static/homepage.html`:
- Around line 147-148: The response body is consumed twice by calling res.json()
in the error check and again when extracting job_id; instead, call await
res.json() once, store the parsed JSON in a variable (e.g., const data), check
res.ok and throw using data.detail or res.statusText if not ok, then destructure
job_id from that same data; update the code paths around the res variable where
this occurs to use that single parsed object.
In `@images-v2/go-server.Dockerfile`:
- Line 4: The COPY instruction with multiple sources uses a destination that
must be a directory ending with a slash; update the COPY in the Dockerfile (the
line containing COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .) to use a
directory destination with a trailing slash (e.g., change the final token from .
to ./ or to a specific directory like ./go-server-v2/) so the Docker build
treats it as a directory and succeeds.
---
Duplicate comments:
In `@go-server-v2/handlers.go`:
- Around line 238-258: After the scanner loop, check scanner.Err() and if
non-nil treat the stream as failed: write an SSE error event to w (e.g., "event:
error" or "event: complete" with status unknown and include
scanner.Err().Error() in the message), flush via flusher.Flush(), and return
immediately so you do not call a.k8s.GetJobStatus(ctx, jobID); this ensures
broken/read-failure streams are not mistaken for normal completion.
- Around line 126-129: The current handlers collapse all
a.k8s.GetJobStatus/GetJobPod/pod Get errors into 404/"waiting_for_pod"; change
them to inspect Kubernetes error classes (e.g., use
k8s.io/apimachinery/pkg/api/errors helpers or have K8sClient return sentinel
typed errors) so that only apierrors.IsNotFound (or your explicit "pod not
created yet" sentinel) is treated as not-found/pending and retried; for
RBAC/credential/apiserver errors (IsUnauthorized, IsForbidden, IsInvalid,
IsServerTimeout, etc.) call writeError with an appropriate HTTP status and the
real error message instead of masking it. Locate the logic around
a.k8s.GetJobStatus, a.k8s.GetJobPod, and the pod Get branches and replace the
unconditional writeError/return with branching that surfaces
non-NotFound/critical k8s errors while still allowing NotFound/pending to be
handled as before.
In `@go-server-v2/k8s.go`:
- Around line 78-147: PodSpec is currently allowing the default automounting of
the namespace service-account token; update the corev1.PodSpec for the Job pod
(the PodSpec that contains the "worker" container) to set
AutomountServiceAccountToken: pointer to false (AutomountServiceAccountToken:
ptr.Bool(false) or &[]bool{false}[0] per your helpers) so the service-account
token is not mounted into the container; modify the PodSpec in the same block
that defines RestartPolicy, Containers and Volumes to include this field and
ensure no code relies on in-cluster credentials.
---
Nitpick comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 59-74: The container "ghpat" in the Deployment spec currently has
no securityContext; update the "ghpat" container spec for image
"github-token-minter:latest" to add a securityContext that disables privilege
escalation (allowPrivilegeEscalation: false), drops capabilities
(capabilities.drop: ["ALL"]), runs as a non-root user (runAsNonRoot: true and/or
runAsUser: a non-root UID), enforces a read-only root filesystem
(readOnlyRootFilesystem: true) and optionally sets a seccompProfile (type:
RuntimeDefault); ensure any writable paths required by the pod are mounted as
writable volumes (e.g., emptyDir) so readOnlyRootFilesystem does not break
startup.
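The hardened container spec described above would look roughly like this (a sketch — the `emptyDir` mount for `/tmp` is an assumption about what the pod needs writable once `readOnlyRootFilesystem` is on):

```yaml
containers:
  - name: ghpat
    image: github-token-minter:latest
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
    volumeMounts:
      - name: tmp
        mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}
```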
In `@images-v2/go-server.Dockerfile`:
- Around line 10-13: The runtime stage of the Dockerfile runs the container as
root; create a non-privileged user and switch to it in the runtime stage: add a
user/group (e.g., appuser), chown the copied binary (/usr/local/bin/app-server)
and any needed dirs to that user, and set USER to that non-root account before
ENTRYPOINT so the container runs as the non-root user instead of root (reference
the COPY --from=builder /tmp/app /usr/local/bin/app-server, EXPOSE 8080, and
ENTRYPOINT ["/usr/local/bin/app-server"] lines).
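One way to do that in the runtime stage (a sketch — using a numeric UID sidesteps `useradd`, which minimal base images may lack; 1001 is an assumed UID, not from the original Dockerfile):

```dockerfile
# Runtime stage owned by a non-root UID; paths match the review comment.
COPY --from=builder --chown=1001:0 /tmp/app /usr/local/bin/app-server
USER 1001
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/app-server"]
```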
In `@images-v2/job.Dockerfile`:
- Around line 48-50: Replace the shell-form CMD with the exec JSON array form so
signals propagate to the Python process (change CMD python3.11 main.py to CMD
["python3.11","main.py"] in the Dockerfile), and review whether EXPOSE 8000 is
necessary for this batch job container — if the process does not listen on port
8000 remove the EXPOSE line; if it does, keep it but ensure it’s documented why
the port is exposed.
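The exec-form change asked for is simply:

```dockerfile
# Exec form: SIGTERM is delivered to the Python process itself,
# not to an intermediate /bin/sh.
CMD ["python3.11", "main.py"]
```

Whether to keep `EXPOSE 8000` depends on whether the job actually listens on that port, which only the author can confirm.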
In `@server/gh-credential-helper-v2.sh`:
- Around line 6-11: The script currently reads the GitHub token into TOKEN and
echoes it as the password without validating it; update the logic around the
TOKEN variable to check that TOKEN is non-empty (after reading from
/etc/github-app-token/token) and if empty print a clear error message to stderr
(e.g., "Missing GitHub app token in /etc/github-app-token/token") and exit
non-zero instead of emitting an empty password; keep the existing echo lines
(protocol, host, username, password) but only print the password when TOKEN is
present so callers don’t receive an empty credential.
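A minimal sketch of that guard, keeping the output format the comment describes (`emit_credentials` and its argument handling are illustrative — the real script reads the fixed path /etc/github-app-token/token):

```shell
# emit_credentials TOKEN_FILE: print git-credential-helper output,
# or fail loudly when the token file is missing or empty.
emit_credentials() {
  token_file="$1"
  TOKEN="$(cat "$token_file" 2>/dev/null || true)"
  if [ -z "$TOKEN" ]; then
    echo "Missing GitHub app token in $token_file" >&2
    return 1
  fi
  echo "protocol=https"
  echo "host=github.com"
  echo "username=x-access-token"
  echo "password=$TOKEN"
}
```

Callers (git invoking the credential helper) then either get a complete credential or a non-zero exit, never an empty password.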
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 86e2c326-baa9-4694-a7bb-2d62a31a3032
⛔ Files ignored due to path filters (1)
go-server-v2/go.sum is excluded by !**/*.sum
📒 Files selected for processing (15)
- deploy-v2/gh-token-minter-deployment.yaml
- deploy-v2/go-server-deployment.yaml
- deploy-v2/kustomization.yaml
- gh-token-minter/mint-token.sh
- go-server-v2/config.go
- go-server-v2/go.mod
- go-server-v2/handlers.go
- go-server-v2/k8s.go
- go-server-v2/main.go
- go-server-v2/static/homepage.html
- images-v2/gh-token-minter.Dockerfile
- images-v2/go-server.Dockerfile
- images-v2/job.Dockerfile
- server/gh-credential-helper-v2.sh
- server/main.py
✅ Files skipped from review due to trivial changes (3)
- deploy-v2/kustomization.yaml
- images-v2/gh-token-minter.Dockerfile
- server/main.py
- name: private-key
  secret:
    secretName: github-app-private-key
Secret name mismatch will cause pod startup failure.
The volume references secretName: github-app-private-key, but the Secret defined in this file is named gh-app-secret (line 35). The pod will fail to start with a "secret not found" error.
🔧 Proposed fix
Either rename the Secret:
apiVersion: v1
kind: Secret
metadata:
-  name: gh-app-secret
+  name: github-app-private-key

Or update the volume reference:
 volumes:
   - name: private-key
     secret:
-      secretName: github-app-private-key
+      secretName: gh-app-secret
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Checkov (3.2.513)
[medium] 44-82: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 44-82: Minimize the admission of root containers
(CKV_K8S_23)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 80 - 82, The volume
is referencing secretName: github-app-private-key but the Secret resource in
this manifest is named gh-app-secret (metadata.name), causing pod startup
failure; fix by making these names consistent—either rename the Secret's
metadata.name to github-app-private-key or update the volume's secretName to
gh-app-secret (the volume entry with name: private-key and its secret.secretName
field).
spec:
  serviceAccountName: oape-go-server
  containers:
    - name: oape-go-server
      image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
      ports:
        - containerPort: 8080
      env:
        - name: WORKER_IMAGE
          value: ghcr.io/shiftweek/oape-ai-e2e:latest
      volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true
🧩 Analysis chain
🏁 Script executed:
find . -name "go-server*" -o -name "*go-server*" | head -20
Repository: shiftweek/oape-ai-e2e
🏁 Script executed:
ls -la ./go-server-v2/
🏁 Script executed:
cat -n ./go-server-v2/main.go
🏁 Script executed:
cat -n ./go-server-v2/handlers.go
🏁 Script executed:
cat -n ./go-server-v2/config.go
🏁 Script executed:
cat -n ./go-server-v2/k8s.go
Run the Go server with a restricted security context.
The Deployment currently uses the default pod/container security context, allowing the process to run as root, escalate privileges, and write to the image filesystem. This server only reads configuration from a ConfigMap and serves HTTP requests, so none of these capabilities are needed.
Suggested hardening
 spec:
   serviceAccountName: oape-go-server
+  securityContext:
+    seccompProfile:
+      type: RuntimeDefault
   containers:
     - name: oape-go-server
       image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
+      securityContext:
+        runAsNonRoot: true
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: true
+        capabilities:
+          drop: ["ALL"]
       ports:
         - containerPort: 8080
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy-v2/go-server-deployment.yaml` around lines 68 - 81, Add a restrictive
securityContext for the oape-go-server container and a pod-level security
context: set podSecurityContext with runAsNonRoot:true and an fsGroup (e.g.
1000) to ensure ConfigMap files are readable, and in the container
securityContext for the container named "oape-go-server" set runAsUser to a
non-root UID (e.g. 1000), readOnlyRootFilesystem:true,
allowPrivilegeEscalation:false, drop all capabilities (e.g. drop: ["ALL"]), and
set seccompProfile to runtime/default; apply these changes in the Deployment
spec near the existing container definition so the server cannot run as root,
escalate privileges, or write to the image filesystem.
gh-token-minter/mint-token.sh
Outdated
resp = requests.get("https://api.github.com/app/installations", headers=headers)
resp.raise_for_status()
inst_id = resp.json()[0]["id"]
Handle case where no installations exist.
If the GitHub App has no installations, resp.json()[0] will raise an IndexError. Add validation to provide a clearer error message.
🛡️ Proposed fix
resp = requests.get("https://api.github.com/app/installations", headers=headers)
resp.raise_for_status()
-inst_id = resp.json()[0]["id"]
+installations = resp.json()
+if not installations:
+    raise SystemExit("No installations found for this GitHub App")
+inst_id = installations[0]["id"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gh-token-minter/mint-token.sh` around lines 29 - 31, The code assumes at
least one installation when assigning inst_id from resp.json()[0]["id"] which
will raise IndexError if none exist; after the requests.get(...) call and
resp.raise_for_status(), parse the body into a variable (e.g., installs =
resp.json()), verify it's a list and that len(installs) > 0 (or that installs[0]
exists), and if empty raise a clear error or exit with a descriptive message
like "No GitHub App installations found" instead of letting an IndexError
surface; update the inst_id assignment to read from the validated
installs[0]["id"].
ttl := int32(5400) // 1h30m
if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
	}
	ttl = int32(n)
}
🧩 Analysis chain
🏁 Script executed:
# First, check the file exists and read the relevant section with context
cd go-server-v2 && wc -l config.go && cat -n config.go | head -70
🏁 Script executed:
# Search for where ttl is used after being set
cd go-server-v2 && rg "ttl" -i -A 5 -B 5 config.go
🏁 Script executed:
# Search the entire codebase for where this TTL value is passed to Kubernetes
cd go-server-v2 && rg "ttlSecondsAfterFinished|ttl" --type go -A 3 -B 3 | head -100
🌐 Web query:
Kubernetes Job spec ttlSecondsAfterFinished negative value requirement
💡 Result:
In a Kubernetes Job (batch/v1), .spec.ttlSecondsAfterFinished must be a non-negative integer (≥ 0). If you set a negative value, the API server rejects the object with a validation error like: spec.ttlSecondsAfterFinished: must be greater than or equal to 0.[1]
Semantics:
- Omitted /
null: TTL cleanup is not configured. 0: the finished Job becomes eligible for deletion immediately after it completes.[2]
Sources: [1] [2]
Validate TTL_AFTER_FINISHED is non-negative at startup.
strconv.ParseInt accepts negative values, but Kubernetes rejects negative ttlSecondsAfterFinished. Letting that through means the server boots successfully and only fails when users submit jobs.
Possible fix
 ttl := int32(5400) // 1h30m
 if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 	n, err := strconv.ParseInt(v, 10, 32)
 	if err != nil {
 		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 	}
+	if n < 0 {
+		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: must be >= 0")
+	}
 	ttl = int32(n)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server-v2/config.go` around lines 46 - 53, The parsed TTL_AFTER_FINISHED
value can be negative (strconv.ParseInt allows negatives) which Kubernetes
rejects; in the config initialization where ttl is set and TTL_AFTER_FINISHED is
parsed (the ttl variable and the TTL_AFTER_FINISHED parse block), add a
non-negative validation after parsing: if n < 0 return an error (e.g.,
fmt.Errorf("invalid TTL_AFTER_FINISHED: must be non-negative")) so startup fails
fast with a clear message instead of accepting negative values.
for _, cond := range job.Status.Conditions {
	if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
		return &JobStatus{Status: "succeeded"}, nil
	}
	if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
		return &JobStatus{Status: "failed", Message: cond.Message}, nil
	}
}

if job.Status.Active > 0 {
	return &JobStatus{Status: "running"}, nil
}

return &JobStatus{Status: "pending"}, nil
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "k8s.go" | head -20
🏁 Script executed:
cat -n ./go-server-v2/k8s.go | sed -n '160,200p'
🏁 Script executed:
grep -n "CreateWorkflowJob" ./go-server-v2/k8s.go | head -5
🏁 Script executed:
cat -n ./go-server-v2/k8s.go | sed -n '55,120p'
🏁 Script executed:
cat -n ./go-server-v2/k8s.go | head -30
🌐 Web query:
Kubernetes Job controller BackoffLimit 0 conditions counters timing
💡 Result:
What backoffLimit: 0 does (Job controller behavior)
- spec.backoffLimit is the maximum number of failures the Job controller will tolerate before it marks the Job as failed. (kubernetes.io)
- With backoffLimit: 0, the Job gets zero retries: the first counted failure immediately causes the Job to be considered failed. (Practically: you’ll get at most one “attempt” that is allowed to fail.) (kubernetes.io)
Counters you’ll see (.status.*)
- .status.failed increments as the controller observes failed Pods for the Job; with backoffLimit: 0, it typically reaches 1 on the first failed Pod and the Job becomes failed. (v1-33.docs.kubernetes.io)
- .status.succeeded stays 0 (unless you have another Pod that succeeded before the failure condition was triggered, e.g., parallelism > 1). (kubernetes.io)
Conditions you’ll see (.status.conditions)
- When the backoff limit is exceeded, the Job is marked failed and you commonly see a terminal condition of type Failed=True with reason BackoffLimitExceeded. (github.com)
- On newer Kubernetes, the controller may first add an intermediate FailureTarget condition when the failure criteria are met, and only later add the terminal Failed condition after cleanup/termination of Pods finishes. (kubernetes.io)
Timing / “why didn’t it fail instantly?”
- Even after the controller decides the Job has failed (backoff exceeded), there can be delay before everything looks “done” because Pods may still be terminating (for example due to terminationGracePeriodSeconds). The docs explicitly note timing gaps between setting intermediate conditions (like FailureTarget) and the final terminal condition (Failed/Complete). (kubernetes.io)
- Exponential backoff between retries is only relevant when retries are allowed; with backoffLimit: 0, there’s no second retry to wait for. (kubernetes.io)
Citations:
- 1: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 2: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 3: https://v1-33.docs.kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 4: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 5: Job's backoffLimit needs to take number of failed pods into account kubernetes/kubernetes#70251
- 6: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 7: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
- 8: https://kubernetes.io/docs/concepts/workloads/controllers/job/?utm_source=openai
Check Job counters before falling back to pending.
With BackoffLimit: 0 in CreateWorkflowJob, the Kubernetes controller marks a Job as failed immediately upon the first Pod failure by incrementing .status.failed. However, there is a timing gap before the terminal Failed condition is written. During this window, GetJobStatus can report a finished job as pending because it only checks conditions and the Active counter.
Include checks for .status.Succeeded > 0 and .status.Failed > 0 before the fallback to pending status to eliminate this race condition.
Suggested fix
 for _, cond := range job.Status.Conditions {
 	if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
 		return &JobStatus{Status: "succeeded"}, nil
 	}
 	if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
 		return &JobStatus{Status: "failed", Message: cond.Message}, nil
 	}
 }
+
+if job.Status.Succeeded > 0 {
+	return &JobStatus{Status: "succeeded"}, nil
+}
+if job.Status.Failed > 0 {
+	return &JobStatus{Status: "failed"}, nil
+}
 if job.Status.Active > 0 {
 	return &JobStatus{Status: "running"}, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go-server-v2/k8s.go` around lines 173 - 186, The code returns "pending" if no
terminal Condition or active pods are found, but it must also check
job.Status.Succeeded and job.Status.Failed to avoid the race where counters are
updated before the Failed/Succeeded Condition is set; update the logic in the
function that inspects the job (the loop over job.Status.Conditions and
subsequent checks using job.Status.Active) to first check if
job.Status.Succeeded > 0 and return &JobStatus{Status: "succeeded"}, nil, and
then check if job.Status.Failed > 0 and return &JobStatus{Status: "failed",
Message: ""}, nil (or include cond.Message if available), before falling back to
returning &JobStatus{Status: "pending"}, nil so the JobStatus struct and the job
variable are used to detect terminal state via counters as well as conditions.
images-v2/go-server.Dockerfile
Outdated
FROM registry.access.redhat.com/ubi9/go-toolset AS builder
WORKDIR /go/src/app

COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .
Fix COPY destination syntax.
When COPY has more than two arguments, the destination must be a directory ending with /. The current syntax will fail or produce unexpected results.
🔧 Proposed fix
-COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .
+COPY ./go-server-v2/go.mod ./go-server-v2/go.sum ./
🧰 Tools
🪛 Hadolint (2.14.0)
[error] 4-4: COPY with more than 2 arguments requires the last argument to end with /
(DL3021)
🪛 Trivy (0.69.3)
[error] 4-4: COPY with more than two arguments not ending with slash
Slash is expected at the end of COPY command argument '.'
Rule: DS-0011
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@images-v2/go-server.Dockerfile` at line 4, The COPY instruction with multiple
sources uses a destination that must be a directory ending with a slash; update
the COPY in the Dockerfile (the line containing COPY ./go-server-v2/go.mod
./go-server-v2/go.sum .) to use a directory destination with a trailing slash
(e.g., change the final token from . to ./ or to a specific directory like
./go-server-v2/) so the Docker build treats it as a directory and succeeds.
Actionable comments posted: 2
♻️ Duplicate comments (6)
deploy-v2/gh-token-minter-deployment.yaml (1)
79-82: ⚠️ Potential issue | 🔴 Critical
Fix secret reference mismatch to avoid pod startup failure.
Line 82 references github-app-private-key, but the Secret defined at Line 35 is gh-app-secret.
Suggested fix
 volumes:
   - name: private-key
     secret:
-      secretName: github-app-private-key
+      secretName: gh-app-secret
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 79 - 82, The pod volume named "private-key" references secretName "github-app-private-key" but the actual Secret created is "gh-app-secret"; update the secretName value referenced by the volume (volume: name "private-key") to "gh-app-secret" so the volume mounts the correct Secret (or alternatively rename the Secret resource to "github-app-private-key" to match the volume), ensuring the names for the Secret and the secretName in the deployment match.
go-server-v2/config.go (1)
47-53: ⚠️ Potential issue | 🟡 Minor
Reject negative TTL_AFTER_FINISHED values during config load.
Line 48-Line 53 accepts negative TTL values, but Kubernetes Job TTL must be non-negative. Validate here so startup fails fast with a clear error.
Suggested fix
 if v := os.Getenv("TTL_AFTER_FINISHED"); v != "" {
 	n, err := strconv.ParseInt(v, 10, 32)
 	if err != nil {
 		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: %w", err)
 	}
+	if n < 0 {
+		return nil, fmt.Errorf("invalid TTL_AFTER_FINISHED: must be >= 0")
+	}
 	ttl = int32(n)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-server-v2/config.go` around lines 47 - 53, When parsing TTL_AFTER_FINISHED in the config load, reject negative values: after strconv.ParseInt(v, 10, 32) and before assigning ttl = int32(n), check if n < 0 and return a descriptive error (e.g. "invalid TTL_AFTER_FINISHED: must be non-negative") so startup fails fast; update the block that reads TTL_AFTER_FINISHED (the strconv.ParseInt call and the ttl assignment) to enforce this non-negative validation.
images-v2/go-server.Dockerfile (1)
4-4: ⚠️ Potential issue | 🔴 Critical
Fix invalid multi-source COPY destination.
Line 4 uses multiple sources with destination .; this must be a directory token ending with /.
Suggested fix
-COPY ./go-server-v2/go.mod ./go-server-v2/go.sum .
+COPY ./go-server-v2/go.mod ./go-server-v2/go.sum ./
#!/bin/bash
# Verify multi-source COPY destinations in this Dockerfile end with '/'
python - <<'PY'
from pathlib import Path
p = Path("images-v2/go-server.Dockerfile")
for i, line in enumerate(p.read_text().splitlines(), 1):
    s = line.strip()
    if not s.upper().startswith("COPY "):
        continue
    toks = s.split()
    if len(toks) > 3:
        dest = toks[-1]
        if not dest.endswith("/"):
            print(f"Line {i}: invalid multi-source COPY destination -> {dest}")
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@images-v2/go-server.Dockerfile` at line 4, The Dockerfile's multi-source COPY instruction "COPY ./go-server-v2/go.mod ./go-server-v2/go.sum ." uses a destination token '.' which is invalid for multiple sources; update the COPY line (the COPY instruction that lists ./go-server-v2/go.mod and ./go-server-v2/go.sum) so the destination is a directory token that ends with '/' (for example replace '.' with './' or a named directory like 'app/'), preserving the two source paths and ensuring the destination ends with a trailing slash.
deploy-v2/go-server-deployment.yaml (1)
68-81: ⚠️ Potential issue | 🟠 Major
Add restrictive pod/container security context.
The workload runs with default security settings. Please explicitly enforce non-root execution and disable privilege escalation.
Suggested fix
 spec:
   serviceAccountName: oape-go-server
+  securityContext:
+    seccompProfile:
+      type: RuntimeDefault
   containers:
     - name: oape-go-server
       image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest
+      securityContext:
+        runAsNonRoot: true
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: true
+        capabilities:
+          drop: ["ALL"]
       ports:
         - containerPort: 8080
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy-v2/go-server-deployment.yaml` around lines 68 - 81, The pod currently uses default security settings; update the manifest around spec.serviceAccountName and the container with name "oape-go-server" to enforce non-root and disable privilege escalation by adding a podSecurityContext and container.securityContext: set runAsNonRoot: true and a non-root runAsUser (e.g. 1000) at pod or container level, set allowPrivilegeEscalation: false in the container securityContext, drop all capabilities (capabilities.drop: ["ALL"]), enable readOnlyRootFilesystem: true for the container, and add a seccompProfile (type: RuntimeDefault) to harden the workload; apply these fields next to the existing containers/volumeMounts entries for "oape-go-server".
go-server-v2/go.mod (1)
35-35: ⚠️ Potential issue | 🟠 Major
Upgrade vulnerable golang.org/x/oauth2 version.
Line 35 uses v0.21.0, which is flagged for GO-2025-3488/GHSA-6v2p-p543-phr9. Please bump to a patched version (>=v0.27.0).
Suggested fix
-	golang.org/x/oauth2 v0.21.0 // indirect
+	golang.org/x/oauth2 v0.27.0 // indirect
🌐 Web query:
Is golang.org/x/oauth2 v0.21.0 affected by GO-2025-3488 / GHSA-6v2p-p543-phr9, and what is the minimum fixed version?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go-server-v2/go.mod` at line 35, The dependency golang.org/x/oauth2 is pinned at v0.21.0 which is vulnerable; update the module requirement to a patched release (>= v0.27.0) by running go get golang.org/x/oauth2@v0.27.0 (or a newer v0.27.x+), then run go mod tidy to update go.mod/go.sum and rebuild/tests to verify no breakage and that the upgraded golang.org/x/oauth2 version is reflected in the module list.
server/main.py (1)
26-37: ⚠️ Potential issue | 🟠 Major
Emit lifecycle events as JSON lines, not plain strings.
Line 26 and Line 31-Line 37 currently output plain text, which breaks this module’s stated JSONL contract and makes stream parsing inconsistent.
Suggested fix
-    print(f"Starting workflow: ep_url={ep_url} repo={repo}", flush=True)
+    print(json.dumps({
+        "event": "workflow_start",
+        "ep_url": ep_url,
+        "repo": repo,
+    }, default=str), flush=True)
@@
-        print(f"WORKFLOW_SUCCESS cost=${result.cost_usd:.4f}", flush=True)
+        print(json.dumps({
+            "event": "workflow_success",
+            "cost_usd": result.cost_usd,
+        }, default=str), flush=True)
         for pr in result.prs:
-            print(f"PR_CREATED: {pr.pr_url}", flush=True)
+            print(json.dumps({
+                "event": "pr_created",
+                "pr_url": pr.pr_url,
+            }, default=str), flush=True)
@@
-        print(f"WORKFLOW_FAILED: {result.error}", file=sys.stderr, flush=True)
+        print(json.dumps({
+            "event": "workflow_failed",
+            "error": str(result.error),
+        }, default=str), file=sys.stderr, flush=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/main.py` around lines 26 - 37, The scripts currently print lifecycle messages as plain text; change them to emit JSONL via the existing print_json helper so downstream parsers get structured events: replace the initial print("Starting workflow...") with print_json(data={"event":"WORKFLOW_STARTED","ep_url":ep_url,"repo":repo}), replace the WORKFLOW_SUCCESS/WORKFLOW_FAILED prints with print_json(data={"event":"WORKFLOW_SUCCESS","cost_usd":result.cost_usd}) and print_json(data={"event":"WORKFLOW_FAILED","error":result.error}) respectively, and when iterating result.prs call print_json(data={"event":"PR_CREATED","pr_url":pr.pr_url}) for each PR; keep the same exit codes and use the run_workflow, result, pr, and print_json symbols to locate the changes.
🧹 Nitpick comments (2)
deploy-v2/go-server-deployment.yaml (1)
72-72: Prefer immutable image pinning over :latest.
Using :latest makes deployments non-reproducible and can cause drift between rollouts. Pin to a version tag or digest.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy-v2/go-server-deployment.yaml` at line 72, The container image currently uses an immutable-unsafe tag "ghcr.io/shiftweek/oape-ai-e2e/go-server:latest"; change the image reference in the go-server container spec (the line containing image: ghcr.io/shiftweek/oape-ai-e2e/go-server:latest) to a pinned version (e.g., a release tag like :v1.2.3) or an immutable digest (sha256:...), and optionally set imagePullPolicy to IfNotPresent to avoid unexpected updates during rollout.
deploy-v2/gh-token-minter-deployment.yaml (1)
15-17: Reduce RBAC scope by removing broad secrets/create if possible.
Line 15-Line 17 allows creating any Secret in the namespace. Prefer pre-creating github-app-token and limiting this role to get/update/patch only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 15 - 17, The current RBAC rule grants the broad "create" verb on resources: ["secrets"], allowing creation of any Secret; change the rule to restrict verbs to ["get","update","patch"] and add resourceNames: ["github-app-token"] so the role only operates on the pre-created secret named github-app-token; update the block that contains resources: ["secrets"] and verbs: ["create"] accordingly (replace "create" with the new verbs and add resourceNames) and ensure the github-app-token secret is pre-created in the namespace.
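Under the assumption that the token Secret is pre-created, the tightened Role could look like the following (resource names are taken from the review comment; other fields are a sketch, not the actual manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: github-token-minter
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["github-app-token"]  # only the pre-created token Secret
  verbs: ["get", "update", "patch"]    # no broad "create"
```

Note that Kubernetes cannot enforce `resourceNames` on `create` requests (the name is not known at authorization time), which is exactly why dropping `create` and pre-creating the Secret is the safer pattern.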
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 57-78: The Deployment runs with default privileges; add explicit
pod- and container-level securityContext entries to enforce non-root and no
privilege escalation: under spec (pod level) add securityContext: runAsNonRoot:
true and a non-zero runAsUser (e.g., 1000) and optionally fsGroup if needed, and
for the container named "ghpat" add container-level securityContext with
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: {
drop: ["ALL"] } (and optionally seccompProfile: { type: RuntimeDefault }); keep
the existing volumeMount (private-key) readOnly and ensure any needed file perms
match the chosen runAsUser.
In `@images-v2/go-server.Dockerfile`:
- Around line 10-13: The Dockerfile's runtime stage runs the server as root
because no USER is set; modify the runtime stage to create a non-root user/group
(e.g., appuser), chown the binary at /usr/local/bin/app-server to that user, and
add a USER directive before ENTRYPOINT so the container runs as the non-root
user; ensure any required directories or ports are accessible to that user and
reference the existing ENTRYPOINT ["/usr/local/bin/app-server"] when making
these changes.
---
Duplicate comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 79-82: The pod volume named "private-key" references secretName
"github-app-private-key" but the actual Secret created is "gh-app-secret";
update the secretName value referenced by the volume (volume: name
"private-key") to "gh-app-secret" so the volume mounts the correct Secret (or
alternatively rename the Secret resource to "github-app-private-key" to match
the volume), ensuring the names for the Secret and the secretName in the
deployment match.
In `@deploy-v2/go-server-deployment.yaml`:
- Around line 68-81: The pod currently uses default security settings; update
the manifest around spec.serviceAccountName and the container with name
"oape-go-server" to enforce non-root and disable privilege escalation by adding
a podSecurityContext and container.securityContext: set runAsNonRoot: true and a
non-root runAsUser (e.g. 1000) at pod or container level, set
allowPrivilegeEscalation: false in the container securityContext, drop all
capabilities (capabilities.drop: ["ALL"]), enable readOnlyRootFilesystem: true
for the container, and add a seccompProfile (type: RuntimeDefault) to harden the
workload; apply these fields next to the existing containers/volumeMounts
entries for "oape-go-server".
In `@go-server-v2/config.go`:
- Around line 47-53: When parsing TTL_AFTER_FINISHED in the config load, reject
negative values: after strconv.ParseInt(v, 10, 32) and before assigning ttl =
int32(n), check if n < 0 and return a descriptive error (e.g. "invalid
TTL_AFTER_FINISHED: must be non-negative") so startup fails fast; update the
block that reads TTL_AFTER_FINISHED (the strconv.ParseInt call and the ttl
assignment) to enforce this non-negative validation.
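A runnable Go sketch of that fail-fast validation (the function name and error wording are illustrative, not taken from `go-server-v2/config.go`):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTTL parses a TTL_AFTER_FINISHED-style value and rejects negatives,
// so startup fails fast instead of sending an invalid Job spec to the API.
func parseTTL(v string) (int32, error) {
	n, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid TTL_AFTER_FINISHED %q: %w", v, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("invalid TTL_AFTER_FINISHED %q: must be non-negative", v)
	}
	return int32(n), nil
}

func main() {
	ttl, err := parseTTL("3600")
	fmt.Println(ttl, err)
}
```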
In `@go-server-v2/go.mod`:
- Line 35: The dependency golang.org/x/oauth2 is pinned at v0.21.0 which is
vulnerable; update the module requirement to a patched release (>= v0.27.0) by
running go get golang.org/x/oauth2@v0.27.0 (or a newer v0.27.x+), then run go
mod tidy to update go.mod/go.sum and rebuild/tests to verify no breakage and
that the upgraded golang.org/x/oauth2 version is reflected in the module list.
In `@images-v2/go-server.Dockerfile`:
- Line 4: The Dockerfile's multi-source COPY instruction "COPY
./go-server-v2/go.mod ./go-server-v2/go.sum ." uses a destination token '.'
which is invalid for multiple sources; update the COPY line (the COPY
instruction that lists ./go-server-v2/go.mod and ./go-server-v2/go.sum) so the
destination is a directory token that ends with '/' (for example replace '.'
with './' or a named directory like 'app/'), preserving the two source paths and
ensuring the destination ends with a trailing slash.
In `@server/main.py`:
- Around line 26-37: The scripts currently print lifecycle messages as plain
text; change them to emit JSONL via the existing print_json helper so downstream
parsers get structured events: replace the initial print("Starting workflow...")
with print_json(data={"event":"WORKFLOW_STARTED","ep_url":ep_url,"repo":repo}),
replace the WORKFLOW_SUCCESS/WORKFLOW_FAILED prints with
print_json(data={"event":"WORKFLOW_SUCCESS","cost_usd":result.cost_usd}) and
print_json(data={"event":"WORKFLOW_FAILED","error":result.error}) respectively,
and when iterating result.prs call
print_json(data={"event":"PR_CREATED","pr_url":pr.pr_url}) for each PR; keep the
same exit codes and use the run_workflow, result, pr, and print_json symbols to
locate the changes.
---
Nitpick comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 15-17: The current RBAC rule grants the broad "create" verb on
resources: ["secrets"], allowing creation of any Secret; change the rule to
restrict verbs to ["get","update","patch"] and add resourceNames:
["github-app-token"] so the role only operates on the pre-created secret named
github-app-token; update the block that contains resources: ["secrets"] and
verbs: ["create"] accordingly (replace "create" with the new verbs and add
resourceNames) and ensure the github-app-token secret is pre-created in the
namespace.
In `@deploy-v2/go-server-deployment.yaml`:
- Line 72: The container image currently uses an immutable-unsafe tag
"ghcr.io/shiftweek/oape-ai-e2e/go-server:latest"; change the image reference in
the go-server container spec (the line containing image:
ghcr.io/shiftweek/oape-ai-e2e/go-server:latest) to a pinned version (e.g., a
release tag like :v1.2.3) or an immutable digest (sha256:...), and optionally
set imagePullPolicy to IfNotPresent to avoid unexpected updates during rollout.
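For illustration only — the digest below is a placeholder, not a published image digest — a pinned container reference could look like:

```yaml
containers:
- name: oape-go-server
  # Digest pinning makes rollouts reproducible; replace the sha256 with the
  # real digest printed by your registry or `docker buildx imagetools inspect`.
  image: ghcr.io/shiftweek/oape-ai-e2e/go-server@sha256:0123456789abcdef...  # hypothetical
  imagePullPolicy: IfNotPresent
```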
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: adafdb36-9228-4202-991d-6013ac239c7f
⛔ Files ignored due to path filters (2)
- deploy-v2/config/team-repos.csv is excluded by !**/*.csv
- go-server-v2/go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
- deploy-v2/config/config.json
- deploy-v2/gh-token-minter-deployment.yaml
- deploy-v2/go-server-deployment.yaml
- deploy-v2/kustomization.yaml
- gh-token-minter/mint-token.sh
- go-server-v2/config.go
- go-server-v2/go.mod
- go-server-v2/handlers.go
- go-server-v2/k8s.go
- go-server-v2/main.go
- go-server-v2/static/homepage.html
- images-v2/gh-token-minter.Dockerfile
- images-v2/go-server.Dockerfile
- images-v2/job.Dockerfile
- server/gh-credential-helper-v2.sh
- server/main.py
- server/requirements.txt
✅ Files skipped from review due to trivial changes (4)
- server/requirements.txt
- deploy-v2/config/config.json
- go-server-v2/static/homepage.html
- gh-token-minter/mint-token.sh
🚧 Files skipped from review as they are similar to previous changes (7)
- server/gh-credential-helper-v2.sh
- images-v2/gh-token-minter.Dockerfile
- images-v2/job.Dockerfile
- go-server-v2/main.go
- deploy-v2/kustomization.yaml
- go-server-v2/handlers.go
- go-server-v2/k8s.go
spec:
  serviceAccountName: github-token-minter
  containers:
  - name: ghpat
    image: github-token-minter:latest
    env:
    - name: APP_ID
      value: "3065249"
    - name: PRIVATE_KEY_PATH
      value: /etc/github-app/private-key.pem
    - name: SECRET_NAME
      value: github-app-token
    - name: SECRET_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: REFRESH_INTERVAL
      value: "1000"
    volumeMounts:
    - name: private-key
      mountPath: /etc/github-app
      readOnly: true
Harden pod/container security context.
The Deployment currently runs with default security settings. Add explicit non-root, no-privilege-escalation, and read-only root FS controls.
Suggested fix
 spec:
   serviceAccountName: github-token-minter
+  securityContext:
+    seccompProfile:
+      type: RuntimeDefault
   containers:
   - name: ghpat
     image: github-token-minter:latest
+    securityContext:
+      runAsNonRoot: true
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop: ["ALL"]
🧰 Tools
🪛 Trivy (0.69.3)
[error] 53-78: Default security context configured
deployment github-token-minter in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 57 - 78, The
Deployment runs with default privileges; add explicit pod- and container-level
securityContext entries to enforce non-root and no privilege escalation: under
spec (pod level) add securityContext: runAsNonRoot: true and a non-zero
runAsUser (e.g., 1000) and optionally fsGroup if needed, and for the container
named "ghpat" add container-level securityContext with allowPrivilegeEscalation:
false, readOnlyRootFilesystem: true, capabilities: { drop: ["ALL"] } (and
optionally seccompProfile: { type: RuntimeDefault }); keep the existing
volumeMount (private-key) readOnly and ensure any needed file perms match the
chosen runAsUser.
FROM registry.access.redhat.com/ubi9-minimal:latest
COPY --from=builder /tmp/app /usr/local/bin/app-server
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/app-server"]
Run the runtime container as non-root.
The runtime stage does not set a non-root user, so the server runs as root by default.
Suggested fix
 FROM registry.access.redhat.com/ubi9-minimal:latest
 COPY --from=builder /tmp/app /usr/local/bin/app-server
+USER 1001
 EXPOSE 8080
 ENTRYPOINT ["/usr/local/bin/app-server"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@images-v2/go-server.Dockerfile` around lines 10 - 13, The Dockerfile's
runtime stage runs the server as root because no USER is set; modify the runtime
stage to create a non-root user/group (e.g., appuser), chown the binary at
/usr/local/bin/app-server to that user, and add a USER directive before
ENTRYPOINT so the container runs as the non-root user; ensure any required
directories or ports are accessible to that user and reference the existing
ENTRYPOINT ["/usr/local/bin/app-server"] when making these changes.
@coderabbitai Please draw a diagram for the new architecture, use Mermaid.
Tip: For best results, initiate chat on the files or code changes.
graph TB
subgraph Browser["User Browser"]
UI["Web UI\n(homepage.html)"]
end
subgraph K8s["Kubernetes Cluster (namespace: shift-ai-v2)"]
subgraph GoServer["Deployment: oape-go-server"]
GSvc["Go HTTP Server\n:8080\n\nGET /\nGET /repos\nPOST /submit\nGET /status/jobid\nGET /stream/jobid (SSE)"]
end
subgraph TokenMinter["Deployment: github-token-minter"]
Minter["Token Minter\nmint-token.sh\n(loop every ~3500s)"]
end
subgraph WorkflowJob["batch Job: oape-workflow"]
PyScript["Python Worker\nmain.py\n\nrun_workflow(ep_url, repo)"]
end
subgraph Secrets["Secrets and ConfigMaps"]
GHTokenSecret["Secret\ngithub-app-token\n(token refreshed by minter)"]
GHAppSecret["Secret\ngh-app-secret\n(private-key.pem)"]
GCloudSecret["Secret\ngcloud-adc-secret\n(ADC JSON)"]
WorkerCM["ConfigMap\noape-worker-config\n(Vertex AI / Anthropic env)"]
OapeCM["ConfigMap\noape-configs\n(config.json, team-repos.csv)"]
end
subgraph RBAC["RBAC"]
SA_Go["ServiceAccount\noape-go-server\n(create/list/watch Jobs, Pods, logs)"]
SA_Minter["ServiceAccount\ngithub-token-minter\n(get/update/patch/create Secret)"]
end
end
subgraph ExternalServices["External Services"]
GitHubAPI["GitHub API\n(App Installations\n+ Access Tokens)"]
GitHubRepo["GitHub\n(git clone / push PR)"]
VertexAI["Vertex AI / Anthropic\n(Claude)"]
end
UI -->|"POST /submit (ep_url, repo)"| GSvc
UI -->|"GET /stream/jobid SSE"| GSvc
GSvc -->|"HTML"| UI
GSvc -->|"Create Job\n(via K8s API)"| WorkflowJob
GSvc -->|"Poll status / stream pod logs"| WorkflowJob
GSvc -->|"Reads team-repos.csv"| OapeCM
GSvc --- SA_Go
Minter -->|"Read PEM key"| GHAppSecret
Minter -->|"JWT to Installation Token"| GitHubAPI
Minter -->|"kubectl apply secret"| GHTokenSecret
Minter --- SA_Minter
PyScript -->|"EnvFrom"| WorkerCM
PyScript -->|"Mount /etc/github-app-token"| GHTokenSecret
PyScript -->|"Mount /config"| OapeCM
PyScript -->|"Mount ADC JSON"| GCloudSecret
PyScript -->|"git credential helper - clone/push"| GitHubRepo
PyScript -->|"LLM calls"| VertexAI
classDef k8sbox fill:steelblue,stroke:darkblue,color:black
classDef extbox fill:mediumseagreen,stroke:seagreen,color:black
classDef secretbox fill:sandybrown,stroke:darkgoldenrod,color:black
classDef rbacbox fill:mediumpurple,stroke:rebeccapurple,color:black
class GoServer,TokenMinter,WorkflowJob k8sbox
class ExternalServices,GitHubAPI,GitHubRepo,VertexAI extbox
class Secrets,GHTokenSecret,GHAppSecret,GCloudSecret,WorkerCM,OapeCM secretbox
class RBAC,SA_Go,SA_Minter rbacbox
Key flows summarised
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy-v2/gh-token-minter-deployment.yaml`:
- Around line 67-70: Update the volume's Secret reference: in the deployment's
volumes entry for name "private-key" (the block where secret.secretName is set),
change secretName from "github-app-private-key" to "gh-app-secret" so the volume
mounts the actual Secret defined in secret-configs.yaml.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eb6897e7-f4c3-41fa-b1df-2acaaa0ad132
📒 Files selected for processing (4)
deploy-v2/gh-token-minter-deployment.yamldeploy-v2/go-server-deployment.yamldeploy-v2/kustomization.yamldeploy-v2/secret-configs.yaml
✅ Files skipped from review due to trivial changes (1)
- deploy-v2/secret-configs.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- deploy-v2/kustomization.yaml
volumes:
- name: private-key
  secret:
    secretName: github-app-private-key
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for github-app-private-key Secret definition in the deploy-v2 directory
# Check for Secret named github-app-private-key in yaml files
rg -n "name:\s*github-app-private-key" --type yaml
# Also check for any Secret definitions that might use this name
rg -nB5 "kind:\s*Secret" --type yaml | rg -A5 "github-app-private-key"

Repository: shiftweek/oape-ai-e2e
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Explore repository structure and search for Secret definitions and github-app references
# List files in deploy-v2 directory
echo "=== Files in deploy-v2 directory ==="
fd . deploy-v2 --type f | head -20
# Search for all Secret definitions (any kind)
echo -e "\n=== All Secret definitions ==="
rg "kind:\s*Secret" --type yaml -A 3
# Search for sealed-secret definitions
echo -e "\n=== Sealed Secret definitions ==="
rg "kind:\s*SealedSecret" --type yaml -A 3
# Broader search for github-app references
echo -e "\n=== All github-app references ==="
rg "github-app" --type yaml -n
# Check for secret-configs file mentioned in review
echo -e "\n=== Files matching secret-configs pattern ==="
fd "secret" deploy-v2 --type f
# Search for any file that might define secrets
echo -e "\n=== All yaml files in deploy-v2 ==="
fd "\.ya?ml$" deploy-v2 --type f

Repository: shiftweek/oape-ai-e2e
Length of output: 1769
🏁 Script executed:
# View the secret-configs.yaml file to see complete Secret definitions
cat -n deploy-v2/secret-configs.yaml

Repository: shiftweek/oape-ai-e2e
Length of output: 959
🏁 Script executed:
# View the gh-token-minter-deployment.yaml file to see the full context of the volume mounting
cat -n deploy-v2/gh-token-minter-deployment.yaml

Repository: shiftweek/oape-ai-e2e
Length of output: 2296
Change secretName from github-app-private-key to gh-app-secret.
The Secret named github-app-private-key does not exist in the deployment. The actual Secret containing the GitHub App private key is named gh-app-secret (defined in secret-configs.yaml). Update line 70 to use the correct Secret name.
🧰 Tools
🪛 Checkov (3.2.513)
[medium] 32-70: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 32-70: Minimize the admission of root containers
(CKV_K8S_23)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy-v2/gh-token-minter-deployment.yaml` around lines 67 - 70, Update the
volume's Secret reference: in the deployment's volumes entry for name
"private-key" (the block where secret.secretName is set), change secretName from
"github-app-private-key" to "gh-app-secret" so the volume mounts the actual
Secret defined in secret-configs.yaml.
54de20f to 3892d58
- mv server agent Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Summary by CodeRabbit
New Features
Infrastructure & Services