Conversation
…utine Replace panic in GraphqlAsyncClient rate-limit polling goroutine with graceful error handling. Previously, any error while fetching rate limit (e.g., transient network issues or 401 responses) would trigger a panic inside a background goroutine, crashing the entire DevLake process. Now, errors are logged and the client retries in the next cycle while retaining the last known rate limit. Design decisions: - Avoid panic in background goroutines: rate-limit polling is non-critical and should not bring down the entire pipeline. - Use last known rateRemaining on runtime failures instead of resetting or blocking, ensuring continued progress with eventual consistency. - Retry via existing polling mechanism instead of immediate retry to prevent tight retry loops and unnecessary API pressure. - Introduce a default fallback (5000) only for initial rate-limit fetch failures, since no prior state exists at startup. - Separate handling of initial vs runtime failures: - Initial failure → fallback to default (5000) - Runtime failure → retain previous value Fixes apache#8788 (bug 1)
…token refresh Replace oauth2.StaticTokenSource-based HTTP client with the underlying http.Client from ApiAsyncClient. Previously, the GraphQL client constructed its own HTTP client using StaticTokenSource, which froze the access token at task start time. This caused GitHub App installation tokens (which expire after ~1 hour) to become invalid during long-running pipelines, leading to persistent 401 errors. Now, the GraphQL client reuses apiClient.GetClient(), which is already configured with RefreshRoundTripper and TokenProvider. This enables automatic token refresh on 401 responses, aligning GraphQL behavior with the REST client. Design decisions: - Reuse transport layer instead of duplicating authentication logic to ensure consistency across REST and GraphQL clients. - Avoid StaticTokenSource, as it prevents token refresh and breaks long-running pipelines. - Leverage existing RefreshRoundTripper for transparent token rotation without modifying GraphQL query logic. - Keep protocol-specific logic (GraphQL vs REST) separate while sharing the underlying HTTP transport. This ensures GraphQL pipelines using GitHub App authentication can run beyond token expiry without failure. Fixes apache#8788 (bug 2)
…lient - moved token provider and refresh round tripper setup into a reusable helper - introduced CreateAuthenticatedHttpClient to centralize auth + transport logic - updated CreateApiClient to use shared http client instead of inline setup Rationale: - decouples authentication (transport layer) from REST-specific client logic - enables reuse for GraphQL client without duplicating token refresh logic - aligns architecture with separation of concerns (http transport vs api clients)
…ntegrate into task flow - added CreateGraphqlClient to encapsulate graphql client construction - reused CreateAuthenticatedHttpClient from github/tasks to inject token refresh via RoundTripper - replaced manual graphql client setup in PrepareTaskData with new factory function - preserved existing rate limit handling via getRateRemaining callback - preserved query cost calculation using SetGetRateCost Technical details: - graphql client now uses http transport with TokenProvider and RefreshRoundTripper - removes dependency on oauth2 client and avoids token expiration issues - decouples graphql client from REST ApiClient by avoiding reuse of apiClient.GetClient() - maintains compatibility with github.com and enterprise graphql endpoints Note: - shared auth logic remains in github/tasks and is imported with alias to avoid package name collision - introduces cross-plugin dependency (github_graphql → github/tasks) as a pragmatic tradeoff to avoid duplication
…ents add StaticRoundTripper for PAT authentication and use it in the shared http client. since the same client is used by both REST and GraphQL, auth handling must distinguish between refreshable tokens and static tokens. avoid applying refresh/retry logic to PAT. ensures correct behavior across clients and prevents unnecessary retries for static auth.
| getRateCost func(q interface{}) int | ||
| } | ||
|
|
||
| const defaultRateLimit = 5000 |
There was a problem hiding this comment.
Where does this constant come from? Could you add a source link for reference, or should we make it configurable if it varies?
There was a problem hiding this comment.
Currently it falls back to 5000 (GitHub’s standard limit) when rate info isn’t available. I agree this shouldn’t be hardcoded—I'll make it configurable so it can adapt to different scenarios.
…e limit Implement a layered fallback mechanism for GraphQL rate limiting: 1. Dynamic rate limit from provider (getRateRemaining) 2. Per-client override (WithFallbackRateLimit) 3. Config override (GRAPHQL_RATE_LIMIT) 4. Default fallback (1000) Also moved GitHub-specific fallback (5000) via WithFallbackRateLimit to the Graphql client.
|
@klesh I’ve pushed an update for this—could you please take a look? Happy to make changes if needed. |
danielemoraschi
left a comment
There was a problem hiding this comment.
Left some comments for your review. Thanks!
| ) (*helper.GraphqlAsyncClient, errors.Error) { | ||
|
|
||
| // inject the shared auth layer | ||
| httpClient, err := githubTasks.CreateAuthenticatedHttpClient(taskCtx, connection, nil) |
There was a problem hiding this comment.
Calling this with nil as the base client I believe will default to an empty http.Client. This ignores the proxy configuration (connection.GetProxy()) and abandons the HTTP/SOCKS5 Proxy logic associated with the connection. It might break enterprise proxy setups as the proxy configuration is never attached, while the previous implementation explicitly injected a custom proxy http.Transport in backend/plugins/github_graphql/impl/impl.go into the GraphQL HTTP client. This refactoring drops that setup for corporate proxy users.
| connection.AuthMethod, | ||
| ) | ||
|
|
||
| } else if connection.Token != "" { |
There was a problem hiding this comment.
connection.Token can contain multiple comma-separated tokens (e.g. tok1,tok2,tok3). The old code in impl.go split on comma and used only the first token (tokens[0]) via oauth2.StaticTokenSource. This passes connection.Token directly to StaticRoundTripper, which injects the entire unsplit string as Authorization: Bearer tok1,tok2,tok3. On the REST path, this also overwrites the correctly-rotated single token set by SetupAuthentication at the request level.
| logger.Info("Persisted initial token for connection %d", connection.ID) | ||
| } | ||
| } | ||
| println("http client HIT3") |
There was a problem hiding this comment.
Leftover development debug print statement deployed in production code.
| } | ||
|
|
||
| // resolveRateLimit determines the rate limit for GraphQL requests using task configuration -> else default constant. | ||
| func resolveRateLimit(taskCtx plugin.TaskContext, logger log.Logger) int { |
There was a problem hiding this comment.
resolveRateLimit reads env config at line 69, then opts (including WithFallbackRateLimit(5000)) are applied at lines 82-84, unconditionally overwriting the env value. Wouldn't a user setting GRAPHQL_RATE_LIMIT=2000 be silently overridden by the code-level "fallback" of 5000 from graphql_client.go:69?
| logger.Info(`github graphql rate limit are disabled, fallback to 5000req/hour`) | ||
| return 5000, nil, nil | ||
| logger.Info(`github graphql rate limit unavailable, using fallback rate limit`) | ||
| return 0, nil, errors.Default.New("rate limit unavailable") |
There was a problem hiding this comment.
The old code returned 5000, nil, nil for GitHub Enterprise with rate limiting disabled. This now returns an error instead of the previous 5000. The error propagates so the functionality is preserved, but every task run against such servers now produces a warning for what was previously a silently handled normal case.
Summary
This PR fixes two issues in the GitHub GraphQL client related to stability and authentication during long-running pipelines:
Prevent process crash on rate-limit polling errors
panicinGraphqlAsyncClientbackground goroutine with graceful error handling.Enable automatic token refresh for GraphQL requests
oauth2.StaticTokenSource-based HTTP client with the underlyinghttp.ClientfromApiAsyncClient.RefreshRoundTripper, enabling automatic token refresh for GitHub App installation tokens.Refactor authentication into shared HTTP client
CreateAuthenticatedHttpClient.Introduce GraphQL client factory
CreateGraphqlClientusing shared authenticated HTTP client.Support static and refreshable tokens
StaticRoundTripperfor PAT authentication.Does this close any open issues?
Closes #8788
Screenshots
N/A
Other Information
Design decisions
Avoid panic in background goroutines
Rate-limit polling is non-critical and should not crash the entire pipeline. Errors are now handled gracefully.
Separate handling of initial vs runtime failures
5000) to prevent deadlock.rateRemainingto preserve correctness.Use eventual consistency for rate limiting
Rate-limit polling is best-effort; failures should not block or terminate execution.
Centralize transport layer for consistency
Both GraphQL and REST clients now use a shared authenticated HTTP client, ensuring consistent authentication, proxy, and retry behavior.
Avoid
StaticTokenSourceStatic tokens break long-running pipelines due to expiration. Using
RefreshRoundTripperensures seamless token rotation.Handle static vs refreshable tokens explicitly
Shared transport distinguishes between PAT (static) and installation (refreshable) tokens to avoid incorrect refresh/retry behavior.