STF-322: Bounded transport-failure retry in WebServiceClient#693
STF-322: Bounded transport-failure retry in WebServiceClient#693
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient, defaulting to one retry for issues like connection resets or broken pipes. The implementation includes updates to the Builder for maxRetries configuration, detailed documentation in the README, and comprehensive test coverage for various failure scenarios. Feedback suggests refactoring the retry loop to prevent potential integer overflow and adopting case-insensitive string matching for exception messages to improve robustness across different environments.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient. It adds a maxRetries option to the builder (defaulting to 1), implements logic to retry on specific exceptions like connection resets and connect timeouts, and includes comprehensive tests to verify the behavior. Additionally, it adds tool configuration files for mise and updates the documentation and changelog. I have no feedback to provide as the changes are well-implemented and the existing review comments were purely evaluative.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for transient errors like connection resets or broken pipes. It includes updates to the Builder for configuring maxRetries, expanded documentation in the README and CHANGELOG, and new test cases. Review feedback identifies a discrepancy in the retry logic regarding HttpConnectTimeoutException, which is currently excluded because it inherits from HttpTimeoutException. Additionally, a test case for connection timeouts was found to be misnamed and lacks verification that retries are actually being attempted.
I am having trouble creating individual review comments. Click here to see my feedback.
src/main/java/com/maxmind/geoip2/WebServiceClient.java (446-448)
There is a discrepancy between the code and the PR summary regarding HttpConnectTimeoutException. The summary states that connect timeouts are retried, but since HttpConnectTimeoutException extends HttpTimeoutException, it is currently excluded by this check. Additionally, the summary describes a "narrow" predicate that "walks the cause chain," whereas the implementation is broad, retrying all IOExceptions except for timeouts and interruptions. If the intention is to exclude all timeouts (as suggested by the README and CHANGELOG), the PR summary should be corrected. If connect timeouts should be retried, this logic needs to be adjusted to specifically allow them while still excluding request-phase timeouts.
src/test/java/com/maxmind/geoip2/WebServiceClientTest.java (590-608)
The test testRetriesOnConnectTimeout is misnamed and lacks verification of the retry logic. It triggers a ConnectException (connection refused) by connecting to a closed port, which is distinct from a "Connect Timeout" (HttpConnectTimeoutException). Furthermore, the test does not verify that a retry actually occurred (e.g., by checking the number of attempts). Consider renaming the test to reflect that it tests connection refusal and adding a verification step (such as using a mock or checking WireMock logs if applicable) to ensure the retry mechanism is triggered as expected.
f07a127 to
9991ece
Compare
When the JDK HttpClient pool reuses an idle connection that an intermediary (load balancer, proxy, NAT) has silently closed, the next send() fails with "Connection reset", "Broken pipe", or related transport errors. A single retry recovers transparently without exposing this race to callers. The default JDK keep-alive timeout exceeds many intermediaries' idle timeout, making this mismatch the common case. The retry predicate is permissive by exclusion: any IOException from httpClient.send() is retried EXCEPT HttpTimeoutException (covering both request-phase and connect-phase timeouts, since HttpConnectTimeoutException is a subclass) and InterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend; InterruptedIOException is a user-cancellation signal. HTTP 4xx and 5xx responses are surfaced as HttpException (and subclasses) from a separate code path -- they come back as HttpResponse objects rather than IOExceptions, so the predicate is structurally unable to retry them. Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total attempts). The interrupt flag is restored before rewrapping InterruptedException, and a pre-set interrupt short-circuits the predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing catch (InterruptedException) block in responseFor() rewraps into GeoIp2Exception without restoring the thread's interrupt status, silently swallowing the cancellation signal. Per Java's interruption protocol, code that catches InterruptedException without rethrowing it should re-set the flag so callers up the stack can observe the cancellation. This is an independent bug fix bundled into the STF-322 retry work because the retry feature exposes the path more often. Per project commit hygiene it lands as a separate commit so it can be cherry-picked or reverted on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover all 9 scenarios: connection-reset retry on country, city, and insights endpoints, no retry on HttpTimeoutException, retry on connect timeout (deterministic via a closed local ServerSocket), no retry on 4xx/5xx, .maxRetries(0) opt-out, and pre-interrupt short-circuit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors minfraud-api-java's mise.toml and mise.lock so Java and Maven versions are pinned and auto-installed on directory entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
WebServiceClient.Builder.maxRetries(int)(defaults to 1) so a single transient transport failure (Connection reset,Broken pipe, EOF, closed channel, etc.) is retried transparently.IOExceptionfromhttpClient.send()is retried EXCEPTHttpTimeoutException(which covers both request-phase and connect-phase timeouts, sinceHttpConnectTimeoutExceptionextends it) andInterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend;InterruptedIOExceptionis a user-cancellation signal we honor. HTTP 4xx/5xx come back asHttpResponseobjects (notIOExceptions) and are surfaced through the existing exception hierarchy, so the predicate is structurally unable to retry them.InterruptedExceptionrewrap path. Updates README and CHANGELOG; opt out via.maxRetries(0).mise.toml/mise.lockto pin the Java + Maven versions used locally, mirroringminfraud-api-java.Fixes STF-322. Parallel PR for
minfraud-api-javauses a byte-identical predicate (maxmind/minfraud-api-java#619).Test plan
mvn checkstyle:checkcleanmvn verify -Dgpg.skip=true— 102/102 pass (28 existing + 9 newWebServiceClientTestcases)jdk.httpclient.keepalive.timeoutas the complementary workaround🤖 Generated with Claude Code