Fix interfaces test to wait for replicating state and slot confirmation#360
Fix interfaces test to wait for replicating state and slot confirmation#360mason-sharp merged 1 commit intomainfrom
Conversation
danolivo
commented
Feb 24, 2026
- After sub_alter_interface, wait for the apply worker to reach replicating status instead of just checking it is not down. The previous check could pass while the worker was still in a transitional state (e.g., initializing), causing subsequent assertions to fail intermittently.
- Add wait_slot_confirm_lsn(NULL, NULL) on the provider before checking replication slot activity, ensuring all pending changes are flushed and preventing a race where the slot appears momentarily inactive.
📝 WalkthroughWalkthroughRevised SQL test to change replication status polling from Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/regress/sql/interfaces.sql (1)
65-73:⚠️ Potential issue | 🟠 MajorSecond wait loop was not updated — same early-exit race as the one fixed above
The PR objective is to wait until the worker reaches
'replicating'status, yet this loop (after reverting back to'test_provider') still exits as soon as status is no longer'down', leaving the same transitional-state race condition that motivated this PR.🔧 Proposed consistency fix
DO $$ BEGIN - WHILE EXISTS (SELECT 1 FROM spock.sub_show_status() WHERE status = 'down') + WHILE EXISTS (SELECT 1 FROM spock.sub_show_status() + WHERE status NOT IN ('replicating', 'disabled', 'apply_error')) LOOP -- TODO: The multimaster testing buildfarm should have a general parameter -- like 'wait_change_timeout' that may be used to take control over infinite -- waiting cycles. END LOOP; END;$$;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/interfaces.sql` around lines 65 - 73, The WHILE loop that queries spock.sub_show_status() currently exits as soon as status != 'down', which recreates the transitional-state race; change the loop condition to wait until the worker reaches 'replicating' (e.g., use WHILE NOT EXISTS (SELECT 1 FROM spock.sub_show_status() WHERE status = 'replicating') LOOP or WHILE EXISTS (... WHERE status <> 'replicating') LOOP) so the loop only finishes when status is explicitly 'replicating', updating the loop surrounding the revert-to-'test_provider' block accordingly.
🧹 Nitpick comments (1)
tests/regress/sql/interfaces.sql (1)
49-51: Redundant\c :provider_dsnat line 50Line 48 (unchanged) already reconnects to
:provider_dsn. The newly added line 50 reconnects to the same DSN immediately after, making it a no-op that adds noise.🧹 Proposed cleanup
\c :provider_dsn - -\c :provider_dsn SELECT spock.wait_slot_confirm_lsn(NULL, NULL);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/interfaces.sql` around lines 49 - 51, Remove the redundant reconnect command by deleting the extra "\c :provider_dsn" that immediately precedes the call to spock.wait_slot_confirm_lsn(NULL, NULL) in tests/regress/sql/interfaces.sql; if a different connection was intended, replace that line with the correct DSN, otherwise just remove the duplicate reconnect so the file only reconnects once to :provider_dsn before calling spock.wait_slot_confirm_lsn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/regress/sql/interfaces.sql`:
- Around line 38-42: The loop condition using spock.sub_show_status() currently
checks WHERE status != 'replicating', which can hang if status becomes
'initializing' or 'disabled'; change those WHERE clauses (including the similar
one later) to explicitly only wait for the non-terminal 'down' state (e.g. WHERE
status = 'down' or WHERE status NOT IN
('replicating','initializing','disabled')) so the loop can exit on terminal
states; also remove the redundant connection directive "\c :provider_dsn" (the
earlier \c :provider_dsn already switches to the provider DSN).
---
Outside diff comments:
In `@tests/regress/sql/interfaces.sql`:
- Around line 65-73: The WHILE loop that queries spock.sub_show_status()
currently exits as soon as status != 'down', which recreates the
transitional-state race; change the loop condition to wait until the worker
reaches 'replicating' (e.g., use WHILE NOT EXISTS (SELECT 1 FROM
spock.sub_show_status() WHERE status = 'replicating') LOOP or WHILE EXISTS (...
WHERE status <> 'replicating') LOOP) so the loop only finishes when status is
explicitly 'replicating', updating the loop surrounding the
revert-to-'test_provider' block accordingly.
---
Nitpick comments:
In `@tests/regress/sql/interfaces.sql`:
- Around line 49-51: Remove the redundant reconnect command by deleting the
extra "\c :provider_dsn" that immediately precedes the call to
spock.wait_slot_confirm_lsn(NULL, NULL) in tests/regress/sql/interfaces.sql; if
a different connection was intended, replace that line with the correct DSN,
otherwise just remove the duplicate reconnect so the file only reconnects once
to :provider_dsn before calling spock.wait_slot_confirm_lsn.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/regress/expected/interfaces.outis excluded by!**/*.out
📒 Files selected for processing (1)
tests/regress/sql/interfaces.sql
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/regress/sql/interfaces.sql (1)
38-39:⚠️ Potential issue | 🟠 MajorGuard the “wait for replicating” loops with a timeout or explicit failure path.
Right now the loops at Line 38–39 and Line 65 can spin forever if the apply worker never reaches
replicating(e.g., stalled or disabled), which will hang the test indefinitely. Consider adding a short sleep plus a timeout (and/or raising an exception) so failures are explicit and bounded.🔧 Example timeout guard (apply to both loops)
DO $$ +DECLARE + start_ts timestamptz := clock_timestamp(); BEGIN WHILE EXISTS (SELECT 1 FROM spock.sub_show_status() WHERE status != 'replicating') LOOP + PERFORM pg_sleep(0.1); + IF clock_timestamp() - start_ts > interval '60 seconds' THEN + RAISE EXCEPTION 'Timed out waiting for subscription to reach replicating status'; + END IF; END LOOP; END;$$;Also applies to: 65-65
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/interfaces.sql` around lines 38 - 39, The WHILE loops that poll spock.sub_show_status() for status != 'replicating' should be bounded: modify the polling logic in the procedure containing the WHILE (the loop that calls spock.sub_show_status()) to record a start time (e.g., now()/clock_timestamp()), sleep briefly inside each iteration (e.g., pg_sleep), and if elapsed time exceeds a configurable timeout threshold then RAISE EXCEPTION (or RETURN/FAIL the test) so the test does not spin forever; apply the same timeout/sleep pattern to both occurrences that poll spock.sub_show_status() so failures are explicit and bounded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/regress/sql/interfaces.sql`:
- Around line 38-39: The WHILE loops that poll spock.sub_show_status() for
status != 'replicating' should be bounded: modify the polling logic in the
procedure containing the WHILE (the loop that calls spock.sub_show_status()) to
record a start time (e.g., now()/clock_timestamp()), sleep briefly inside each
iteration (e.g., pg_sleep), and if elapsed time exceeds a configurable timeout
threshold then RAISE EXCEPTION (or RETURN/FAIL the test) so the test does not
spin forever; apply the same timeout/sleep pattern to both occurrences that poll
spock.sub_show_status() so failures are explicit and bounded.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/regress/expected/interfaces.outis excluded by!**/*.out
📒 Files selected for processing (1)
tests/regress/sql/interfaces.sql