Skip to content

fix: Start listening after schema cache load#4880

Open
mkleczek wants to merge 1 commit intoPostgREST:mainfrom
mkleczek:push-tynrmqwlwwus
Open

fix: Start listening after schema cache load#4880
mkleczek wants to merge 1 commit intoPostgREST:mainfrom
mkleczek:push-tynrmqwlwwus

Conversation

@mkleczek
Copy link
Copy Markdown
Collaborator

@mkleczek mkleczek commented May 5, 2026

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

This change ensures PostgREST starts listening on a server socket only after it loaded the schema cache and is ready to handle requests. It is no longer going to return 503 errors during startup until the schema cache is loaded.

Comment thread CHANGELOG.md
- Restore Listener query shape so it can be found in pg_stat_activity by @mkleczek in #4857 #4859
- The LISTEN channel now automatically recovers when it stops working due to a PostgreSQL bug @laurenceisla in #3147
- Fix misleading "Functions" name on schema cache summary in startup logs by @taimoorzaeem in #4821
- Do not listen on main socket until schema cache is loaded by @mkleczek in #4880
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also have a ### Changed entry, let's put it as breaking change to be extra safe.

Comment thread src/PostgREST/Admin.hs
Comment on lines -47 to +48
status | not isMainAppReachable = HTTP.status500
| isPending = HTTP.status503
status | isPending = HTTP.status503
| not isMainAppReachable = HTTP.status500
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change is not needed? Results in same behavior?

Copy link
Copy Markdown
Collaborator Author

@mkleczek mkleczek May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this change is not needed? Results in same behavior?

I think it is needed: we want to return 503 during schema cache loading when main app is still not reachable.

Comment thread src/PostgREST/App.hs
Comment on lines +99 to +100
mainSocket <- initServerSocket conf
atomicWriteIORef mainSocketRef $ Just mainSocket
Copy link
Copy Markdown
Member

@steve-chavez steve-chavez May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Can this change help lift the limitation we have with the --ready check? (#4269 (comment)) i.e. can the admin server know the address that is resolved for the API server when using special hosts? (like !4)

@steve-chavez
Copy link
Copy Markdown
Member

steve-chavez commented May 5, 2026

This should close #4129. Previous discussion on the motivation of the change on #4703 (comment).

@steve-chavez steve-chavez requested a review from laurenceisla May 5, 2026 16:38
@steve-chavez
Copy link
Copy Markdown
Member

@mkleczek As per #4703 (comment), this would clearly benefit the case of SO_REUSEPORT given 2 PostgREST instances running.

But let's consider the scenario of a single instance managed by systemd behind a proxy:

  • The service restarts (for any reason, could be manual restart because somehow the schema cache failed reloading).
  • Right now our startup is fast (milliseconds) and we start responding with 503s. During this time clients will get the 503s with a clear error message that says we're "Retrying.." plus a Retry-After header.

With this change, now we'll not respond and clients will get a connection refused with no error message. And this state can last multiple seconds now that we wait for the scache to load.

So under this scenario, it looks like this new behavior is worse?


Thinking more what we need is zero-downtime restarts, which I guess is easier under this new behavior since we could rely on systemd socket activation?

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented May 5, 2026

So under this scenario, it looks like this new behavior is worse?

Not really.

From the point of view of the client there is not much gain from these 503 errors comparing to some connect timeout or similar. The client has to handle connection issues anyway because there are many more cases when they can happen (for example the whole server might have crashed). In case of reverse proxies in front of PostgREST (ie. always) - the client will get some 50x anyway.
The only reasonable way for the client to handle network issues is to retry. Well behaving clients will use some exponential backoff with jitter retry policy to avoid overwhelming freshly started server (ie. to avoid thundering herd).
Retry-After is not very useful because it is not reliable. What's worse: if all clients retry according to this header then... boom - thundering herd - I would even say Retry-After is more harmful than good.

Comment thread test/io/test_io.py
Comment on lines -1788 to +1769
response = postgrest.session.get("/projects")
assert response.status_code == 503
with pytest.raises(requests.ConnectionError):
postgrest.session.get("/projects")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, the test description does not fit anymore.

But anyway, with the change in this PR, the test case does not make sense anymore at all, I think. A connection error does not even make it to PostgREST, so testing its non-effect is useless. I'd remove that test-case as well, if we decide to go on with this PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure about that - we want to show that schema cache is loaded once during startup.
We also verify the behavior during schema cache load (before it was 503 now it is connection error).
I also added another assertion that once schema cache is loaded we return 200 from /projects request.

Test description updated - it was indeed wrong.

This change ensures PostgREST starts listening on a server socket only after it loaded the schema cache and is ready to handle requests. It is no longer going to return 503 errors during startup until the schema cache is loaded.
@mkleczek mkleczek force-pushed the push-tynrmqwlwwus branch from 5bb48c5 to 99554cb Compare May 5, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants