test(refactor): add withToxiproxy* in withTools by mkleczek · Pull Request #4833 · PostgREST/postgrest

mkleczek · 2026-04-21T05:52:59Z

This commit introduces toxiproxy in withTools adding the following helpers:

withToxiproxyServer
withToxiproxyProxy
withToxiproxyPgProxy

wolfgangwalther

Just some comments about the code while I am reading through and making myself familiar with toxiproxy.

I have not reached a conclusion about higher level design questions, yet - so don't take my current review as an approval of the general approach, yet :). Will need to get familiar with the tool, the follow up PRs and the bigger picture first.

wolfgangwalther · 2026-04-29T08:57:35Z

+        proxy_port=''$(${randomPort})
+
+        ${withToxiproxyServer} ${withToxiproxyProxy} -l "$TCP_PGHOST:$proxy_port" -u "$TCP_PGHOST:$PGPORT" \
+          env "TOXI_PGPORT=$proxy_port" "$_arg_command" "''${_arg_leftovers[@]}"


I think a better user-interface would be something where PGPORT was replaced with the proxy port, so that downstream consumers don't need to know whether they need to connect to TOXI_PGPORT or to PGPORT.

The addition of postgrest-with-toxiproxy-xxx should ideally be transparent.

WDYT?

Good point - if we don't need unix socket access then I agree.

wolfgangwalther · 2026-04-29T09:00:11Z

+          UNIX_PGHOST="$PGHOST"
+          export TCP_PGHOST="localhost"


I'm not sure whether we still need to keep unix sockets here at all. We originally did this to prevent port-in-use errors during tests. We now need to add the complexity of finding the right port etc. - so there is no point in keeping additional complexity in supporting both unix sockets and tcp.

I'd agree. The only thing is load tests and the performance impact (if any) of running them with TCP.

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP? That way we don't have to modify previous test infra?

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP?

Fine for me.
In #4860 I've extracted a (private) library test-utils that should be shared across different test suites and will make creating additional test suites easier.
But I don't think it is relevant to this PR really (see below).

That way we don't have to modify previous test infra?

This PR does not really change anything in existing infra (except starting the db server on TCP in addition to unix socket, which required very small adjustments in tests - adding explicit PGPORT)

except starting the db server on TCP in addition to unix socket, which required very small adjustments in tests - adding explicit PGPORT

That looks like a change, unless if it was gated by bool env var like:

postgrest/nix/tools/withTools.nix

Line 33 in 08ec8d1

"ARG_OPTIONAL_BOOLEAN([replica],, [Enable a replica for the database])"

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP?

Essentially, I conclude the same in #4868. I would not call it "recovery", see my reasoning in that issue. The key thing is: I don't want to run TCP connections on the host system, because that just leads to port conflicts down the road - we've always had these eventually.

In #4860 I've extracted a (private) library test-utils that should be shared across different test suites and will make creating additional test suites easier.
But I don't think it is relevant to this PR really (see below).

See that same issue above why I think what we're currently doing in the observability test suite is fundamentally wrong - and as such that PR is, too. We can't really use haskell code to "inspect" PostgREST, because we'd always only test the library, not the executable in its entirety. Which is fine for some things, but surely not for listener related things.

See that same issue above why I think what we're currently doing in the observability test suite is fundamentally wrong - and as such that PR is, too. We can't really use haskell code to "inspect" PostgREST, because we'd always only test the library, not the executable in its entirety. Which is fine for some things, but surely not for listener related things.

That's obviously not true, see for example https://github.com/mkleczek/postgrest/blob/9af29496ad943381233d2752ca8f6dd6c10487c4/test/observability/Observation/ToxiSpec.hs#L43

I'm not sure what the link is supposed to tell me, but it's certainly not a counter point to my argument. Just because you are testing the listener there, doesn't invalidate my point which is:

We will very likely want to split library from executable.

Listener should be in the executable (even if it was technically implemented as a cabal library, that's not the point)

You can't test the full executable by referencing it as a library - and we certainly are not doing so right now, because the observability tests are setting the config via Haskell, for example.

Once you run the executable, you can't use haskell code to inspect it anymore.

Your tests are not running the executable, they are using the still-not-split postgrest as a library.

I'm not sure what the link is supposed to tell me, but it's certainly not a counter point to my argument. Just because you are testing the listener there, doesn't invalidate my point which is:

We will very likely want to split library from executable.

Then it is even more important to test the library, isn't it?

Listener should be in the executable (even if it was technically implemented as a cabal library, that's not the point)

I am not sure I understand that. Listener is both a library and in the executable.

You can't test the full executable by referencing it as a library - and we certainly are not doing so right now, because the observability tests are setting the config via Haskell, for example.

Once you run the executable, you can't use haskell code to inspect it anymore.

Your tests are not running the executable, they are using the still-not-split postgrest as a library.

I know that, but I don't understand why it would invalidate the tests. There are various kinds of tests necessary to ensure high quality: unit tests, white box tests, black box tests - all with different trade-offs.

You could use the same argument for postgrest-test-spec - they don't test the whole executable. It does not make them invalid or not useful.

And again - there are good arguments for writing tests in Haskell - that already was discussed in #4671.

I think this discussion is probably better off in #4868, because many of the points you are addressing here, I already did there.

This commit introduces toxiproxy in withTools adding the following helpers: * withToxiproxyServer * withToxiproxyProxy * withToxiproxyPgProxy

wolfgangwalther · 2026-05-03T15:28:30Z

I have not reached a conclusion about higher level design questions, yet

I have thought about this a bit more now, see #4868. I don't think withToxiproxy is the right thing to do, really. That's because whether or not toxiproxy is running or not is really a fundamental requirement for whatever test suite runs under it: The test suite doesn't work without it, but there is also no replacement that can be used instead of withToxiproxy. Note how the with-pg-xx helpers provide different versions of PG to then run the same thing against them. Or, previously, the withPg and withSlowPg or so (can't remember the exact names) - they did not provide a tool, but a condition.

That's what these helpers are for, to provide different conditions. Not to provide an environment.

Toxiproxy is different, because we need to provide an environment with toxiproxy available - and that always needs to be the same if the tests make use of it. So we need to solve this on the "environment" level - and first discuss how we'd like to set up our environment in general. I mentioned virtual machines / NixOS VM tests before, you mentioned containers. I wrote some things up in #4868, but they don't really touch on the details, yet. Some rough brainstorming for which kinds of things we'll want to do with the environment:

Run PostgreSQL, obviously. We also need this for all the single-executable or even library tests.
Run PostgreSQL in a replicated setup.
Run nginx or other reverse proxies, to possibly test interactions.
Run toxiproxy, at least between pgrst and pg, but possibly between nginx and pgrst or the client and pgrst/nginx as well.
Do stuff with TCP explicitly, while running the things on the host system with unix sockets (as discussed elsewhere).
... and probably a lot more

Still have to think more about this, but at this stage I'm confident we'll not want withToxiproxy.

mkleczek · 2026-05-03T17:01:11Z

Still have to think more about this, but at this stage I'm confident we'll not want withToxiproxy.

Sidestepping a little the discussion about any future directions, IMHO you are going into the territory of "make better the enemy of the good". The point is that we don't have any infrastructure to test recovery. And just writing a simple "showcase" test allowed to spot #4681 .

I am not sure if what you oppose is withToxiProxy toolset or use of Toxiproxy itself. If it is the first I am happy to implement something different - the question is: what? If it is the second then what alternatives do we have? So far we have some vague ideas about VM based testing environment but I don't know how fast (and if at all) we can come up with such an infrastructure. I am also pretty sure once we have it - we can easily migrate to it.

I also quite don't understand what you are saying about postgrest-with-* scripts not providing environment. For example postgrest-with-pg starts the database server, posgtrest-with-postgrest starts PostgREST itself. How is it different from postgrest-with-toxiproxy starting Toxiproxy?

wolfgangwalther · 2026-05-03T17:30:03Z

I am not sure if what you oppose is withToxiProxy toolset or use of Toxiproxy itself.

The first.

If it is the first I am happy to implement something different - the question is: what?

I am still working on that part.

How is it different from postgrest-with-toxiproxy starting Toxiproxy?

You can do postgrest-with-pg-18 postgrest-test-spec, postgrest-with-pg-17 postgrest-test-spec, and so on.

What alternatives do you have to postgrest-with-toxiproxy postgrest-test-which-needs-toxiproxy? You don't have any.

If we were using such a with-xxx helper, then it would roughly need to satisfy the following conditions to be useful:

It would need to be transparent to the test script on whether we're running with or without toxiproxy,
We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss, ... things like these. Totally made up, but hopefully they make it a tad clearer what I mean by "condition" instead of "tool".

Of course these scripts provide some kind of environment, they need to by the nature of how they work. But they are really bad at providing an isolated environment, which is really what we need for any of these.

wolfgangwalther · 2026-05-03T17:35:07Z

(there is one other option of when the with helpers are useful, which I did not consider before: If we're reusing that part of code for multiple test-suites - however, I don't see us using toxiproxy across multiple test-suites)

mkleczek · 2026-05-03T17:55:50Z

If we were using such a with-xxx helper, then it would roughly need to satisfy the following conditions to be useful:

It would need to be transparent to the test script on whether we're running with or without toxiproxy,

But it is!

postgrest-with-pg postgrest-with-toxiproxy /bin/sh -c "$PGPORT=$TOXI_PGPORT <run-your-test-suite>"

The discussion whether withToxiProxyPg should set PGPORT is a separate discussion but once it does - it is absolutely orthogonal.

We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss, ... things like these. Totally made up, but hopefully they make it a tad clearer what I mean by "condition" instead of "tool".

I agree here and that was my idea - not sure if they should be added now though.

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario - for hspec I raised #4836

wolfgangwalther · 2026-05-03T18:14:24Z

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario

Yeah, I had assumed that to be the basic pre-requisite to make toxiproxy useful. That's what I mean by not being transparent to the test. Once you make use of toxiproxy in a test-suite, it becomes a requirement.

I agree here and that was my idea - not sure if they should be added now though.

Then maybe I don't understand how these are supposed to be used, yet. My understanding was that these conditions are to be managed by each test case, not by the environment for the whole test suite. Isn't that the great advantage of toxiproxy over something like slocat?

Once they are managed by each test case we will very quickly come up with the requirement to isolate proxies for each test case, so that they can still run in parallel, without affecting each other. At that point, a postgrest-with-toxiproxy-xxx will not even work anymore. So we will end up spinning up proxies and stuff from the test framework, I believe.

mkleczek · 2026-05-03T18:26:24Z

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario

Yeah, I had assumed that to be the basic pre-requisite to make toxiproxy useful. That's what I mean by not being transparent to the test. Once you make use of toxiproxy in a test-suite, it becomes a requirement.

I agree here and that was my idea - not sure if they should be added now though.

Then maybe I don't understand how these are supposed to be used, yet. My understanding was that these conditions are to be managed by each test case, not by the environment for the whole test suite. Isn't that the great advantage of toxiproxy over something like slocat?

So really - there is a couple important but separate things to consider:

Prerequisite: start Postgres on TCP socket
Provide Toxiproxy the tool in our Nix environment
Provide a way to start Toxiproxy
Provide means for tests to manipulate proxies
Provide means to transparently inject "slow" toxic (in place of slocat) to load tests - important once we move to pipelining

To be honest - this PR was started when slocat was still present and was supposed to replace it and change as little as possible.

I get where you are coming from and I am happy to restrict the scope of this PR to starting Pg on TCP socket and providing Toxiproxy the tool. Or the whole thing should be moved to #4836.

wolfgangwalther · 2026-05-03T18:43:01Z

Provide means to transparently inject "slow" toxic (in place of slocat) to load tests - important once we move to pipelining

I disagree on that part. The load tests are not supposed to be used for these kind of tests. These kind of things don't need to be tested on every commit. The load tests are meant to test haskell code, not general architecture of how PostgREST runs within certain environment. Looking at the table in #4868, we will need toxiproxy for the "Environment / Correctness" part and maybe for the "Environment / Performance" part - and setting it up in postgrest-benchmark is entirely separate.

Yes, we would certainly want to test the effects of pipeling on performance. But not with our loadtests.

Provide means for tests to manipulate proxies

This is the most critical piece for me right now. As discussed earlier, I'm not convinced that we'll want to hspec based tests for the new test-suite. This is the library/executable discussion. I actually think from the perspective of the test runner, nothing is wrong with our pytest based setup. The thing we need to discuss is: How are we going to inspect the running PostgREST executable without looking at unstructured logs? This is independent from which test-runner is used. It could be hspec, it could be pytest - doesn't matter. But we shouldn't be using hspec just to be able to compile against the postgrest library.

Once we solve this piece and know where we want to go, we can start providing the right tools for that test framework to work well with toxiproxy.

Of course this interacts with how we provide what I call the "environment", i.e. various tools all running at the same time and us being able to manage them well. Since I'm not so convinced about using containers, I am thinking primarily about how to set these environments up in a NixOS VM test - and then do one of:

Use the NixOS test driver to run these kinds of tests, or
Just run whatever test framework we're running with (could be pytest, could be hspec) inside the VM on one of the machines

That's the two things I am working on right now.

steve-chavez · 2026-05-04T15:09:22Z

We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss

FWIW, on postgrest-benchmark I manage variants with env vars (ref). My idea when proposing the new postgrest-test-recovery test suite before, was that we would add io tests (pytest) by customizing the proxy failure states with the env vars, instead of new postgrest-with-* scripts. Maybe something similar can be done with toxiproxy?

mkleczek · 2026-05-04T15:43:07Z

We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss

FWIW, on postgrest-benchmark I manage variants with env vars (ref). My idea when proposing the new postgrest-test-recovery test suite before, was that we would add io tests (pytest) by customizing the proxy failure states with the env vars, instead of new postgrest-with-* scripts. Maybe something similar can be done with toxiproxy?

I am not sure what your idea is but Toxiproxy is a server that you can manipulate in runtime with REST API. So you can either use your language's REST client to inject "toxics" or you can use toxiproxy-cli command line tool to do that (it is a thin wrapper around the REST API).

mkleczek mentioned this pull request Apr 21, 2026

test(refactor): Add Toxiproxy Hspec test infrastructure #4836

Open

mkleczek force-pushed the with-tools-toxiproxy branch 4 times, most recently from da857ef to 4edd002 Compare April 21, 2026 12:21

mkleczek requested review from steve-chavez, taimoorzaeem and wolfgangwalther April 21, 2026 13:36

mkleczek mentioned this pull request Apr 21, 2026

fix: implement connection tracking in metrics #4672

Open

mkleczek force-pushed the with-tools-toxiproxy branch from 4edd002 to 3e5c575 Compare April 24, 2026 15:28

mkleczek mentioned this pull request Apr 27, 2026

Missing tests - recovery tests #1766

Open

mkleczek force-pushed the with-tools-toxiproxy branch 2 times, most recently from 5232a7a to 03abf8d Compare April 28, 2026 11:17

wolfgangwalther reviewed Apr 29, 2026

View reviewed changes

mkleczek force-pushed the with-tools-toxiproxy branch 8 times, most recently from 6cf4996 to 0077aa2 Compare May 1, 2026 05:54

test(refactor): add withToxiproxy* in withTools

2783535

This commit introduces toxiproxy in withTools adding the following helpers: * withToxiproxyServer * withToxiproxyProxy * withToxiproxyPgProxy

mkleczek force-pushed the with-tools-toxiproxy branch from 0077aa2 to 2783535 Compare May 3, 2026 12:11

Uh oh!

Conversation

mkleczek commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wolfgangwalther left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wolfgangwalther commented May 3, 2026

Uh oh!

mkleczek commented May 3, 2026

Uh oh!

wolfgangwalther commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wolfgangwalther commented May 3, 2026

Uh oh!

mkleczek commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wolfgangwalther commented May 3, 2026

Uh oh!

mkleczek commented May 3, 2026

Uh oh!

wolfgangwalther commented May 3, 2026

Uh oh!

steve-chavez commented May 4, 2026

Uh oh!

mkleczek commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

mkleczek commented Apr 21, 2026 •

edited

Loading

wolfgangwalther commented May 3, 2026 •

edited

Loading

mkleczek commented May 3, 2026 •

edited

Loading