Skip to content

test(refactor): add withToxiproxy* in withTools#4833

Open
mkleczek wants to merge 1 commit intoPostgREST:mainfrom
mkleczek:with-tools-toxiproxy
Open

test(refactor): add withToxiproxy* in withTools#4833
mkleczek wants to merge 1 commit intoPostgREST:mainfrom
mkleczek:with-tools-toxiproxy

Conversation

@mkleczek
Copy link
Copy Markdown
Collaborator

@mkleczek mkleczek commented Apr 21, 2026

This commit introduces toxiproxy in withTools adding the following helpers:

  • withToxiproxyServer
  • withToxiproxyProxy
  • withToxiproxyPgProxy

Copy link
Copy Markdown
Member

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments about the code while I am reading through and making myself familiar with toxiproxy.

I have not reached a conclusion about higher level design questions, yet - so don't take my current review as an approval of the general approach, yet :). Will need to get familiar with the tool, the follow up PRs and the bigger picture first.

Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix Outdated
Comment thread nix/tools/withTools.nix
Comment thread nix/tools/withTools.nix
Comment on lines +442 to +445
proxy_port=''$(${randomPort})

${withToxiproxyServer} ${withToxiproxyProxy} -l "$TCP_PGHOST:$proxy_port" -u "$TCP_PGHOST:$PGPORT" \
env "TOXI_PGPORT=$proxy_port" "$_arg_command" "''${_arg_leftovers[@]}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better user-interface would be something where PGPORT was replaced with the proxy port, so that downstream consumers don't need to know whether they need to connect to TOXI_PGPORT or to PGPORT.

The addition of postgrest-with-toxiproxy-xxx should ideally be transparent.

WDYT?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - if we don't need unix socket access then I agree.

Comment thread nix/tools/withTools.nix
Comment on lines +75 to +76
UNIX_PGHOST="$PGHOST"
export TCP_PGHOST="localhost"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we still need to keep unix sockets here at all. We originally did this to prevent port-in-use errors during tests. We now need to add the complexity of finding the right port etc. - so there is no point in keeping additional complexity in supporting both unix sockets and tcp.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree. The only thing is load tests and the performance impact (if any) of running them with TCP.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP? That way we don't have to modify previous test infra?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP?

Fine for me.
In #4860 I've extracted a (private) library test-utils that should be shared across different test suites and will make creating additional test suites easier.
But I don't think it is relevant to this PR really (see below).

That way we don't have to modify previous test infra?

This PR does not really change anything in existing infra (except starting the db server on TCP in addition to unix socket, which required very small adjustments in tests - adding explicit PGPORT)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except starting the db server on TCP in addition to unix socket, which required very small adjustments in tests - adding explicit PGPORT

That looks like a change, unless if it was gated by bool env var like:

"ARG_OPTIONAL_BOOLEAN([replica],, [Enable a replica for the database])"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we create another dedicated test suite like postgrest-test-recovery (suggested before here) only focused on TCP?

Essentially, I conclude the same in #4868. I would not call it "recovery", see my reasoning in that issue. The key thing is: I don't want to run TCP connections on the host system, because that just leads to port conflicts down the road - we've always had these eventually.

In #4860 I've extracted a (private) library test-utils that should be shared across different test suites and will make creating additional test suites easier.
But I don't think it is relevant to this PR really (see below).

See that same issue above why I think what we're currently doing in the observability test suite is fundamentally wrong - and as such that PR is, too. We can't really use haskell code to "inspect" PostgREST, because we'd always only test the library, not the executable in its entirety. Which is fine for some things, but surely not for listener related things.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See that same issue above why I think what we're currently doing in the observability test suite is fundamentally wrong - and as such that PR is, too. We can't really use haskell code to "inspect" PostgREST, because we'd always only test the library, not the executable in its entirety. Which is fine for some things, but surely not for listener related things.

That's obviously not true, see for example https://github.com/mkleczek/postgrest/blob/9af29496ad943381233d2752ca8f6dd6c10487c4/test/observability/Observation/ToxiSpec.hs#L43

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the link is supposed to tell me, but it's certainly not a counter point to my argument. Just because you are testing the listener there, doesn't invalidate my point which is:

  • We will very likely want to split library from executable.
  • Listener should be in the executable (even if it was technically implemented as a cabal library, that's not the point)
  • You can't test the full executable by referencing it as a library - and we certainly are not doing so right now, because the observability tests are setting the config via Haskell, for example.
  • Once you run the executable, you can't use haskell code to inspect it anymore.

Your tests are not running the executable, they are using the still-not-split postgrest as a library.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the link is supposed to tell me, but it's certainly not a counter point to my argument. Just because you are testing the listener there, doesn't invalidate my point which is:

  • We will very likely want to split library from executable.

Then it is even more important to test the library, isn't it?

  • Listener should be in the executable (even if it was technically implemented as a cabal library, that's not the point)

I am not sure I understand that. Listener is both a library and in the executable.

  • You can't test the full executable by referencing it as a library - and we certainly are not doing so right now, because the observability tests are setting the config via Haskell, for example.
  • Once you run the executable, you can't use haskell code to inspect it anymore.

Your tests are not running the executable, they are using the still-not-split postgrest as a library.

I know that, but I don't understand why it would invalidate the tests. There are various kinds of tests necessary to ensure high quality: unit tests, white box tests, black box tests - all with different trade-offs.

You could use the same argument for postgrest-test-spec - they don't test the whole executable. It does not make them invalid or not useful.

And again - there are good arguments for writing tests in Haskell - that already was discussed in #4671.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this discussion is probably better off in #4868, because many of the points you are addressing here, I already did there.

@mkleczek mkleczek force-pushed the with-tools-toxiproxy branch 8 times, most recently from 6cf4996 to 0077aa2 Compare May 1, 2026 05:54
This commit introduces toxiproxy in withTools adding the following helpers:
* withToxiproxyServer
* withToxiproxyProxy
* withToxiproxyPgProxy
@mkleczek mkleczek force-pushed the with-tools-toxiproxy branch from 0077aa2 to 2783535 Compare May 3, 2026 12:11
@wolfgangwalther
Copy link
Copy Markdown
Member

I have not reached a conclusion about higher level design questions, yet

I have thought about this a bit more now, see #4868. I don't think withToxiproxy is the right thing to do, really. That's because whether or not toxiproxy is running or not is really a fundamental requirement for whatever test suite runs under it: The test suite doesn't work without it, but there is also no replacement that can be used instead of withToxiproxy. Note how the with-pg-xx helpers provide different versions of PG to then run the same thing against them. Or, previously, the withPg and withSlowPg or so (can't remember the exact names) - they did not provide a tool, but a condition.

That's what these helpers are for, to provide different conditions. Not to provide an environment.

Toxiproxy is different, because we need to provide an environment with toxiproxy available - and that always needs to be the same if the tests make use of it. So we need to solve this on the "environment" level - and first discuss how we'd like to set up our environment in general. I mentioned virtual machines / NixOS VM tests before, you mentioned containers. I wrote some things up in #4868, but they don't really touch on the details, yet. Some rough brainstorming for which kinds of things we'll want to do with the environment:

  • Run PostgreSQL, obviously. We also need this for all the single-executable or even library tests.
  • Run PostgreSQL in a replicated setup.
  • Run nginx or other reverse proxies, to possibly test interactions.
  • Run toxiproxy, at least between pgrst and pg, but possibly between nginx and pgrst or the client and pgrst/nginx as well.
  • Do stuff with TCP explicitly, while running the things on the host system with unix sockets (as discussed elsewhere).
  • ... and probably a lot more

Still have to think more about this, but at this stage I'm confident we'll not want withToxiproxy.

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented May 3, 2026

Still have to think more about this, but at this stage I'm confident we'll not want withToxiproxy.

Sidestepping a little the discussion about any future directions, IMHO you are going into the territory of "make better the enemy of the good". The point is that we don't have any infrastructure to test recovery. And just writing a simple "showcase" test allowed to spot #4681 .

I am not sure if what you oppose is withToxiProxy toolset or use of Toxiproxy itself. If it is the first I am happy to implement something different - the question is: what? If it is the second then what alternatives do we have? So far we have some vague ideas about VM based testing environment but I don't know how fast (and if at all) we can come up with such an infrastructure. I am also pretty sure once we have it - we can easily migrate to it.

I also quite don't understand what you are saying about postgrest-with-* scripts not providing environment. For example postgrest-with-pg starts the database server, posgtrest-with-postgrest starts PostgREST itself. How is it different from postgrest-with-toxiproxy starting Toxiproxy?

@wolfgangwalther
Copy link
Copy Markdown
Member

wolfgangwalther commented May 3, 2026

I am not sure if what you oppose is withToxiProxy toolset or use of Toxiproxy itself.

The first.

If it is the first I am happy to implement something different - the question is: what?

I am still working on that part.

How is it different from postgrest-with-toxiproxy starting Toxiproxy?

You can do postgrest-with-pg-18 postgrest-test-spec, postgrest-with-pg-17 postgrest-test-spec, and so on.

What alternatives do you have to postgrest-with-toxiproxy postgrest-test-which-needs-toxiproxy? You don't have any.

If we were using such a with-xxx helper, then it would roughly need to satisfy the following conditions to be useful:

  • It would need to be transparent to the test script on whether we're running with or without toxiproxy,
  • We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss, ... things like these. Totally made up, but hopefully they make it a tad clearer what I mean by "condition" instead of "tool".

Of course these scripts provide some kind of environment, they need to by the nature of how they work. But they are really bad at providing an isolated environment, which is really what we need for any of these.

@wolfgangwalther
Copy link
Copy Markdown
Member

(there is one other option of when the with helpers are useful, which I did not consider before: If we're reusing that part of code for multiple test-suites - however, I don't see us using toxiproxy across multiple test-suites)

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented May 3, 2026

If we were using such a with-xxx helper, then it would roughly need to satisfy the following conditions to be useful:

  • It would need to be transparent to the test script on whether we're running with or without toxiproxy,

But it is!

postgrest-with-pg postgrest-with-toxiproxy /bin/sh -c "$PGPORT=$TOXI_PGPORT <run-your-test-suite>"

The discussion whether withToxiProxyPg should set PGPORT is a separate discussion but once it does - it is absolutely orthogonal.

  • We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss, ... things like these. Totally made up, but hopefully they make it a tad clearer what I mean by "condition" instead of "tool".

I agree here and that was my idea - not sure if they should be added now though.

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario - for hspec I raised #4836

@wolfgangwalther
Copy link
Copy Markdown
Member

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario

Yeah, I had assumed that to be the basic pre-requisite to make toxiproxy useful. That's what I mean by not being transparent to the test. Once you make use of toxiproxy in a test-suite, it becomes a requirement.

I agree here and that was my idea - not sure if they should be added now though.

Then maybe I don't understand how these are supposed to be used, yet. My understanding was that these conditions are to be managed by each test case, not by the environment for the whole test suite. Isn't that the great advantage of toxiproxy over something like slocat?

Once they are managed by each test case we will very quickly come up with the requirement to isolate proxies for each test case, so that they can still run in parallel, without affecting each other. At that point, a postgrest-with-toxiproxy-xxx will not even work anymore. So we will end up spinning up proxies and stuff from the test framework, I believe.

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented May 3, 2026

OTOH you also need to have the possibility to manipulate Toxiproxy inside the tests to make changing network conditions part of the test scenario

Yeah, I had assumed that to be the basic pre-requisite to make toxiproxy useful. That's what I mean by not being transparent to the test. Once you make use of toxiproxy in a test-suite, it becomes a requirement.

I agree here and that was my idea - not sure if they should be added now though.

Then maybe I don't understand how these are supposed to be used, yet. My understanding was that these conditions are to be managed by each test case, not by the environment for the whole test suite. Isn't that the great advantage of toxiproxy over something like slocat?

So really - there is a couple important but separate things to consider:

  • Prerequisite: start Postgres on TCP socket
  • Provide Toxiproxy the tool in our Nix environment
  • Provide a way to start Toxiproxy
  • Provide means for tests to manipulate proxies
  • Provide means to transparently inject "slow" toxic (in place of slocat) to load tests - important once we move to pipelining

To be honest - this PR was started when slocat was still present and was supposed to replace it and change as little as possible.

I get where you are coming from and I am happy to restrict the scope of this PR to starting Pg on TCP socket and providing Toxiproxy the tool. Or the whole thing should be moved to #4836.

@wolfgangwalther
Copy link
Copy Markdown
Member

Provide means to transparently inject "slow" toxic (in place of slocat) to load tests - important once we move to pipelining

I disagree on that part. The load tests are not supposed to be used for these kind of tests. These kind of things don't need to be tested on every commit. The load tests are meant to test haskell code, not general architecture of how PostgREST runs within certain environment. Looking at the table in #4868, we will need toxiproxy for the "Environment / Correctness" part and maybe for the "Environment / Performance" part - and setting it up in postgrest-benchmark is entirely separate.

Yes, we would certainly want to test the effects of pipeling on performance. But not with our loadtests.

Provide means for tests to manipulate proxies

This is the most critical piece for me right now. As discussed earlier, I'm not convinced that we'll want to hspec based tests for the new test-suite. This is the library/executable discussion. I actually think from the perspective of the test runner, nothing is wrong with our pytest based setup. The thing we need to discuss is: How are we going to inspect the running PostgREST executable without looking at unstructured logs? This is independent from which test-runner is used. It could be hspec, it could be pytest - doesn't matter. But we shouldn't be using hspec just to be able to compile against the postgrest library.

Once we solve this piece and know where we want to go, we can start providing the right tools for that test framework to work well with toxiproxy.

Of course this interacts with how we provide what I call the "environment", i.e. various tools all running at the same time and us being able to manage them well. Since I'm not so convinced about using containers, I am thinking primarily about how to set these environments up in a NixOS VM test - and then do one of:

  • Use the NixOS test driver to run these kinds of tests, or
  • Just run whatever test framework we're running with (could be pytest, could be hspec) inside the VM on one of the machines

That's the two things I am working on right now.

@steve-chavez
Copy link
Copy Markdown
Member

We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss

FWIW, on postgrest-benchmark I manage variants with env vars (ref). My idea when proposing the new postgrest-test-recovery test suite before, was that we would add io tests (pytest) by customizing the proxy failure states with the env vars, instead of new postgrest-with-* scripts. Maybe something similar can be done with toxiproxy?

@mkleczek
Copy link
Copy Markdown
Collaborator Author

mkleczek commented May 4, 2026

We'd need multiple variants of that script, for example postgrest-with-toxiproxy-very-slow, postgrest-with-toxiproxy-just-a-little-slow, postgrest-with-toxiproxy-connection-loss

FWIW, on postgrest-benchmark I manage variants with env vars (ref). My idea when proposing the new postgrest-test-recovery test suite before, was that we would add io tests (pytest) by customizing the proxy failure states with the env vars, instead of new postgrest-with-* scripts. Maybe something similar can be done with toxiproxy?

I am not sure what your idea is but Toxiproxy is a server that you can manipulate in runtime with REST API. So you can either use your language's REST client to inject "toxics" or you can use toxiproxy-cli command line tool to do that (it is a thin wrapper around the REST API).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants