Compress /solve requests #4212
Conversation
Code Review
This pull request introduces gzip compression for `/solve` requests between the Autopilot and Driver components to reduce network latency, controlled by a new `--compress-solve-request` flag. The review found a critical compilation error caused by incorrectly moving a value in the compression logic, and a high-severity issue where a metric records the compressed size instead of the uncompressed size.
MartinquaXD left a comment
Change looks good overall.
```rust
// axum's default body limit is 2MB, too low for solvers; 20MB is still too low,
// so instead of constantly guessing and updating, we disable the limit altogether
.layer(axum::extract::DefaultBodyLimit::disable())
.layer(tower_http::decompression::RequestDecompressionLayer::new())
```
Since you had to patch the reference driver to make this work, it is clearly a breaking change. Still, it seems reasonable to require every solver to support compressed requests; moving the compression flag to each solver instead of having a single flag for the entire autopilot probably doesn't make sense.
The gist of what I'm saying is that we need to tell solvers about this and make sure they support it before we flip the switch.
> we need to tell solvers about this and make sure they support this before we flip the switch.

Yes, that is for sure.
```rust
        .unwrap();
}

async fn limit_order_with_compressed_solve_request_test(web3: Web3) {
```
I don't see a check actually verifying that the request was sent compressed. Technically this test would need a fake driver that inspects the `/solve` request for the header and tries to decompress it.
As the test stands right now, I think we can drop it and instead enable `--compress-solve-request` by default.
True. I updated the ReverseProxy implementation and improved one of the tests. 0dc003a
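For reference, a fake driver along those lines could look roughly like the following; this is a minimal sketch assuming the axum and brotli crates, not the actual test from the PR, and `fake_solve` plus the asserted JSON shape are illustrative:

```rust
use std::io::Read;

use axum::{
    body::Bytes,
    http::{header::CONTENT_ENCODING, HeaderMap},
    routing::post,
    Router,
};

// Fake /solve endpoint that asserts the request really arrived br-compressed.
async fn fake_solve(headers: HeaderMap, body: Bytes) -> &'static str {
    // The header should only be present when the body is actually compressed.
    let encoding = headers
        .get(CONTENT_ENCODING)
        .expect("missing Content-Encoding header");
    assert_eq!(encoding.to_str().unwrap(), "br");

    // Decompression must succeed and yield the original JSON auction.
    let mut json = String::new();
    brotli::Decompressor::new(body.as_ref(), 4096)
        .read_to_string(&mut json)
        .expect("body is not valid brotli");
    assert!(json.starts_with('{'), "expected a JSON auction");

    "{}" // dummy response so the run loop can continue
}

fn fake_driver() -> Router {
    Router::new().route("/solve", post(fake_solve))
}
```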
```rust
let mut encoder = GzEncoder::new(Vec::new(), Compression::new(3));
match encoder
```
Not suggesting to change anything in this PR, just sharing some findings following @jmg-duarte's DMs about brotli compression.
At level 1, brotli is twice as fast as gzip level 3 while still beating gzip level 9 in file size. The downside is that this apparently only works over HTTPS, and some solvers still use HTTP IIRC.
brotli

| Level | Size (bytes) | Time (ms) |
|---|---|---|
| 0 | 2009237 | 30 |
| 1 | 1663328 | 25 |
| 2 | 1687065 | 44 |
| 3 | 1647320 | 50 |
| 4 | 1510469 | 68 |
| 5 | 1599833 | 106 |
| 6 | 1596011 | 129 |
| 7 | 1592758 | 288 |
| 8 | 1590406 | 401 |
| 9 | 1588621 | 435 |
| 10 | 1360942 | 3992 |
| 11 | 1338503 | 11116 |

gzip

| Level | Size (bytes) | Time (ms) |
|---|---|---|
| 1 | 2214921 | 67 |
| 2 | 2179485 | 52 |
| 3 | 2149025 | 54 |
| 4 | 1981237 | 96 |
| 5 | 1927338 | 118 |
| 6 | 1883464 | 131 |
| 7 | 1864462 | 135 |
| 8 | 1844328 | 192 |
| 9 | 1840352 | 255 |
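For anyone wanting to reproduce numbers like these on their own payloads, here is a minimal sketch using the `flate2` and `brotli` crates; the `auction.json` input path and the fixed buffer/window parameters are placeholder assumptions, not what was used for the table above:

```rust
use std::{io::Write, time::Instant};

use flate2::{write::GzEncoder, Compression};

fn main() -> std::io::Result<()> {
    // Any large auction JSON works as input; the path is a placeholder.
    let data = std::fs::read("auction.json")?;

    for quality in 0..=11u32 {
        let start = Instant::now();
        let mut out = Vec::new();
        {
            // 4096-byte buffer, window size 22; dropping the writer flushes it.
            let mut enc = brotli::CompressorWriter::new(&mut out, 4096, quality, 22);
            enc.write_all(&data)?;
        }
        println!("brotli {quality}: {} bytes ({:?})", out.len(), start.elapsed());
    }

    for level in 1..=9u32 {
        let start = Instant::now();
        let mut enc = GzEncoder::new(Vec::new(), Compression::new(level));
        enc.write_all(&data)?;
        let out = enc.finish()?;
        println!("gzip {level}: {} bytes ({:?})", out.len(), start.elapsed());
    }

    Ok(())
}
```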
It seems like Brotli does support HTTP; only web browsers don't support br compression over plain HTTP. I can give it a try. Thanks for providing the test results!
Well, after running some unit tests and adjusting the e2e test, it seems like brotli works better for sure. Switched the implementation. Thanks @jmg-duarte for the idea!
This reverts commit d33863d.
Ok, it seems like testing it in staging doesn't make any sense since we have very small auction JSONs there. It would probably be better to first ask all the colocated solvers to support compressed requests.
If you are primarily concerned with request size you can test on shadow, even if just temporarily, to not cause "down times" for solvers not supporting it. If you test on shadow mainnet, please check whether the times reported in the autopilot logs improve.
Description
Our current `/solve` request can be heavy on certain chains, and given that it is expected to eventually switch to a cross-chain auction, the data will grow significantly. While a delta-sync approach with event-based auction updates is expected eventually, it is quite involved, so as a first step we can compress the request to reduce its size and reduce network latency/delays.

The tests showed that `brotli` with compression level 1 outperforms `gzip`: #4212 (comment)

Changes
Autopilot

- New `--compress-solve-request` boolean CLI flag threaded through `run_loop::Config`
- `solve::Request` gains a `compressed()` method that br-compresses the body on a blocking task (CPU-intensive, like the existing serialization), accounted for in the `auction_overhead` metric (see the sketch after this list)
- The `Content-Encoding: br` header is sent only when the body is actually compressed
- New `runloop_solve_request_body_size` histogram metric that records the size (in bytes) of the JSON body sent in each `/solve` request to drivers

Driver

- Adds `RequestDecompressionLayer` to the axum middleware stack, which automatically decompresses br-encoded request bodies based on the `Content-Encoding` header
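To make the Autopilot side concrete, here is a minimal sketch of the compress-and-send flow described above. It is not the PR's actual code: the function name `send_solve_request` and the fixed quality/window parameters are illustrative assumptions.

```rust
use std::io::Write;

use reqwest::header::CONTENT_ENCODING;

// Compresses the serialized auction on a blocking task and only attaches the
// `Content-Encoding: br` header when compression actually happened.
async fn send_solve_request(
    client: &reqwest::Client,
    url: reqwest::Url,
    body: Vec<u8>,
    compress: bool, // driven by the --compress-solve-request flag
) -> reqwest::Result<reqwest::Response> {
    let mut request = client.post(url);

    let body = if compress {
        // Compression is CPU-intensive, so keep it off the async executor.
        let compressed = tokio::task::spawn_blocking(move || {
            let mut out = Vec::new();
            {
                // quality 1, window 22: the level favored by the benchmarks above
                let mut enc = brotli::CompressorWriter::new(&mut out, 4096, 1, 22);
                enc.write_all(&body).expect("writing to a Vec cannot fail");
            }
            out
        })
        .await
        .expect("compression task panicked");

        request = request.header(CONTENT_ENCODING, "br");
        compressed
    } else {
        body
    };

    request.body(body).send().await
}
```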
How to test

New unit and e2e tests. Deploy manually with compression disabled to collect some metrics, then with compression enabled.
Related issues
Fixes #4206