
Actively manage local archive-index cache#3249

Merged
syphar merged 1 commit into rust-lang:main from syphar:archive-cache-clean
Mar 15, 2026
Merged

Actively manage local archive-index cache#3249
syphar merged 1 commit intorust-lang:mainfrom
syphar:archive-cache-clean

Conversation


@syphar syphar commented Mar 15, 2026

This PR tries to solve a couple of issues with the local index cache that have been on my mind for a while, and that will become a real problem with the (planned) new infra setup.

The issues are:

  • right now, the local index files are never deleted. So they grow without bounds, until they cover the whole of docs.rs.
  • on top of that, when we rebuild crates, the old index versions with the old build-id are left on disk.

And with Fargate / ECS we have to think about:

  • local storage being limited (currently the cache dir grows to ~500 GiB after some time)
  • the cache possibly being empty after deploys, or when Fargate rotates machines.

There are also other hosting options where the issue might not be as extreme, but I would like to have control over it, and to be able to safely recycle machines.

So what I wanted is:

  • limit the storage used by the local index cache so disk usage won't grow beyond what I choose
  • evict obsolete or unused entries to save some space.

My initial approaches, using atime and similar tricks, quickly became quite complex.

Then I had the idea for this approach:

  • the moka crate already provides all the necessary algorithms (including TinyLFU, which is even better than plain LRU)
  • its try_get_with_by_ref serializes concurrent downloads of the same index, which makes our previous DashMap<Mutex> construction obsolete.
  • through its entry-weight concept, we can keep track of the cache size
  • through its eviction listeners, we can locally clean up the files when an entry is removed from the cache via TTL or size limits.
  • if we want to improve performance, we can separately keep connection pools cached. But in the current situation, most time is spent between the webserver and S3, so the couple of saved ms don't matter (yet).
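Conceptually, the pieces above fit together like this. A minimal std-only sketch of the weight accounting and the eviction callback — not moka's actual API (moka exposes these as `weigher` and `eviction_listener` on its cache builder, and picks eviction victims via TinyLFU rather than key order):

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Std-only sketch of two moka concepts the PR relies on: per-entry
/// weights bounding total cache size, and an eviction callback that
/// lets us delete the on-disk index file when an entry is dropped.
struct WeightedCache<F: Fn(&String, &Vec<u8>)> {
    max_weight: u64,
    current_weight: u64,
    entries: BTreeMap<String, Vec<u8>>,
    on_evict: F,
}

impl<F: Fn(&String, &Vec<u8>)> WeightedCache<F> {
    fn new(max_weight: u64, on_evict: F) -> Self {
        Self { max_weight, current_weight: 0, entries: BTreeMap::new(), on_evict }
    }

    /// Insert, then evict entries until we are back under budget.
    /// (This picks the smallest key as victim; moka would use TinyLFU.)
    fn insert(&mut self, key: String, value: Vec<u8>) {
        self.current_weight += value.len() as u64; // the "weigher"
        self.entries.insert(key, value);
        while self.current_weight > self.max_weight {
            let victim = self.entries.keys().next().cloned().expect("cache is non-empty");
            let value = self.entries.remove(&victim).unwrap();
            self.current_weight -= value.len() as u64;
            // in the PR, this callback removes the index file from disk
            (self.on_evict)(&victim, &value);
        }
    }
}

fn main() {
    let evicted = Arc::new(Mutex::new(Vec::new()));
    let log = Arc::clone(&evicted);
    let mut cache = WeightedCache::new(8, move |key: &String, _value: &Vec<u8>| {
        log.lock().unwrap().push(key.clone());
    });
    cache.insert("a".into(), vec![0; 6]);
    cache.insert("b".into(), vec![0; 6]); // total weight 12 > 8: evicts "a"
    assert_eq!(evicted.lock().unwrap().clone(), vec!["a".to_string()]);
}
```

The de-duplicated concurrent fill from try_get_with_by_ref is not shown here; in the sketch a caller-side Mutex would stand in for it.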

The only caveats I can see with this approach show up each time we restart the server:

  • we have to re-scan the existing files in the local cache directory.
  • the LFU timestamps will be reset

But IMO neither is a big issue.
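That startup re-scan can be as simple as walking the cache directory and recording each file with its size, so the entries can be fed back into the fresh in-memory cache with the right weights. A hedged sketch — the directory layout and file naming here are invented, not what docs.rs actually uses:

```rust
use std::{fs, io, path::Path};

/// Sketch: at startup, re-scan the local cache directory and return
/// each existing index file with its size, so entries (and weights)
/// can be re-inserted into the fresh in-memory cache. As the PR text
/// notes, the LFU frequency data is gone after a restart.
fn rescan_cache_dir(dir: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            found.push((entry.file_name().to_string_lossy().into_owned(), meta.len()));
        }
    }
    found.sort(); // deterministic order for the caller
    Ok(found)
}

fn main() -> io::Result<()> {
    // hypothetical cache directory for the demo
    let dir = std::env::temp_dir().join("archive-index-rescan-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("crate-1.0.0.index"), b"12345")?;
    let files = rescan_cache_dir(&dir)?;
    assert_eq!(files, vec![("crate-1.0.0.index".to_string(), 5)]);
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```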

Another notable addition is metrics: with these we'll be able to see

  • repairs / attempts
  • cache hit/miss
  • cache entry sizes / downloaded sizes.

The entry sizes might be important when we aim for smaller cache sizes (let's imagine 1 GiB). We might have index databases that are bigger (100s of MiB), and one big added database could evict many smaller ones. A possible solution would be a separate cache for these large entries, but to decide that we need to know how often this happens and at which sizes, which is exactly what these metrics give us.
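For the hit/miss part, a minimal sketch of what such counters look like. The names are invented for illustration; the real code presumably plugs into docs.rs's existing metrics registry rather than hand-rolled atomics:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hedged sketch of cache hit/miss counters of the kind the PR adds.
#[derive(Default)]
struct CacheMetrics {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheMetrics {
    fn record(&self, hit: bool) {
        let counter = if hit { &self.hits } else { &self.misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Hit ratio in [0, 1]: the kind of number that tells us whether
    /// a 1 GiB cache budget would be enough before shrinking it.
    fn hit_ratio(&self) -> f64 {
        let hits = self.hits.load(Ordering::Relaxed) as f64;
        let misses = self.misses.load(Ordering::Relaxed) as f64;
        if hits + misses == 0.0 { 0.0 } else { hits / (hits + misses) }
    }
}

fn main() {
    let m = CacheMetrics::default();
    m.record(true);
    m.record(true);
    m.record(false);
    assert!((m.hit_ratio() - 2.0 / 3.0).abs() < 1e-9);
}
```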

I also added a benchmark to check that performance is comparable, and it is:

Benchmark comparison with `main`:

```
archive_index_cache/hot_local_index_single/exists_in_archive
                        time:   [131.20 µs 141.56 µs 152.41 µs]
                        change: [−2.9181% +5.3384% +14.494%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  17 (17.00%) high mild
  1 (1.00%) high severe
archive_index_cache/cold_index_single/exists_in_archive
                        time:   [503.41 µs 557.80 µs 617.13 µs]
                        change: [−22.330% −14.800% −6.2318%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
archive_index_cache/cold_index_concurrent_same_key_16/exists_in_archive
                        time:   [2.4274 ms 2.4631 ms 2.5047 ms]
                        change: [−2.3364% −0.2383% +1.8885%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking archive_index_cache/purge_then_recover_single/exists_in_archive: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 60.
archive_index_cache/purge_then_recover_single/exists_in_archive
                        time:   [1.2041 ms 1.3116 ms 1.4280 ms]
                        change: [−2.9597% +5.9378% +15.615%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```

@syphar syphar self-assigned this Mar 15, 2026
@syphar syphar requested a review from a team as a code owner March 15, 2026 13:11
@github-actions github-actions bot added the S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed label Mar 15, 2026
@syphar syphar merged commit da7690e into rust-lang:main Mar 15, 2026
15 checks passed
@github-actions github-actions bot added S-waiting-on-deploy This PR is ready to be merged, but is waiting for an admin to have time to deploy it and removed S-waiting-on-review Status: This pull request has been implemented and needs to be reviewed labels Mar 15, 2026
@syphar syphar deleted the archive-cache-clean branch March 15, 2026 16:07
@syphar syphar removed the S-waiting-on-deploy This PR is ready to be merged, but is waiting for an admin to have time to deploy it label Mar 15, 2026