fix: Fix data-race in lru-cache and 200x improvement in EraseEntriesWithGivenPrefix performance for deleting 10K entries out of 20K (#4515)
Conversation
Summary of Changes: This pull request addresses a critical data race within the LRU cache implementation, specifically when iterating and modifying the cache's index concurrently. The changes introduce a more robust and performant approach to erasing entries, particularly for prefix-based deletions, by carefully managing lock acquisition and delegating core removal logic to an internal helper. This ensures the cache operates safely and efficiently under concurrent access patterns.
Code Review
This pull request effectively resolves a data race in the EraseEntriesWithGivenPrefix function within the LRU cache. The fix refactors the deletion logic to first collect all keys for deletion under a read lock, and then perform a batch deletion under a single write lock. This not only fixes the concurrent map iteration and map write error but also improves performance by reducing lock contention. A new concurrent test has been added to validate the fix. The implementation is clean and correctly addresses the issue.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master    #4515      +/-   ##
==========================================
- Coverage   83.71%   83.71%   -0.01%
==========================================
  Files         163      163
  Lines       20078    20090     +12
==========================================
+ Hits        16809    16818      +9
- Misses       2646     2648      +2
- Partials      623      624      +1
```
- Fix a data race bug (`fatal error: concurrent map iteration and map write`) where `c.index` was previously iterated without holding a lock while other operations might insert concurrently.
- Refactor `Erase` and `EraseEntriesWithGivenPrefix` to use a common internal lock-free `eraseInternal` utility function.
- Ensure that `EraseEntriesWithGivenPrefix` acquires the write lock once for the entire batch instead of taking it for every entry.
- Add benchmark tests for lru-cache erasure for a given prefix that show an 18x performance improvement for erasing entries in bulk.

Before fix:

```
goos: linux
goarch: amd64
pkg: github.com/googlecloudplatform/gcsfuse/v3/internal/cache/lru
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkEraseEntriesWithGivenPrefix-16              1628    747389 ns/op
BenchmarkEraseEntriesWithGivenPrefix_Concurrent-16   fatal error: concurrent map iteration and map write
```

After fix:

```
goos: linux
goarch: amd64
pkg: github.com/googlecloudplatform/gcsfuse/v3/internal/cache/lru
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkEraseEntriesWithGivenPrefix-16              30418    39771 ns/op
BenchmarkEraseEntriesWithGivenPrefix_Concurrent-16   242079     4817 ns/op
```
Description
- `c.index` was previously iterated without holding a lock while other operations might insert concurrently. While this doesn't cause issues in the application, since operations on this module are already serialized, it is unintuitive and a code smell.
- Refactored `Erase` and `EraseEntriesWithGivenPrefix` to use a common internal lock-free `eraseInternal` utility function.
- `EraseEntriesWithGivenPrefix` now acquires the write lock once for the entire batch instead of taking it for every entry.

Perf
Before fix:

```
BenchmarkEraseEntriesWithGivenPrefix-16              1628    747389 ns/op
BenchmarkEraseEntriesWithGivenPrefix_Concurrent-16   fatal error: concurrent map iteration and map write
```

After fix:

```
BenchmarkEraseEntriesWithGivenPrefix-16              30418    39771 ns/op
BenchmarkEraseEntriesWithGivenPrefix_Concurrent-16   242079     4817 ns/op
```
Link to the issue in case of a bug fix.
b/495217870
Testing details
- Added a concurrent test that erases entries by prefix while other goroutines mutate the cache, validating the data-race fix.
- Added `BenchmarkEraseEntriesWithGivenPrefix` and `BenchmarkEraseEntriesWithGivenPrefix_Concurrent`; results are shown under Perf above.
Any backward incompatible change? If so, please explain.
No