Implement predictive prefetcher with ML-based access pattern model#4293

Open
Copilot wants to merge 5 commits into develop from copilot/v1-8-0-predictive-prefetcher

Copilot AI commented Mar 16, 2026

Description

Implements the v1.8.0 roadmap item "Predictive Prefetcher: ML-Based Access Pattern Model" for src/cache/. Replaces the simple frequency counter with a first-order Markov chain transition matrix, adds time-of-day (ToD) bucketing, RocksDB persistence, MetricsCollector emission, and an A/B test toggle for comparing Markov+ToD vs. raw Markov frequency routing.
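
To make the order-1 Markov idea concrete, here is a minimal sketch of a transition table with FIFO eviction. It is not the code from this PR: the class name `MarkovSketch`, its method names, and the tie-breaking rule are assumptions made for illustration, and only the general shape (last-seen fingerprint per session, a cap like max_tracked_keys) follows the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the prefetcher's transition table.
class MarkovSketch {
public:
    // max_tracked_keys == 0 means "no cap" in this sketch (an assumption).
    explicit MarkovSketch(std::size_t max_tracked_keys)
        : max_tracked_keys_(max_tracked_keys) {}

    // Count the transition (last fingerprint of this session) -> fp.
    void recordAccess(const std::string& session, const std::string& fp) {
        auto last = last_seen_.find(session);
        if (last != last_seen_.end()) {
            const std::string& from = last->second;
            if (transitions_.find(from) == transitions_.end()) {
                // FIFO eviction: drop the oldest source key once the cap is hit.
                if (max_tracked_keys_ > 0 && order_.size() >= max_tracked_keys_) {
                    transitions_.erase(order_.front());
                    order_.pop_front();
                }
                order_.push_back(from);
            }
            ++transitions_[from][fp];
        }
        last_seen_[session] = fp;
    }

    // Up to max_predictions successors of fp, most frequent first;
    // ties are broken by fingerprint so the output is deterministic.
    std::vector<std::string> predict(const std::string& fp,
                                     std::size_t max_predictions) const {
        std::vector<std::string> out;
        auto it = transitions_.find(fp);
        if (it == transitions_.end()) return out;
        std::vector<std::pair<std::string, std::uint64_t>> ranked(
            it->second.begin(), it->second.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const auto& a, const auto& b) {
                      return a.second != b.second ? a.second > b.second
                                                  : a.first < b.first;
                  });
        for (std::size_t i = 0; i < ranked.size() && i < max_predictions; ++i) {
            out.push_back(ranked[i].first);
        }
        return out;
    }

private:
    std::size_t max_tracked_keys_;
    std::unordered_map<std::string, std::string> last_seen_;  // session -> last fp
    std::unordered_map<std::string,
                       std::unordered_map<std::string, std::uint64_t>> transitions_;
    std::deque<std::string> order_;  // insertion order of source keys
};
```

Feeding the sequence a, b, a, b for one session and then calling `predict("a", 3)` yields `b`, since a → b is the only recorded successor of a.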

Core implementation (src/cache/predictive_prefetcher.cpp / .h)

  • Markov chain: Order-1 transition matrix keyed by last-seen fingerprint per tenant session; FIFO eviction at max_tracked_keys.
  • useToDWeighting() (renamed from useMarkovModel()): Markov transitions are always applied; ToD weighting is gated independently. A/B group 0 (FNV-1a hash % 2 == 0) uses Markov + ToD; group 1 uses raw Markov frequency without ToD.
  • FNV-1a hash (fnv1aHash()): Replaced std::hash<std::string> with a portable 64-bit FNV-1a implementation for deterministic, cross-platform A/B tenant routing.
  • A/B counter correctness: const_cast removed (counters are mutable); generated counters increment by result.size(); recordPrefetchHit(tenant_id) now attributes hits to the correct A/B group counter.
  • recordCandidatesGenerated(count, tenant_id): Accepts candidate count and tenant; calls emitMetrics() so cache.prefetch.hit_rate gauge stays current in low-hit regimes.
  • saveModel(): Batch-deletes all stale prefetch_model:: prefix keys before writing the new snapshot, preventing unbounded RocksDB growth.
  • loadModel(): Computes total_transitions_recorded_ delta (new_val − old_val) when merging via max() to prevent counter inflation on repeated loads; uses find() instead of count()+[] to avoid double map lookups.
  • MetricsCollector: Emits cache.prefetch.hit_rate and cache.prefetch.overhead_bytes gauges on every hit, generated-candidates event, and overhead recording.
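
For reference, the A/B routing described above can be sketched with the standard 64-bit FNV-1a parameters. The function names `fnv1a64` and `abGroup` are hypothetical and the actual fnv1aHash() in this PR may differ in signature; only the hash constants and the `% 2` split are taken as given.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Standard 64-bit FNV-1a: XOR each byte into the state, then multiply
// by the FNV prime. The result does not depend on platform or standard
// library, unlike std::hash<std::string>.
constexpr std::uint64_t fnv1a64(std::string_view s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Per the description above: group 0 -> Markov + ToD, group 1 -> raw Markov.
constexpr int abGroup(std::string_view tenant_id) {
    return static_cast<int>(fnv1a64(tenant_id) % 2);
}
```

Because the hash is a pure function of the tenant ID bytes, the same tenant lands in the same group on every build and host, which is what makes hit-rate comparisons between groups meaningful across restarts.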

AdaptiveQueryCache integration

  • New config fields prefetch_enable_time_of_day_weighting and prefetch_enable_ab_test wired into PredictivePrefetcher::Config.
  • getPrefetchCandidates() forwards candidates.size() and tenant_id to recordCandidatesGenerated().
  • savePrefetchModel() / loadPrefetchModel() public methods; model is loaded automatically at construction when l3_db_path is set.
  • recordPrefetchOverheadBytes() forwarded to PredictivePrefetcher::recordOverheadBytes().

Tests (tests/test_predictive_prefetcher_markov.cpp)

  • 14 tests across ToDBucketingTest, ABTestToggleTest, PrefetchMetricsTest, PrefetchPersistenceTest, and AdaptiveQueryCachePrefetchConfigTest suites.
  • Fixed typo: StatsContainTimoOfDayFlag → StatsContainTimeOfDayFlag.
  • Replaced all hardcoded /tmp/ paths with std::filesystem::temp_directory_path() for portability.
  • Added MetricsCollector::getInstance().getPrometheusMetrics() gauge assertions to HitRateGaugeUpdatedOnPrefetchHit and OverheadBytesEmitted.
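
The /tmp/ replacement can look roughly like the following. The helper name `makeTestDbPath` is an assumption for illustration; the tests in this PR may build the path inline instead.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <string>

// Hypothetical helper: builds a per-run path under the platform's
// temporary directory instead of a hardcoded /tmp/ prefix.
inline std::filesystem::path makeTestDbPath(const std::string& prefix) {
    const auto ticks =
        std::chrono::steady_clock::now().time_since_epoch().count();
    return std::filesystem::temp_directory_path() /
           (prefix + std::to_string(ticks));
}
```

A fixture would call `makeTestDbPath("themis_test_prefetch_persist_")` in SetUp() and remove the directory with `std::filesystem::remove_all()` in TearDown().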

Infrastructure

  • Focused test target test_predictive_prefetcher_markov_focused added to tests/CMakeLists.txt.
  • CI workflow .github/workflows/predictive-prefetcher-ml-ci.yml runs PredictivePrefetcherMarkovTests on gcc-12, clang-15, and gcc-13.
  • All 4 predictive prefetcher items marked [x] in src/cache/FUTURE_ENHANCEMENTS.md.

Type of Change

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Other:

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed

📚 Research & Knowledge (if applicable)

  • Is this PR based on scientific paper(s) or best practices?
    • If YES: Research files created in /docs/research/?
    • If YES: Linked in the module README under "Wissenschaftliche Grundlagen" (scientific foundations)?
    • If YES: Entered in /docs/research/implementation_influence/?

Relevant sources:

  • Paper:
  • Best Practice:
  • Architecture Decision:

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)
  • No new warnings introduced
Original prompt

This section details the original issue you should resolve

<issue_title>Predictive Prefetcher: ML-Based Access Pattern Model</issue_title>
<issue_description>### Context

This issue implements the roadmap item 'Predictive Prefetcher: ML-Based Access Pattern Model' for the cache domain. It is sourced from the consolidated roadmap under 🟡 Medium Priority — Near-term (v1.5.0 – v1.8.0) and targets milestone v1.8.0.

Primary detail section: Predictive Prefetcher: ML-Based Access Pattern Model

Goal

Deliver the scoped changes for Predictive Prefetcher: ML-Based Access Pattern Model in src/cache/ and complete the linked detail section in a release-ready state for v1.8.0.

Detailed Scope

Predictive Prefetcher: ML-Based Access Pattern Model

Priority: Medium
Target Version: v1.8.0

predictive_prefetcher.cpp uses a simple frequency counter over a fixed candidate window (config_.max_predictions) to predict next accesses. There is no sequential-access pattern detection or time-of-day awareness. The model is not persistent across restarts.

Implementation Notes:

  • [ ] Replace frequency counter with a Markov chain transition matrix (order-1) keyed by the last N accessed fingerprints; serialize/deserialize the matrix to RocksDB under prefix prefetch_model::.
  • [ ] Add time-of-day bucketing (24 one-hour buckets) so prefetch probability is weighted by historical access at the current hour.
  • [ ] Emit cache.prefetch.hit_rate and cache.prefetch.overhead_bytes metrics via MetricsCollector to evaluate model effectiveness in production.
  • [ ] Add a prefetcher A/B test toggle: route 50 % of tenants to Markov model vs. frequency baseline; compare hit-rate improvement.
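
As an illustration of the time-of-day item, the sketch below keeps 24 hourly counters per key and scales a transition count by how concentrated the key's history is at the current hour. The struct name `ToDBuckets`, the function `todWeightedScore`, and the scoring rule itself (hourly share relative to a uniform 1/24) are assumptions; the issue does not prescribe a formula.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-key histogram: one counter per hour of day (0..23).
struct ToDBuckets {
    std::array<std::uint64_t, 24> counts{};

    static std::size_t index(int hour) {
        assert(hour >= 0 && hour < 24);
        return static_cast<std::size_t>(hour);
    }

    void record(int hour) { ++counts[index(hour)]; }

    // Fraction of recorded accesses that fell into `hour`.
    // With no history, fall back to the uniform share 1/24.
    double share(int hour) const {
        std::uint64_t total = 0;
        for (std::uint64_t c : counts) total += c;
        if (total == 0) return 1.0 / 24.0;
        return static_cast<double>(counts[index(hour)]) /
               static_cast<double>(total);
    }
};

// Assumed scoring rule: the raw transition count multiplied by the hourly
// share relative to uniform, so a key with flat history keeps its raw score.
inline double todWeightedScore(std::uint64_t transition_count,
                               const ToDBuckets& buckets, int hour) {
    return static_cast<double>(transition_count) * buckets.share(hour) * 24.0;
}
```

With three accesses recorded at hour 9 and one at hour 21, a transition count of 10 scores 180 at hour 9, 60 at hour 21, and 0 at any other hour under this rule.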

Performance Targets:

  • Prefetch prediction latency: ≤ 100 µs per call.
  • Prefetch overhead (bytes fetched but never hit): ≤ 10 % of total prefetch volume.
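
The two quantities behind these targets and the emitted gauges can be written down as follows. This is a sketch under assumed names (`PrefetchVolume`, `hitRate`, `overheadRatio`, `meetsOverheadTarget`); how the real prefetcher accounts for bytes is not specified in the issue.

```cpp
#include <cassert>
#include <cstdint>

// Hit rate: prefetched candidates that were later requested,
// divided by all candidates generated.
inline double hitRate(std::uint64_t hits, std::uint64_t generated) {
    if (generated == 0) return 0.0;
    return static_cast<double>(hits) / static_cast<double>(generated);
}

// Hypothetical byte accounting for prefetch traffic.
struct PrefetchVolume {
    std::uint64_t fetched_bytes;  // everything the prefetcher pulled in
    std::uint64_t hit_bytes;      // the part that was later requested
};

// Overhead: bytes fetched but never hit, as a fraction of total volume.
inline double overheadRatio(const PrefetchVolume& v) {
    if (v.fetched_bytes == 0) return 0.0;
    const std::uint64_t wasted =
        v.fetched_bytes > v.hit_bytes ? v.fetched_bytes - v.hit_bytes : 0;
    return static_cast<double>(wasted) / static_cast<double>(v.fetched_bytes);
}

// The issue's target is an overhead of at most 10 % of prefetch volume.
inline bool meetsOverheadTarget(const PrefetchVolume& v,
                                double max_ratio = 0.10) {
    return overheadRatio(v) <= max_ratio;
}
```

For example, 1000 bytes prefetched with 950 bytes later hit gives an overhead of 5 %, which is within the target; 800 bytes hit gives 20 %, which is not.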

Acceptance Criteria

  • Replace frequency counter with a Markov chain transition matrix (order-1) keyed by the last N accessed fingerprints; serialize/deserialize the matrix to RocksDB under prefix prefetch_model::.
  • Add time-of-day bucketing (24 one-hour buckets) so prefetch probability is weighted by historical access at the current hour.
  • Emit cache.prefetch.hit_rate and cache.prefetch.overhead_bytes metrics via MetricsCollector to evaluate model effectiveness in production.
  • Add a prefetcher A/B test toggle: route 50 % of tenants to Markov model vs. frequency baseline; compare hit-rate improvement.
  • Prefetch prediction latency: ≤ 100 µs per call.
  • Prefetch overhead (bytes fetched but never hit): ≤ 10 % of total prefetch volume.

Relationships

References

  • src/ROADMAP.md
  • src/cache/FUTURE_ENHANCEMENTS.md#predictive-prefetcher-ml-based-access-pattern-model
  • Source key: roadmap:157:cache:v1.8.0:predictive-prefetcher-ml-based-access-pattern-model

Generated from the consolidated source roadmap. Keep the roadmap and issue in sync when scope changes.

</issue_description>

Comments on the Issue (you are @copilot in this section)


…cksDB persistence, MetricsCollector, A/B test

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
@makr-code
Owner

@copilot retry

…NCEMENTS [x] for predictive prefetcher ML

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot retry

Done in commit 188cee1. The three missing pieces are now in place:

  • tests/CMakeLists.txt – adds the test_predictive_prefetcher_markov_focused build/test target (labels: cache;prefetch;markov;tod;rocksdb;metrics;ab-test;v1.8.0).
  • .github/workflows/predictive-prefetcher-ml-ci.yml – CI workflow that builds and runs PredictivePrefetcherMarkovTests on gcc-12, clang-15, and gcc-13.
  • src/cache/FUTURE_ENHANCEMENTS.md – all four predictive prefetcher items marked [x].

Copilot AI changed the title from "[WIP] Implement predictive prefetcher with ML-based access pattern model" to "Implement predictive prefetcher with ML-based access pattern model" Mar 17, 2026
@makr-code makr-code requested a review from Copilot March 17, 2026 05:47
Copilot AI left a comment

Pull request overview

Implements the v1.8.0 “Predictive Prefetcher: ML-Based Access Pattern Model” roadmap item by extending the cache prefetcher with Markov-chain-based prediction, optional time-of-day weighting, RocksDB persistence, metrics emission, an A/B toggle, and targeted CI/tests.

Changes:

  • Extend PredictivePrefetcher with ToD weighting, persistence (save/load), metrics emission, overhead tracking, and A/B-test statistics plumbing.
  • Integrate new prefetcher config + persistence methods into AdaptiveQueryCache.
  • Add focused unit tests and a dedicated GitHub Actions workflow for the focused target.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 13 comments.

File | Description
tests/test_predictive_prefetcher_markov.cpp | Adds focused tests for ToD, A/B toggle, metrics, persistence, and AQC forwarding.
tests/CMakeLists.txt | Adds a focused test target and ctest registration for Markov/ToD tests.
src/cache/predictive_prefetcher.cpp | Implements ToD weighting, metrics emission, RocksDB persistence, and A/B decision logic.
src/cache/adaptive_query_cache.cpp | Wires new config flags and adds model save/load + overhead byte forwarding; loads model in ctor.
src/cache/FUTURE_ENHANCEMENTS.md | Marks the four roadmap items as completed.
include/cache/predictive_prefetcher.h | Adds new config fields + APIs for overhead bytes and persistence; adds ToD buckets and A/B counters.
include/cache/adaptive_query_cache.h | Adds new config flags and new public APIs for overhead bytes + model save/load.
.github/workflows/predictive-prefetcher-ml-ci.yml | Adds a dedicated workflow to build/run the focused test target on multiple compilers.

Comment on lines +46 to +54
bool PredictivePrefetcher::useMarkovModel(const std::string& tenant_id) const {
    if (!config_.enable_ab_test) {
        // No A/B test: always use time-of-day weighting when enabled.
        return config_.enable_time_of_day_weighting;
    }
    // A/B split: deterministic hash of tenant_id.
    // Group 0 → Markov model; group 1 → frequency baseline.
    return (std::hash<std::string>{}(tenant_id) % 2) == 0;
}
Comment on lines +155 to +156
const auto& successors = it->second;
const bool use_tod = useMarkovModel(tenant_id);
Comment on lines +51 to +53
// A/B split: deterministic hash of tenant_id.
// Group 0 → Markov model; group 1 → frequency baseline.
return (std::hash<std::string>{}(tenant_id) % 2) == 0;
Comment on lines +213 to +218
// Update A/B generation counters
if (config_.enable_ab_test) {
    if (use_tod) {
        const_cast<PredictivePrefetcher*>(this)->ab_markov_generated_++;
    } else {
        const_cast<PredictivePrefetcher*>(this)->ab_baseline_generated_++;
Comment on lines 225 to 229
void PredictivePrefetcher::recordPrefetchHit() {
    std::lock_guard<std::mutex> lock(mutex_);
    prefetch_hits_++;
    emitMetrics();
}
EXPECT_EQ(candidates[0], "fp_b");
}

TEST_F(ToDBucketingTest, StatsContainTimoOfDayFlag) {
Comment on lines +225 to +229
db_path_ = "/tmp/themis_test_prefetch_persist_" +
           std::to_string(std::chrono::system_clock::now()
                              .time_since_epoch().count()) +
           "_" +
           std::to_string(reinterpret_cast<uintptr_t>(this));
Comment on lines +320 to +322
const std::string db_path = "/tmp/themis_test_aqc_tod_" +
                            std::to_string(std::chrono::system_clock::now()
                                               .time_since_epoch().count());
Comment on lines +349 to +351
const std::string db_path = "/tmp/themis_test_aqc_ab_" +
                            std::to_string(std::chrono::system_clock::now()
                                               .time_since_epoch().count());
Comment on lines +176 to +200
class PrefetchMetricsTest : public ::testing::Test {
protected:
    void SetUp() override {
        MetricsCollector::getInstance().reset();
    }
};

TEST_F(PrefetchMetricsTest, HitRateGaugeUpdatedOnPrefetchHit) {
    PredictivePrefetcher::Config cfg;
    cfg.max_tracked_keys = 100;
    cfg.max_successors_per_key = 10;
    cfg.min_transition_count = 2;
    cfg.max_predictions = 3;
    cfg.enable_time_of_day_weighting = false;

    PredictivePrefetcher pf(cfg);

    // Generate some candidates and record hits.
    pf.recordCandidatesGenerated();
    pf.recordCandidatesGenerated();
    pf.recordPrefetchHit();  // This should emit metrics.

    auto stats = pf.getStats();
    EXPECT_NEAR(stats["hit_rate"].get<double>(), 0.5, 0.01);
}
@makr-code
Owner

@copilot Implement the review comments

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
@makr-code
Owner

@copilot retry

@makr-code
Owner

@copilot retry

- Rename useMarkovModel() -> useToDWeighting(): clearly separates the
  concern of ToD weighting from Markov-vs-baseline; Markov transitions
  are always used while ToD is gated independently.
- Replace std::hash<std::string> with FNV-1a 64-bit hash for stable,
  cross-platform A/B tenant routing (fnv1aHash() static helper).
- A/B counters were already mutable; remove const_cast in
  getPrefetchCandidates(). Increment counters by result.size() (not 1).
- Add tenant_id to recordPrefetchHit() so A/B hit counters
  (ab_markov_hits_/ab_baseline_hits_) are correctly incremented.
- Add count+tenant_id to recordCandidatesGenerated(); comment explains
  per-group generated counts are handled in getPrefetchCandidates().
- Call emitMetrics() from recordCandidatesGenerated() to keep
  cache.prefetch.hit_rate gauge fresh in low-hit regimes.
- saveModel(): batch-delete all stale prefetch_model:: prefix keys
  before writing the new snapshot to prevent unbounded growth.
- loadModel(): compute delta (new_val - old_val) when merging via max()
  to prevent inflating total_transitions_recorded_ on repeated loads.
  Use find() instead of count()+[] to reduce map lookups.
- Fix test typo: StatsContainTimoOfDayFlag -> StatsContainTimeOfDayFlag
- Replace all hardcoded /tmp/ paths in tests with
  std::filesystem::temp_directory_path() for portability.
- Add MetricsCollector gauge assertions in HitRateGaugeUpdatedOnPrefetchHit
  and OverheadBytesEmitted tests.

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot retry

Done in commit 8041699. All review feedback has been addressed:

  • useMarkovModel() → useToDWeighting(): Renamed and restructured. Markov transitions are always used; ToD weighting is gated independently via the new function. A/B group 1 now gets raw Markov frequency without ToD weighting.
  • FNV-1a hash: Replaced std::hash<std::string> with a portable FNV-1a 64-bit implementation (fnv1aHash()) for deterministic cross-platform A/B tenant routing.
  • const_cast removed: Counters were already mutable; removed the redundant const_cast. A/B generated counters now increment by result.size() instead of 1.
  • tenant_id on recordPrefetchHit() / recordCandidatesGenerated(): Both methods now accept tenant_id; recordPrefetchHit() uses it to increment the correct A/B hit counter.
  • emitMetrics() in recordCandidatesGenerated(): Gauge is now updated on every candidates-generated event, not only on hits.
  • saveModel() stale-key cleanup: Scans the prefetch_model:: prefix and batch-deletes all existing keys before writing the new snapshot.
  • loadModel() delta fix: Tracks total_transitions_recorded_ by (new_val - old_val) to prevent inflation on repeated loads; replaced count()+[] with find() to avoid double map lookups.
  • Test typo: StatsContainTimoOfDayFlag → StatsContainTimeOfDayFlag.
  • /tmp/ paths: Replaced all three hardcoded paths with std::filesystem::temp_directory_path().
  • MetricsCollector gauge assertions: HitRateGaugeUpdatedOnPrefetchHit and OverheadBytesEmitted now assert the respective gauges appear in getPrometheusMetrics() output.
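
The loadModel() delta fix can be illustrated with a small merge routine. This is a sketch, not the PR's code: the alias `Counts` and the function `mergeMax` are hypothetical, and the real model is keyed by fingerprint pairs rather than a flat map.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using Counts = std::unordered_map<std::string, std::uint64_t>;

// Merge persisted counts into the live map by taking max() per key and
// return only the amount by which the live counts actually grew. Adding
// that delta (instead of the full persisted value) to a running total
// keeps the total stable when the same snapshot is loaded twice.
inline std::uint64_t mergeMax(Counts& live, const Counts& persisted) {
    std::uint64_t delta = 0;
    for (const auto& [key, new_val] : persisted) {
        auto it = live.find(key);  // one lookup instead of count() + []
        if (it == live.end()) {
            live.emplace(key, new_val);
            delta += new_val;
        } else if (new_val > it->second) {
            delta += new_val - it->second;
            it->second = new_val;
        }
    }
    return delta;
}
```

Loading a snapshot `{a: 5, b: 2}` into a live map `{a: 3}` returns a delta of 4 on the first call and 0 on every repeat, so a counter such as total_transitions_recorded_ is not inflated by reloads.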

Copilot AI requested a review from makr-code March 17, 2026 10:31
Development

Successfully merging this pull request may close these issues.

Predictive Prefetcher: ML-Based Access Pattern Model

3 participants