Cache retrieved docs across AI sessions by Python version #8

@ayhammouda

Problem

The server currently has a local SQLite documentation index and small process-lifetime LRU caches for hot lookups. These help within a single running server process, but retrieved documentation is not persisted for reuse across AI/client sessions.

For MCP clients, repeated AI sessions often ask for the same stdlib pages or sections, especially for common symbols. If a doc has already been searched or retrieved once, the server should be able to reuse that prior retrieval efficiently in a later session instead of treating each session as cold.

Proposal

Add a persistent retrieval cache keyed by Python documentation version and request identity.

Suggested cache keys:

  • version
  • normalized slug
  • optional anchor
  • retrieval mode, if needed (page, section, symbol, etc.)
  • possibly the effective result budget parameters when they affect stored output

The cache should be version-aware, so that content retrieved for Python 3.12 is never reused for Python 3.13 or 3.14 unless that reuse is explicitly known to be safe.
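
As a rough illustration of the suggested key fields, the sketch below combines them into one stable digest. The `CacheKey` class, its field names, and the `normalize_slug` rules are hypothetical and do not come from the existing codebase; the real normalization should match whatever the server already does for slugs.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CacheKey:
    """Hypothetical request identity for one cached retrieval."""

    version: str                # Python docs version, e.g. "3.12"
    slug: str                   # normalized page slug
    anchor: str | None = None   # optional anchor within the page
    mode: str = "page"          # "page", "section", "symbol", ...
    budget: int | None = None   # set only when it changes the stored output

    def digest(self) -> str:
        # Sorted keys keep the serialized form, and so the digest, stable.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def normalize_slug(raw: str) -> str:
    """Assumed normalization: trim, lowercase, drop a trailing '.html'."""
    return raw.strip().lower().removesuffix(".html")
```

Because `version` is part of the hashed payload, the same slug and anchor under 3.12 and 3.13 produce different digests, so entries for one version cannot be returned for another by accident.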

Expected behavior

  • When search_docs or get_docs causes a documentation page/section/symbol to be resolved, store the resolved/retrieved doc payload for later reuse.
  • A later AI session using the same local docs index can retrieve the cached version-specific result without re-running the full retrieval path.
  • Cache entries remain local and read-only from the MCP client's point of view.
  • After the docs index is rebuilt or replaced, the server should not serve stale cached content from the older index.
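
The behavior above could be sketched as a small read-through cache in its own SQLite file. This is only an illustration under assumptions: `RetrievalCache`, the `retrieval_cache` table, and the `index_id` string are hypothetical names, and `retrieve` stands in for the existing retrieval path.

```python
import sqlite3
from typing import Callable


class RetrievalCache:
    """Hypothetical persistent read-through cache backed by SQLite."""

    def __init__(self, path: str, index_id: str) -> None:
        self._db = sqlite3.connect(path)
        self._index_id = index_id  # identity of the docs index in use
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS retrieval_cache ("
            " index_id TEXT NOT NULL,"
            " version TEXT NOT NULL,"
            " request_key TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (index_id, version, request_key))"
        )
        self._db.commit()

    def get_or_retrieve(
        self, version: str, request_key: str, retrieve: Callable[[], str]
    ) -> str:
        row = self._db.execute(
            "SELECT payload FROM retrieval_cache"
            " WHERE index_id = ? AND version = ? AND request_key = ?",
            (self._index_id, version, request_key),
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: skip the retrieval path
        payload = retrieve()  # cache miss: fall back to existing retrieval
        self._db.execute(
            "INSERT OR REPLACE INTO retrieval_cache VALUES (?, ?, ?, ?)",
            (self._index_id, version, request_key, payload),
        )
        self._db.commit()
        return payload

    def close(self) -> None:
        self._db.close()
```

Rows written under an older `index_id` are never matched by the lookup, so they are ignored rather than served. A real implementation would probably also prune those rows so the file does not grow without bound.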

Design considerations

  • Tie cache validity to the current index.db identity, such as indexed version metadata, build timestamp, schema version, or a content/index hash.
  • Keep the cache separate from the canonical docs index unless there is a clear reason to store it in the same SQLite database.
  • Avoid caching cross-version search result lists unless the cache key captures the exact version filter and query normalization.
  • Prefer deterministic invalidation over TTL-only behavior.
  • Preserve the current simple operational story: users should not need to manage cache internals manually.
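
One way to get deterministic invalidation is to derive an identity string from the index file itself. The sketch below hashes the file contents; `index_fingerprint` is a hypothetical helper, and hashing the whole file is just one of the options listed above (stored build metadata or a schema version would be cheaper for a large index).

```python
import hashlib
from pathlib import Path


def index_fingerprint(index_path: str, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 hex digest of the index file's bytes."""
    digest = hashlib.sha256()
    with Path(index_path).open("rb") as fh:
        # Read in fixed-size chunks so a large index is not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the fingerprint computed at startup differs from the one recorded with the cached entries, those entries can be ignored or dropped with no TTL involved. Note that this only sees the main database file: if the index is opened in WAL mode, changes still sitting in the `-wal` file would not be reflected until they are checkpointed.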

Acceptance criteria

  • Retrieved docs are persisted across server/client restarts.
  • Cache entries are keyed by Python docs version.
  • Cache misses fall back to the existing retrieval path.
  • The cache is invalidated or ignored after the local docs index is rebuilt or replaced.
  • Tests cover version isolation, restart persistence, cache hit/miss behavior, and stale-index invalidation.
  • Documentation explains what is cached, where it is stored, and how it is invalidated.
