
Action should have automatic cache busting mechanism, or more docs about cache busting #32

@sureshjoshi


In the example projects, we have this handy piece of info:

# Note that named_caches and lmdb_store fall back to partial restore keys which
# may give a useful partial result that will save time over completely clean state,
# but will cause the cache entry to grow without bound over time.
# See https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci for tips on how to periodically clean it up.
# Alternatively, you can change gha-cache-key to ignore old caches.

Beyond that, the docs suggest using this Action, and give instructions for manual usage along with a cache "nuke" function: https://www.pantsbuild.org/2.21/docs/using-pants/using-pants-in-ci#directories-to-cache

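The problem is that the partial restore key is very lenient while the cache key is fairly strict, so the nuke function from the docs won't work most of the time. Most runs get an exact hit on the cache key, and on an exact hit the cache is not saved at the end of the job, so anything nuked during that run is never written back. When the key does change, the lenient restore key pulls the old, oversized cache forward into the new entry.

To stop cache usage from growing monotonically, a user has to either change the cache key explicitly and manually, or run the nuke function in a workflow run that ALSO changes the cache key (e.g. a lockfile or pants.toml change), so that the trimmed cache actually gets saved.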
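As a rough illustration of the manual workaround, the shell sketch below builds a cache key from a hand-maintained generation prefix (playing the role of gha-cache-key) plus a digest of the files whose changes should invalidate the cache. The function name `make_cache_key`, the `pants-<generation>-<digest>` layout, and the choice of files are assumptions for illustration; this is not how this Action actually composes its key.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build a cache key from a manually bumped generation
# prefix (analogous to gha-cache-key) plus a digest of the given files
# (their paths and contents). Changing the generation, or any listed file,
# yields a new key.
make_cache_key() {
  local generation=$1
  shift
  local listing digest
  # sha256sum prints one "<digest>  <path>" line per file; fail if a file is missing.
  listing=$(sha256sum -- "$@") || return 1
  # Collapse the per-file listing into one short digest for the key.
  digest=$(printf '%s\n' "$listing" | sha256sum | cut -c1-16)
  printf 'pants-%s-%s\n' "$generation" "$digest"
}
```

For example, `make_cache_key v1 pants.toml <your lockfiles>` would produce a different key than `make_cache_key v0 ...` for the same files, which is the explicit bust described above.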


I used https://github.com/sureshjoshi/pants-plugins as a cache testing example:

[Screenshot: cache-not-busting]

In the second-last entry, despite removing almost all dependencies in that commit, we're still pulling 220 MB of cache, and that never gets cleared out. We have to explicitly bust the cache with a new cache key, and run everything from scratch, to get the benefit.


Here is another example where I nuke the cache, but since the cache key doesn't change, the save step reports "Cache hit occurred ... not saving cache".



I considered using the gh CLI to delete or expire caches early, but since that would happen after the cache has already been downloaded, it would require special treatment.

I think the most reasonable, practical answer is to add more documentation to this Action (and probably to pantsbuild.org), along with some sort of automatic nuke check when the cache is saved.

This might require switching to the separate restore/save cache actions, if the cache action itself offers no easy hook for knowing whether the key being saved differs from the one that was restored.
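With the split actions, the restore step's outputs should give the save side enough information to decide. actions/cache/restore exposes `cache-primary-key` and `cache-matched-key`; the sketch below assumes a workflow step copies them into the hypothetical environment variables `PRIMARY_KEY` and `MATCHED_KEY` and calls a hypothetical `is_new_cache_key` function. An exact match means the key already exists, so no new entry will be saved and nuking would be wasted work; a restore-key fallback or a complete miss means a fresh entry will be written.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical post-step check. Assumed wiring in the workflow step, using the
# outputs that actions/cache/restore exposes:
#   env:
#     PRIMARY_KEY: ${{ steps.restore.outputs.cache-primary-key }}
#     MATCHED_KEY: ${{ steps.restore.outputs.cache-matched-key }}
#
# Succeeds (exit status 0) when the key about to be saved differs from the key
# that was restored, i.e. when a new cache entry will actually be written.
# An empty or missing matched key means nothing was restored, which also counts as new.
is_new_cache_key() {
  local primary_key=$1
  local matched_key=${2-}
  [ "$primary_key" != "$matched_key" ]
}
```

A post step might then run `if is_new_cache_key "$PRIMARY_KEY" "${MATCHED_KEY-}"; then nuke_if_too_big ...; fi` before the save action.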

Essentially:

  • Run the action as normal
  • During the post-action hook, check whether the cache key is new (e.g. was pants.toml or named-caches-hash modified?)
    • If not, do nothing
    • If so, run nuke_if_too_big $named_cache_dir $named_cache_limit_mb
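The last bullet could look roughly like the sketch below. It is a hypothetical reimplementation written for illustration, not a copy of the function in the Pants docs: it deletes a directory if `du` reports it as larger than a limit given in megabytes, and treats a missing directory as nothing to do.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical reimplementation for illustration: remove the directory at $1
# if its on-disk size exceeds $2 megabytes. A missing directory is a no-op.
nuke_if_too_big() {
  local path=$1
  local limit_mb=$2
  local size_mb
  if [ ! -d "$path" ]; then
    return 0
  fi
  # du -s summarizes the whole tree; -m reports the total in 1 MiB units.
  size_mb=$(du -sm -- "$path" | cut -f1)
  if [ "$size_mb" -gt "$limit_mb" ]; then
    echo "$path is ${size_mb} MB (limit ${limit_mb} MB), removing it"
    rm -rf -- "$path"
  fi
}
```

In the post-action step this might be called as `nuke_if_too_big "$HOME/.cache/pants/named_caches" 1024`, only when the key check says a new entry will be saved. The path assumes Pants' default named-caches location, and the 1024 MB limit is an arbitrary example.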
