thomasdesr/external-mirror-cache

external-mirror-cache

An HTTP caching proxy that stores upstream responses in S3 and serves cache hits as presigned URL redirects. It's for internal infrastructure that repeatedly fetches the same external files (package repositories, container images, release artifacts).

Problem

Services that download external resources (OS packages during instance boot, language dependencies in CI, binary artifacts in deploy pipelines) make redundant requests to the same URLs. This creates a few issues:

  • Reliability: a transient upstream outage (CDN blip, rate limit, DNS failure) breaks builds and deploys that would otherwise succeed with cached content.
  • Latency: every request pays full round-trip cost to the origin, even when the content hasn't changed.
  • Bandwidth: the same multi-megabyte files get pulled across the internet repeatedly.

Existing solutions either require client-side configuration changes (explicit proxy settings, alternate registry URLs) or run a full mirror that needs its own sync schedule and storage management.

Design

The proxy sits behind an internal load balancer and needs no client-side proxy configuration. Clients just point their base URL at it.

URL scheme: GET /<domain>/<path> maps to https://<domain>/<path>. A request to /example.com/releases/v1.0.tar.gz fetches https://example.com/releases/v1.0.tar.gz.
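As a rough illustration of the mapping above, here is a minimal Python sketch. The function name `upstream_url` is hypothetical and not taken from the project, and query strings are left out because the scheme above does not describe them.

```python
def upstream_url(path: str) -> str:
    """Map a request path of the form /<domain>/<path> to https://<domain>/<path>."""
    # The first path segment is the upstream domain; the rest is the upstream path.
    domain, _, rest = path.lstrip("/").partition("/")
    if not domain:
        raise ValueError("request path must start with /<domain>/")
    return f"https://{domain}/{rest}"


print(upstream_url("/example.com/releases/v1.0.tar.gz"))
```

The upstream scheme is always https, so the client never chooses the protocol.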

Request flow:

  1. Check S3 for a cached copy (HeadObject to read stored response headers).
  2. If cached, send a conditional request upstream with If-None-Match / If-Modified-Since.
  3. On 304 Not Modified, redirect the client to a presigned S3 URL for the cached object.
  4. On 200 OK, stream the response body to S3 via the transfer manager, then redirect to the presigned URL.
  5. On upstream failure, optionally serve stale cached content based on a configurable fallback policy (connection errors, 5xx responses, or any error).
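
The decision logic in these steps can be sketched as a pure function. This is an illustration under assumptions, not the project's code: the `Fallback` names and the action strings are hypothetical, each failure class is treated independently, and any status other than 200 or 304 counts as an upstream failure.

```python
from enum import Enum
from typing import Optional


class Fallback(Enum):
    """Hypothetical names for the stale-content fallback policy."""
    NONE = "none"
    CONNECTION_ERROR = "connection-error"
    SERVER_ERROR = "5xx"
    ANY_ERROR = "any-error"


def decide(cached: bool, status: Optional[int], policy: Fallback) -> str:
    """Pick an action; status is None when the upstream connection failed."""
    if status == 304 and cached:
        return "redirect-cached"        # step 3: cached copy is still valid
    if status == 200:
        return "store-then-redirect"    # step 4: upload to S3, then redirect
    # Everything else is treated as an upstream failure (step 5).
    if not cached or policy is Fallback.NONE:
        return "fail"
    if policy is Fallback.ANY_ERROR:
        return "serve-stale"
    if status is None:
        return "serve-stale" if policy is Fallback.CONNECTION_ERROR else "fail"
    if 500 <= status <= 599:
        return "serve-stale" if policy is Fallback.SERVER_ERROR else "fail"
    return "fail"


print(decide(True, 503, Fallback.SERVER_ERROR))
```

Stale content is only an option when a cached copy exists, so a first-time fetch that fails always fails.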

Concurrent requests for the same URL are deduplicated via singleflight. Only one request hits upstream, and all waiters receive the same presigned URL.
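
To show the idea behind this deduplication, here is a small thread-based sketch in Python. It is a stand-in for the singleflight pattern, not the library the project uses, and the `SingleFlight` class and its `do` method are hypothetical names.

```python
import threading
from typing import Any, Callable, Dict


class _Call:
    """State shared between the leader and the waiters of one in-flight key."""
    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Any = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        """Run fn once per key at a time; concurrent callers share its result."""
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                # Forget the key first so later callers start a fresh flight.
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

Here the key would be the requested URL and `fn` the upstream fetch, so every waiter gets the presigned URL produced by the single leader.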

A few other things to know:

  • Clients use normal HTTP GETs against a rewritten base URL.
  • ETag and Last-Modified headers are stored as S3 object metadata and used for revalidation, so unchanged content is never re-downloaded or re-uploaded.
  • When upstream is unavailable, previously cached content can still be served. This is controlled per failure class (connection errors, 5xx, any error).
  • Clients follow a 303 to a presigned S3 URL, so cached response bodies never pass through the proxy process.
  • The proxy follows upstream redirects before caching. The cache key is the original requested URL, not the final redirect destination.
  • Systemd socket activation and sd_notify are supported for zero-downtime deploys.
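
The revalidation point above can be sketched as follows: stored validators become conditional request headers. The metadata key names `etag` and `last-modified` are assumptions for illustration; the source does not say which keys the project uses.

```python
from typing import Dict, Mapping


def conditional_headers(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Build revalidation headers from validators stored with the cached object."""
    headers: Dict[str, str] = {}
    etag = metadata.get("etag")                    # assumed metadata key
    last_modified = metadata.get("last-modified")  # assumed metadata key
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


print(conditional_headers({"etag": '"abc123"'}))
```

If neither validator was stored, the result is empty and the upstream request is an ordinary unconditional GET.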

SSRF protection

Because the proxy accepts a domain in the request path and makes outbound requests to it, it is an SSRF vector by construction. An attacker who can reach the proxy could request /<internal-host>/secrets and have the proxy fetch it on their behalf.

The --egress-proxy flag addresses this by routing all upstream requests through an HTTP CONNECT proxy that enforces egress policy. Only upstream fetches go through it. AWS SDK traffic (S3, IMDS) uses the default transport and is unaffected.

mirror-cache --egress-proxy http://127.0.0.1:4750

Smokescreen works well here. Its default configuration denies connections to private and internal IP ranges while allowing public internet, which is exactly the policy this proxy needs. Any CONNECT proxy that blocks RFC 1918 and link-local addresses will do. The proxy must never make direct outbound connections without egress filtering.
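
For a sense of which addresses such a policy rejects, here is a rough Python sketch using the standard `ipaddress` module. It is not how Smokescreen or this proxy is implemented, and `is_denied` is a hypothetical name. A real egress proxy applies this kind of check to the resolved IP at connect time, not to the hostname in the request.

```python
import ipaddress


def is_denied(address: str) -> bool:
    """Return True for addresses an egress policy of this kind would block."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private         # includes the RFC 1918 ranges
        or ip.is_loopback     # 127.0.0.0/8, ::1
        or ip.is_link_local   # 169.254.0.0/16 (cloud metadata), fe80::/10
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


for addr in ("10.0.0.5", "169.254.169.254", "8.8.8.8"):
    print(addr, "denied" if is_denied(addr) else "allowed")
```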

About

An automatic S3 mirror for fetching artifacts, mostly for Bazel
