Lock/pin upstream versions for deterministic syncing #103

@Kilo59

Summary

Pin upstream configs to a specific commit SHA for deterministic, reproducible syncing. Track the resolved version in a lockfile so check verifies against a known state, not just "latest".

Parent: #100 (Tier 1)

Motivation

Without pinning, ruff-sync pull in CI could silently pick up breaking upstream changes. Most major package ecosystems provide a lockfile or pinning mechanism for exactly this reason:

Ecosystem     Lock Mechanism
------------  --------------------------------------------------
npm           package-lock.json records exact resolved versions
Go            go.sum records content hashes
pre-commit    rev: <sha/tag> pins exact hook versions
pip           pip freeze / uv lock

ruff-sync should let users opt into deterministic syncing while keeping the default behavior unchanged.

Proposed Design

Lock storage

Store lock metadata in pyproject.toml under [tool.ruff-sync.lock] rather than a separate file. This keeps everything in one place and avoids adding a new file to every project.

# Written by ruff-sync after a successful pull
[tool.ruff-sync.lock]
upstream = "https://raw.githubusercontent.com/my-org/standards/main/pyproject.toml"
commit = "abc1234def5678..."           # resolved commit SHA (if git-resolvable)
content-hash = "sha256:e3b0c44..."     # hash of the upstream ruff config section
pulled-at = "2026-03-15T15:30:00Z"     # timestamp of last pull

New CLI commands / flags

# Pull and update lock (default behavior when lock section exists)
ruff-sync pull                      # fetches latest, updates lock

# Pull but skip lock update (useful for testing)
ruff-sync pull --no-lock

# Check against locked version specifically
ruff-sync check                     # uses lock hash if available

# Explicitly update lock without applying changes
ruff-sync lock                      # new subcommand: fetch, resolve, write lock only

Implementation Plan

1. Define lock schema in Config TypedDict (core.py)

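from typing import TypedDict

# Lock metadata written after a successful pull. The functional syntax is
# used because the TOML keys ("content-hash", "pulled-at") contain hyphens.
LockInfo = TypedDict("LockInfo", {
    "upstream": str,        # resolved raw URL
    "commit": str,          # git commit SHA if resolvable
    "content-hash": str,    # sha256 of the upstream ruff config text
    "pulled-at": str,       # ISO 8601 timestamp
}, total=False)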

2. Compute content hash (core.py)

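import hashlib
import json

import tomlkit

def compute_config_hash(config_text: str) -> str:
    """Compute a deterministic hash of the upstream ruff config."""
    # tomlkit round-trips whitespace and comments verbatim, so re-serializing
    # with tomlkit.dumps() would not normalize anything. Hash the parsed
    # values as canonical JSON instead (unwrap() needs tomlkit >= 0.11).
    data = tomlkit.parse(config_text).unwrap()
    normalized = json.dumps(data, sort_keys=True, separators=(",", ":"), default=str)
    return f"sha256:{hashlib.sha256(normalized.encode()).hexdigest()}"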

3. Resolve commit SHA (core.py)

For GitHub/GitLab URLs, resolve the current commit SHA via API or from the git clone:

async def resolve_commit_sha(
    url: URL, branch: str, client: httpx.AsyncClient
) -> str | None:
    """Resolve the current commit SHA for a GitHub/GitLab branch."""
    # GitHub API: GET /repos/{owner}/{repo}/commits/{branch}
    if url.host in _GITHUB_HOSTS or url.host == _GITHUB_RAW_HOST:
        # Extract org/repo from URL
        ...
        api_url = f"https://api.github.com/repos/{org}/{repo}/commits/{branch}"
        resp = await client.get(
            api_url, headers={"Accept": "application/vnd.github.sha"}
        )
        if resp.status_code == 200:
            return resp.text.strip()
    return None  # fallback: no SHA available

For git clone fetches, extract SHA from the cloned repo.

4. Write lock after pull() (core.py)

After a successful merge, write lock metadata to pyproject.toml:

async def pull(args: Arguments) -> int:
    # ... existing fetch + merge logic ...

    # Write lock metadata (only for pyproject.toml targets)
    if not args.no_lock and _source_toml_path.name == "pyproject.toml":
        lock_info = {
            "upstream": str(fetch_result.resolved_upstream),
            "content-hash": compute_config_hash(upstream_config_text),
            "pulled-at": dt.datetime.now(dt.timezone.utc).isoformat(),
        }
        if commit_sha:
            lock_info["commit"] = commit_sha
        # Write into [tool.ruff-sync.lock] using tomlkit
        _write_lock(merged_toml, lock_info)
    source_toml_file.write(merged_toml)

5. Check against lock in check() (core.py)

If lock metadata exists, compare the upstream content hash against the locked hash for a fast "has upstream changed?" check:

async def check(args: Arguments) -> int:
    # ... existing logic ...

    # If lock exists, also verify upstream hasn't changed since last pull
    config = get_config(args.to)
    if "lock" in config:
        lock = config["lock"]
        upstream_hash = compute_config_hash(upstream_config_text)
        if upstream_hash != lock.get("content-hash"):
            print("Warning: Upstream has changed since last pull "
                  f"(locked at {lock.get('pulled-at', 'unknown')})")

6. Add lock subcommand (cli.py)

A lightweight subcommand that fetches upstream, resolves the commit SHA and content hash, and writes the lock section without modifying the ruff config:

lock_parser = subparsers.add_parser(
    "lock",
    parents=[common_parser],
    help="Fetch upstream and update lock metadata without changing ruff config",
)

7. Add --no-lock flag to pull (cli.py)

pull_parser.add_argument(
    "--no-lock",
    action="store_true",
    help="Skip updating the lock metadata after pull.",
)
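Steps 6 and 7 can be tried together in a self-contained argparse sketch. The build_parser function and the --to option on common_parser are assumptions for this example; the real common_parser presumably defines more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced ruff-sync parser with only the pieces discussed here."""
    # Shared options; "--to" is an assumed stand-in for the real common options.
    common_parser = argparse.ArgumentParser(add_help=False)
    common_parser.add_argument("--to", default=".", help="Target path (assumed option).")

    parser = argparse.ArgumentParser(prog="ruff-sync")
    subparsers = parser.add_subparsers(dest="command", required=True)

    pull_parser = subparsers.add_parser("pull", parents=[common_parser])
    pull_parser.add_argument(
        "--no-lock",
        action="store_true",
        help="Skip updating the lock metadata after pull.",
    )

    subparsers.add_parser(
        "lock",
        parents=[common_parser],
        help="Fetch upstream and update lock metadata without changing ruff config",
    )
    return parser


pull_args = build_parser().parse_args(["pull", "--no-lock"])
lock_args = build_parser().parse_args(["lock"])
```

argparse converts --no-lock to the attribute no_lock, which matches the args.no_lock check in pull(). The flag is registered only on the pull parser, so the lock subcommand does not accept it.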

Backward Compatibility

  • Lock is opt-in: if no [tool.ruff-sync.lock] section exists, behavior is unchanged.
  • The lock section is written automatically on the first pull (can be disabled with --no-lock).
  • check gracefully handles missing lock sections.
  • Lock metadata is stored using tomlkit to preserve formatting of the rest of pyproject.toml.

Edge Cases

  • ruff.toml targets: Lock metadata can't go in ruff.toml (no [tool] section). Options: (a) skip locking, (b) store in a nearby pyproject.toml, or (c) use a standalone .ruff-sync.lock file. Recommend (a) for MVP.
  • Git clone upstreams: SHA is directly available from the clone; no API call needed.
  • Non-GitHub/GitLab hosts: Content hash still works; commit SHA may not be resolvable (logged as info, not an error).

Test Plan

  1. Unit test for compute_config_hash() — verify deterministic hashing.
  2. Unit test for _write_lock() — verify tomlkit preserves formatting when adding lock section.
  3. E2E test: pull writes the lock and a subsequent check passes; after upstream is modified, check reports both hash mismatch and content drift.
  4. --no-lock test — verify lock section is not written.
  5. lock subcommand test — verify it writes lock without modifying ruff config.
  6. Missing lock test — verify check works normally without lock section.

Files Changed

File                        Change
--------------------------  ------------------------------------------------------------
src/ruff_sync/core.py       LockInfo TypedDict, compute_config_hash(), resolve_commit_sha(), _write_lock(), update pull() and check()
src/ruff_sync/cli.py        lock subcommand, --no-lock flag, Arguments.no_lock field
src/ruff_sync/__init__.py   Export LockInfo
tests/test_basic.py         Hash and lock-write unit tests
tests/test_e2e.py           E2E lock lifecycle tests

Labels: enhancement
