[Epic] Rename wikidata-json-filter → wikidata-cache #12

@jakebromberg

Overview

Rename this repository from wikidata-json-filter to wikidata-cache to match the naming pattern established by musicbrainz-cache. The current name describes the implementation (it filters Wikidata JSON dumps); the new name describes the artifact (a WXYC-filtered Wikidata cache stored in PostgreSQL).

Why

  • Today the three caches are named inconsistently: musicbrainz-cache (artifact-named), discogs-etl + discogs-xml-converter (split repos), wikidata-json-filter (implementation-named).
  • The PostgreSQL database produced by this tool is already called "wikidata-cache" everywhere else: see wxyc_etl::schema::wikidata, semantic-index/wikidata_client.py, and library-metadata-lookup/scripts/entity_resolution/wikidata.py. The repo name is the only place it's called "wikidata-json-filter".
  • Renaming before the substrate publish (#rec5) and the monorepo work (#rec10) keeps the old name out of those permanent locations.

Scope

In scope:

  • Rename the repo on GitHub (auto-redirects old URLs).
  • Update the Cargo.toml package name and binary name to wikidata-cache.
  • Update README.md, CLAUDE.md, all internal references.
  • Update consumers: semantic-index (Cargo + Python references), library-metadata-lookup, discogs-etl, the org-level CLAUDE.md.
  • Update CI/CD: GitHub Actions workflows, Docker images, any deploy targets.
  • Update the wxyc-etl shared crate's schema constants if they reference the old name.

Out of scope:

  • CLI standardization (handled by #rec2 — depends on this rename landing).
  • Schema or behavioral changes to the tool itself.

Deliverables

  • Repo renamed on GitHub
  • All internal name references updated
  • All consumer repos updated (one PR each)
  • CI green across all updated repos
  • Old name redirects work (verify via gh repo view WXYC/wikidata-json-filter)

Acceptance criteria

  • cargo run --bin wikidata-cache works in the renamed repo.
  • git clone git@github.com:WXYC/wikidata-json-filter.git continues to clone (via redirect).
  • No broken references in any other repo's CLAUDE.md, code, CI, or deploy config.
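The "no broken references" criterion can be spot-checked by scanning each consumer checkout for the old name. Below is a minimal Rust sketch using only the standard library; the function name find_stale_refs, the skipped directories (.git, target, node_modules), and the non-zero exit code on hits are illustrative assumptions, not part of any existing WXYC tooling.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Old repository/package name that should no longer appear after the rename.
const OLD_NAME: &str = "wikidata-json-filter";

/// Hypothetical skip list: VCS metadata and build/dependency output.
const SKIP_DIRS: &[&str] = &[".git", "target", "node_modules"];

/// Returns (path, 1-based line number, line text) for every line under `root`
/// that mentions `needle`. Symlinks are not followed, and files that are not
/// valid UTF-8 (e.g. binaries) are skipped.
fn find_stale_refs(root: &Path, needle: &str) -> io::Result<Vec<(PathBuf, usize, String)>> {
    let mut hits = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // DirEntry::file_type does not follow symlinks.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                let skip = path
                    .file_name()
                    .and_then(|n| n.to_str())
                    .map_or(false, |n| SKIP_DIRS.contains(&n));
                if !skip {
                    stack.push(path);
                }
            } else if file_type.is_file() {
                // read_to_string fails on invalid UTF-8; treat such files as binary.
                if let Ok(text) = fs::read_to_string(&path) {
                    for (i, line) in text.lines().enumerate() {
                        if line.contains(needle) {
                            hits.push((path.clone(), i + 1, line.to_string()));
                        }
                    }
                }
            }
        }
    }
    hits.sort();
    Ok(hits)
}

fn main() -> io::Result<()> {
    // Scan the directory given as the first argument, or the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let hits = find_stale_refs(Path::new(&root), OLD_NAME)?;
    for (path, line_no, line) in &hits {
        println!("{}:{}: {}", path.display(), line_no, line.trim());
    }
    if !hits.is_empty() {
        eprintln!("{} stale reference(s) to {}", hits.len(), OLD_NAME);
        std::process::exit(1);
    }
    Ok(())
}
```

Hits still need a human look: some mentions of the old name may be intentional (for example, a note that the old URL redirects), so a non-empty result is a prompt for review rather than an automatic failure of the criterion.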

Child issues

  • Repo rename (single op; not orchestrator-eligible — coordination-heavy)
  • Per-consumer reference update (each orchestrator-eligible)

Blocks

  • #rec2 (CLI standardization), so that its wikidata-cache migration targets the renamed repo

Tracking

Metadata

Labels: epic (Epic-level tracking issue), phase-a (Phase A: Foundations), pipeline-hardening (Music Data Pipeline Hardening project, #19)

Project status: Todo