Data pipeline

This directory contains the ETL process that builds ReverseGeocoder datasets from GeoNames and GeoLite2 source files.

For the detailed step-by-step flow and transformation notes, see PIPELINE.md.

Prerequisites

Required tools:

  • bash
  • curl
  • unzip
  • cut
  • sed
  • sqlite3
  • gawk
  • python3
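
A quick way to confirm that the tools above are installed is to look each one up on PATH. The following is a minimal sketch, not part of the pipeline; the helper name `missing_tools` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools listed under "Required tools".
missing=$(missing_tools bash curl unzip cut sed sqlite3 gawk python3)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
else
  echo "All required tools found."
fi
```

The sketch only checks that each tool is present; it does not check versions.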

Quick start

  1. Copy .env.dist to .env.
  2. Set GEOLITE2_ACCOUNT_ID and GEOLITE2_LICENSE_KEY in .env.
  3. Review ./generate.
  4. Run ./generate.
  5. Import the generated CSV files with either:
    • sqlite3 data.sqlite < import-sqlite.sql, or
    • your own database pipeline, using import-postgres.sql as a reference.
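
Steps 1 and 2 can be verified before running ./generate. The sketch below assumes that .env holds plain KEY=value lines (optionally prefixed with "export"); that format and the helper name `env_has` are assumptions, not something this README specifies.

```shell
#!/usr/bin/env bash
# Succeed only if the file has a non-empty KEY=value line for every given key.
# Assumes plain KEY=value syntax, optionally prefixed with "export ".
env_has() {
  local file=$1 key
  shift
  for key in "$@"; do
    # Require at least one non-quote, non-space character after the "=".
    grep -Eq "^(export[[:space:]]+)?${key}=[\"']?[^\"'[:space:]]" "$file" || {
      echo "Not set in $file: $key" >&2
      return 1
    }
  done
}

# Typical use, following the steps above:
#   cp .env.dist .env        # then edit .env
#   env_has .env GEOLITE2_ACCOUNT_ID GEOLITE2_LICENSE_KEY && ./generate
#   sqlite3 data.sqlite < import-sqlite.sql
```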

Expected outputs include:

  • admin.csv
  • place.csv
  • ip.csv
  • ipv6.csv
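
After a run, the files listed above can be checked for presence and non-zero size. This is a minimal sketch: the helper name `check_outputs` is hypothetical, and the output directory is an assumption passed in as an argument (it defaults to the current directory).

```shell
#!/usr/bin/env bash
# Print each expected CSV that is missing or empty in the given directory.
# Returns non-zero if any file fails the check.
check_outputs() {
  local dir=${1:-.} name status=0
  for name in admin.csv place.csv ip.csv ipv6.csv; do
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $dir/$name"
      status=1
    fi
  done
  return "$status"
}

# Example: check_outputs . && echo "All expected CSV files are present."
```

The list of file names covers only the outputs named here; the pipeline may produce others.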

Refreshing source data

Run ./cleanup to remove downloaded and intermediate files, then run ./generate again.

Related docs