This repository demonstrates an end-to-end ETL pipeline using Docker, PostgreSQL, and Apache Airflow to extract data from NASA's Astronomy Picture of the Day (APOD) API, transform it, and load it into a PostgreSQL database.
Project structure:

```text
.
├── docker-compose.yml        # Sets up PostgreSQL, Airflow, and supporting services
├── dags/
│   └── nasa_apod_postgres.py # DAG definition for the ETL process
└── README.md
```
- Extract: Fetches data from the NASA APOD API (https://api.nasa.gov/planetary/apod)
- Transform: Picks the fields to store (title, explanation, URL, date, media type) out of the JSON response
- Load: Inserts the data into a PostgreSQL table running in a Docker container
- Schedule: Runs daily using Apache Airflow
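The extract and transform steps can be sketched with the standard library alone. This is a minimal sketch, not the code in `nasa_apod_postgres.py`: the function names are hypothetical, and it uses `urllib` rather than whatever HTTP client the DAG actually uses. `api_key` and `date` are documented APOD query parameters, and the five fields kept match the `apod_data` columns.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

APOD_ENDPOINT = "https://api.nasa.gov/planetary/apod"


def build_apod_url(api_key: str, date: Optional[str] = None) -> str:
    """Build the APOD request URL; `date` is optional and formatted YYYY-MM-DD."""
    params = {"api_key": api_key}
    if date is not None:
        params["date"] = date
    return f"{APOD_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_apod(api_key: str, date: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Extract: send a GET request to the APOD endpoint and decode the JSON body."""
    with urllib.request.urlopen(build_apod_url(api_key, date), timeout=timeout) as resp:
        return json.load(resp)


def transform_apod(payload: dict) -> tuple:
    """Transform: keep only the fields stored in apod_data, in column order."""
    return (
        payload.get("title", ""),
        payload.get("explanation", ""),
        payload.get("url", ""),
        payload.get("date", ""),
        payload.get("media_type", ""),
    )
```

For example, `transform_apod(fetch_apod("DEMO_KEY"))` yields one row ready for insertion; extra response fields such as `hdurl` are simply dropped.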
Make sure Docker and Docker Compose are installed, then start the services:

`docker-compose up -d`

This starts:
- PostgreSQL (`localhost:5432`)
- Airflow webserver (`localhost:8080`)
- Airflow scheduler, DAG processor, triggerer, and API services
The DAG (`nasa_apod_postgres.py`) does the following:
- Creates the `apod_data` table if it doesn't exist
- Sends a GET request to the NASA APOD API
- Parses the JSON response
- Inserts the data into PostgreSQL using Airflow's `PostgresHook`
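The create-table and insert steps come down to two SQL statements. The sketch below keeps them as plain DB-API calls so the logic can be exercised without Airflow. The column types are assumptions (this README lists only column names), and `load_apod_row` is a hypothetical helper. Inside the DAG, the same statements could instead go through `PostgresHook(postgres_conn_id="postgres_default")`, for example via `hook.run(sql, parameters=...)`.

```python
from typing import Any, Sequence

# Column names come from the apod_data table; the types are assumptions.
CREATE_APOD_TABLE = """
CREATE TABLE IF NOT EXISTS apod_data (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    explanation TEXT,
    url TEXT,
    date DATE,
    media_type VARCHAR(50)
);
"""

# %s is psycopg2's placeholder style; the driver escapes each value,
# so quotes inside `explanation` cannot break the statement.
INSERT_APOD_ROW = """
INSERT INTO apod_data (title, explanation, url, date, media_type)
VALUES (%s, %s, %s, %s, %s);
"""


def load_apod_row(cursor: Any, row: Sequence[Any]) -> None:
    """Hypothetical load step: ensure the table exists, then insert one row.

    `cursor` is any DB-API cursor; committing is left to the caller.
    `row` is (title, explanation, url, date, media_type).
    """
    if len(row) != 5:
        raise ValueError(f"expected 5 values, got {len(row)}")
    cursor.execute(CREATE_APOD_TABLE)
    cursor.execute(INSERT_APOD_ROW, tuple(row))
```

With a real connection, one option is `conn = PostgresHook(postgres_conn_id="postgres_default").get_conn()`, then `with conn.cursor() as cur: load_apod_row(cur, row)` followed by `conn.commit()`. Passing values as parameters rather than formatting them into the SQL string matters here, because APOD explanations regularly contain apostrophes.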
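To open a `psql` shell inside the Postgres container (find the container name with `docker ps`):

`docker exec -it <postgres_container_name> psql -U postgres -d postgres`

Example query:

`SELECT * FROM apod_data;`

To connect from a SQL client such as DBeaver, use:
- Host: `127.0.0.1`
- Port: `5432`
- User: `postgres`
- Password: `postgres`
- Database: `postgres`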
Open the Airflow UI at http://localhost:8080.

Default credentials (if set):
- User: `airflow`
- Password: `airflow`
Trigger the nasa_apod_postgres DAG manually or wait for the next scheduled run.
- Make sure to set up an Airflow connection named `postgres_default` with the correct Postgres credentials.
- Replace `DEMO_KEY` in the API call with your own NASA API key. You can store it in:
  - Airflow Variables, or
  - a `.env` file, if you configure environment loading
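As an alternative to creating `postgres_default` in the UI, Airflow can read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` (URI form) and variables from `AIRFLOW_VAR_<KEY>`. The following is a minimal sketch of building that URI and reading the key. The variable name `nasa_api_key` and the Compose service hostname `postgres` are assumptions, since this README does not name them, and both helper functions are hypothetical.

```python
import os
from urllib.parse import quote


def postgres_conn_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a URI suitable for an AIRFLOW_CONN_POSTGRES_DEFAULT environment variable."""
    # Percent-encode credentials so characters such as '@' or ':' don't break the URI.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


def resolve_api_key(default: str = "DEMO_KEY") -> str:
    """Return the NASA API key from AIRFLOW_VAR_NASA_API_KEY, else `default`.

    Only the environment-variable form is read here; a Variable created in the
    Airflow UI lives in the metadata database and needs Variable.get() instead.
    """
    return os.environ.get("AIRFLOW_VAR_NASA_API_KEY", default)
```

For instance, `postgres_conn_uri("postgres", "postgres", "postgres", 5432, "postgres")` gives a value that could be set on the Airflow services in `docker-compose.yml`. The host there is the (assumed) Postgres service name rather than `localhost`: inside the Compose network, `localhost` refers to the Airflow container itself.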
The `apod_data` table has the following columns:

| id | title | explanation | url | date | media_type |
|---|---|---|---|---|---|
Check running containers: `docker ps`

Shut down the services: `docker-compose down`

Troubleshooting:
- If DBeaver shows `Connection Refused`:
  - Ensure Docker is running
  - Confirm that the Postgres container is active (`docker ps`)
- If the `psql` command is not found in WSL, install the client with `sudo apt install postgresql-client`
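Before digging into client settings, it can help to confirm that something is listening on the published Postgres port at all. The following is a minimal standard-library sketch; the function name is hypothetical, and it checks TCP reachability only, not credentials or database names.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are both OSError subclasses.
        return False
```

If `port_is_open()` returns `False`, the Postgres container is likely stopped or its port is not published to the host, which matches the `Connection Refused` case above.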
Arun Shukla
Passionate about automation, cloud, and data engineering 🚀
GitHub: @anshu1016


