This project is an ELT (Extract, Load, Transform) pipeline that extracts data from a source Postgres database, loads the data to a different Postgres database, and then performs transformations on the data. The data model is represented by the following ER diagram:
The only tranformation is to create a new film_ratings table with the movie information and assign each one a rating category (Excellent, Good, Average, Bad, and Trash) in a new column called rating_category based on its user_rating column.
- It is recommended to create a virtual environment for the project. In the root directory enter the following command:
python -m venv <virtual environment name>-
Activate the virtual environment.
-
Install the required dependencies:
pip install -r requirements.txt- Create a
.envfile in the root directory with the following environment varaibles:
DBT_TRANSFORMATIONS_FOLDER_PATH=
DBT_USER_FOLDER_PATH=
The DBT_TRANSFORMATIONS_FOLDER_PATH is the absolute path of the dbt transformations folder, dbt_transformations, in the project directory.
The DBT_USER_FOLDER_PATH is the absolute path of the .dbt file in the user folder of the operating system.
These are needed by the dbt container volumes as the source.
- Run Docker Compose to create the container of the
init_airflowfirst:
docker compose up init_airflow -d- Run the rest of the containers:
docker compose up- After all the containers are running, check the
destination_postgrescontainer to see the internal database:
docker exec -it <container name of the destination postgres> psql -U postgresOnce inside the container bash and with the psql prompt, enter:
\l\c destination_db\dtYou can see there are not any tables in the database.
-
Go to the Airflow webserver GUI (
localhost:8080) and trigger the DAG corresponding to the elt process calledelt_and_dbt. -
Check the
destination_postgrescontainer to see if the transformations were applied:
docker exec -it <container name of the destination postgres> psql -U postgresThere should be new tables and one of them should be called film_ratings and it should look like this:

