DataNest is a personal data platform designed to collect, transform and analyze financial and lifestyle data using a modern analytics engineering stack.
The project implements a data warehouse architecture, automated ETL pipelines, machine learning models for transaction classification, and interactive analytics dashboards.
Its goal is to demonstrate how a small-scale personal data ecosystem can replicate many principles used in modern data platforms.
DataNest centralizes personal data such as:
- financial transactions
- energy consumption
- budgets
- shopping data
- device usage
The platform processes raw exports (CSV/JSON) and transforms them into a structured star schema data warehouse, enabling advanced analytics and machine learning workflows.
Key capabilities include:
- Automated ingestion pipelines
- Data modeling with fact and dimension tables
- Transaction categorization using machine learning
- Interactive dashboards for personal analytics
- Energy consumption monitoring by device and location
The platform follows a simplified modern data stack architecture.
Data Sources
- Bank CSV exports
- Energy consumption logs
- Device activity logs
- Shopping data

↓

ETL Pipelines (Python)
- Data cleaning
- Data normalization
- Feature engineering

↓

PostgreSQL Data Warehouse
- Fact tables
- Dimension tables

↓

Analytics Layer
- Dash dashboards
- Machine learning models
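
The ETL stage of the architecture above (cleaning, normalization, feature engineering) could look roughly like the following Pandas sketch. This is an illustration, not the project's actual pipeline: the column names (`date`, `description`, `amount`), the sample rows, and the `clean_transactions` helper are all assumptions.

```python
import io

import pandas as pd

# Hypothetical raw bank export; column names and values are invented for illustration.
RAW_CSV = """date,description,amount
2024-01-05,  SUPERMARKET XYZ  ,-45.20
2024-01-05,  SUPERMARKET XYZ  ,-45.20
2024-01-17,Salary January,1500.00
2024-02-02,electric company,-60.00
"""


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean, normalize, and enrich a raw transaction export."""
    df = raw.copy()
    # Data cleaning: remove exact duplicate rows and rows missing an amount.
    df = df.drop_duplicates().dropna(subset=["amount"])
    # Normalization: parse dates, trim and lowercase descriptions.
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    df["description"] = df["description"].str.strip().str.lower()
    df["amount"] = df["amount"].astype(float)
    # Feature engineering: simple derived columns for later analysis or modeling.
    df["month"] = df["date"].dt.strftime("%Y-%m")
    df["is_expense"] = df["amount"] < 0
    df["abs_amount"] = df["amount"].abs()
    return df.reset_index(drop=True)


transactions = clean_transactions(pd.read_csv(io.StringIO(RAW_CSV)))
print(transactions)
```

Keeping the transformation in one pure function makes it easy to run the same logic on every new export before loading the result into the warehouse.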
Languages
- Python
- SQL

Data & Analytics
- Pandas
- Scikit-learn
- LightGBM
- SQLAlchemy

Data Storage
- PostgreSQL

Visualization
- Plotly
- Dash

Infrastructure
- Docker
- Docker Compose
The warehouse follows a star schema design to optimize analytical queries.
| Fact table | Description |
| --- | --- |
| FactTransacciones | Financial transactions |
| FactConsumoElectrico | Energy consumption by device |
| FactListaCompras | Shopping lists and expenses |
| FactPresupuesto | Budget planning and tracking |

| Dimension table | Description |
| --- | --- |
| DimFecha | Date dimension |
| DimCuenta | Bank accounts |
| DimTarjeta | Credit/debit cards |
| DimCategoria | Transaction categories |
| DimDispositivo | Electrical devices |
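
To show how a star schema serves analytical queries, here is a minimal sketch that joins one fact table to two dimensions and aggregates the result. Only the table names come from the project. The column names (`fecha_id`, `categoria_id`, `monto`, and so on) and the sample rows are assumptions, and in-memory SQLite stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Column names are assumptions; only the table names come from the project.
    CREATE TABLE DimFecha (
        fecha_id INTEGER PRIMARY KEY,
        fecha TEXT NOT NULL,
        anio INTEGER NOT NULL,
        mes INTEGER NOT NULL
    );
    CREATE TABLE DimCategoria (
        categoria_id INTEGER PRIMARY KEY,
        nombre TEXT NOT NULL
    );
    CREATE TABLE FactTransacciones (
        transaccion_id INTEGER PRIMARY KEY,
        fecha_id INTEGER NOT NULL REFERENCES DimFecha (fecha_id),
        categoria_id INTEGER NOT NULL REFERENCES DimCategoria (categoria_id),
        monto REAL NOT NULL
    );
    INSERT INTO DimFecha VALUES
        (20240105, '2024-01-05', 2024, 1),
        (20240210, '2024-02-10', 2024, 2);
    INSERT INTO DimCategoria VALUES (1, 'Groceries'), (2, 'Utilities');
    INSERT INTO FactTransacciones VALUES
        (1, 20240105, 1, 45.0),
        (2, 20240105, 2, 60.0),
        (3, 20240210, 1, 30.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
monthly_spend = conn.execute(
    """
    SELECT f.anio, f.mes, c.nombre, SUM(t.monto) AS total
    FROM FactTransacciones AS t
    JOIN DimFecha AS f ON f.fecha_id = t.fecha_id
    JOIN DimCategoria AS c ON c.categoria_id = t.categoria_id
    GROUP BY f.anio, f.mes, c.nombre
    ORDER BY f.anio, f.mes, c.nombre
    """
).fetchall()

for row in monthly_spend:
    print(row)
```

The fact table holds only keys and measures, so slicing by a new attribute means joining one more dimension rather than reshaping the facts.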
DataNest includes machine learning pipelines for automatic transaction categorization.
Models explored:
- Random Forest
- Gradient Boosting
- LightGBM
- Neural Networks (TensorFlow)
Example workflow:
- Transaction description preprocessing
- Feature extraction
- Model training
- Hyperparameter tuning
- Prediction integration into the analytics layer
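
The workflow above can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's trained model: the sample descriptions and category labels are invented, TF-IDF is one possible choice for feature extraction, and Random Forest is picked only because it is the first model in the list.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled sample; descriptions and categories are invented for illustration.
descriptions = [
    "supermarket xyz purchase",
    "supermarket abc groceries",
    "electric company bill",
    "water utility bill",
    "monthly salary deposit",
    "salary bonus deposit",
]
categories = ["groceries", "groceries", "utilities", "utilities", "income", "income"]

# Text preprocessing and feature extraction (TF-IDF) plus the classifier live in one
# pipeline, so the same transformation is applied at training and prediction time.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(descriptions, categories)

# Predict categories for unseen transaction descriptions.
new_transactions = ["supermarket xyz groceries", "electric utility bill"]
predictions = model.predict(new_transactions)
print(list(zip(new_transactions, predictions)))
```

Hyperparameter tuning could be layered on by wrapping the pipeline in Scikit-learn's `GridSearchCV`, and the predicted labels could then be written back to the warehouse for use in the dashboards.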
The analytics layer is built using Plotly Dash.
Example dashboards include:
- Spending Monitoring
- Consumption Pattern Analysis
- Budget Control
- Energy Consumption by Device
These dashboards allow interactive filtering and exploration of the data warehouse.
DataNest demonstrates how raw personal data exports can be turned into a queryable analytics environment.
Example insights:
- Monthly spending patterns
- Budget compliance tracking
- Device-level electricity consumption
- Transaction category predictions
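
Two of the insights above, monthly spending and budget compliance, reduce to a grouped sum compared against a limit. The following standard-library sketch shows the idea. The sample expenses, the budget amounts, and the `budget_compliance` function are assumptions made for illustration.

```python
from collections import defaultdict

# Invented sample records: (month, category, amount spent).
expenses = [
    ("2024-01", "Groceries", 220.0),
    ("2024-01", "Groceries", 130.0),
    ("2024-01", "Utilities", 90.0),
    ("2024-02", "Groceries", 280.0),
]
# Hypothetical monthly budget per category.
budgets = {"Groceries": 300.0, "Utilities": 100.0}


def budget_compliance(expenses, budgets):
    """Sum spending per (month, category) and compare it with the budget."""
    totals = defaultdict(float)
    for month, category, amount in expenses:
        totals[(month, category)] += amount
    report = {}
    for (month, category), spent in sorted(totals.items()):
        budget = budgets[category]
        report[(month, category)] = {
            "spent": spent,
            "budget": budget,
            "within_budget": spent <= budget,
        }
    return report


report = budget_compliance(expenses, budgets)
for key, row in report.items():
    print(key, row)
```

In the warehouse itself, the same comparison would more naturally be a SQL join between FactTransacciones and FactPresupuesto.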
Planned improvements include:
- automated data ingestion connectors
- streaming pipelines
- advanced forecasting models
- anomaly detection for financial transactions
- cloud deployment
This project is licensed under the Apache 2.0 License.
Gabriel León
Analytics Engineer