Gaboelc/Datanest


DataNest

Personal Data Warehouse & Analytics Platform for Financial and Lifestyle Data

DataNest is a personal data platform designed to collect, transform and analyze financial and lifestyle data using a modern analytics engineering stack.

The project implements a data warehouse architecture, automated ETL pipelines, machine learning models for transaction classification, and interactive analytics dashboards.

Its goal is to demonstrate how a small-scale personal data ecosystem can replicate many principles used in modern data platforms.


Overview

DataNest centralizes personal data such as:

  • financial transactions
  • energy consumption
  • budgets
  • shopping data
  • device usage

The platform processes raw exports (CSV/JSON) and transforms them into a structured star schema data warehouse, enabling advanced analytics and machine learning workflows.

Key capabilities include:

  • Automated ingestion pipelines
  • Data modeling with fact and dimension tables
  • Transaction categorization using machine learning
  • Interactive dashboards for personal analytics
  • Energy consumption monitoring by device and location

Architecture

The platform follows a simplified modern data stack architecture.

Data Sources
  • Bank CSV exports
  • Energy consumption logs
  • Device activity logs
  • Shopping data

ETL Pipelines (Python)
  • Data cleaning
  • Data normalization
  • Feature engineering

PostgreSQL Data Warehouse
  • Fact tables
  • Dimension tables

Analytics Layer
  • Dash dashboards
  • Machine learning models
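
The ETL stage listed above (cleaning, normalization, feature engineering) can be sketched with Pandas as follows. This is a minimal illustration, not the project's actual pipeline: the column names `fecha`, `descripcion`, and `monto`, the sample rows, and the `clean_transactions` helper are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical bank export; the real column names and layout may differ.
RAW_CSV = """fecha,descripcion,monto
2024-01-05,  SUPERMERCADO XYZ ,-45.20
2024-01-05,  SUPERMERCADO XYZ ,-45.20
2024-01-07,Salario Enero,1500.00
,Fila incompleta,
"""


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean, normalize, and enrich a raw transaction export."""
    # Cleaning: drop rows missing required fields, then exact duplicates.
    df = raw.dropna(subset=["fecha", "monto"]).drop_duplicates().copy()

    # Normalization: typed dates and amounts, trimmed lowercase descriptions.
    df["fecha"] = pd.to_datetime(df["fecha"], format="%Y-%m-%d")
    df["descripcion"] = df["descripcion"].str.strip().str.lower()
    df["monto"] = df["monto"].astype(float)

    # Feature engineering: simple derived columns for later analysis.
    df["es_gasto"] = df["monto"] < 0
    df["mes"] = df["fecha"].dt.month
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = clean_transactions(raw)
print(clean)
```

In a real pipeline the cleaned frame would then be written to the warehouse, for example through SQLAlchemy, which the project lists in its stack.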


Tech Stack

Languages
  • Python
  • SQL

Data & Analytics
  • Pandas
  • Scikit-learn
  • LightGBM
  • SQLAlchemy

Data Storage
  • PostgreSQL

Visualization
  • Plotly
  • Dash

Infrastructure
  • Docker
  • Docker Compose


Data Warehouse Design

The warehouse follows a star schema design to optimize analytical queries.

Fact Tables

Table                  Description
FactTransacciones      Financial transactions
FactConsumoElectrico   Energy consumption by device
FactListaCompras       Shopping lists and expenses
FactPresupuesto        Budget planning and tracking

Dimension Tables

Table             Description
DimFecha          Date dimension
DimCuenta         Bank accounts
DimTarjeta        Credit/Debit cards
DimCategoria      Transaction categories
DimDispositivo    Electrical devices
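
To illustrate how a star schema supports analytical queries, the sketch below joins one fact table to two dimensions and aggregates spending by month and category. It uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL so that it runs without a server. Only the table names come from this README; every column name and all sample rows are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical columns; only the table names come from the README.
conn.executescript("""
CREATE TABLE DimCategoria (
    categoria_id INTEGER PRIMARY KEY,
    nombre       TEXT NOT NULL
);
CREATE TABLE DimFecha (
    fecha_id INTEGER PRIMARY KEY,
    fecha    TEXT NOT NULL,
    anio     INTEGER NOT NULL,
    mes      INTEGER NOT NULL
);
CREATE TABLE FactTransacciones (
    transaccion_id INTEGER PRIMARY KEY,
    fecha_id       INTEGER NOT NULL REFERENCES DimFecha(fecha_id),
    categoria_id   INTEGER NOT NULL REFERENCES DimCategoria(categoria_id),
    monto          REAL NOT NULL
);
""")

conn.executemany(
    "INSERT INTO DimCategoria VALUES (?, ?)",
    [(1, "Supermercado"), (2, "Transporte")],
)
conn.executemany(
    "INSERT INTO DimFecha VALUES (?, ?, ?, ?)",
    [(20240105, "2024-01-05", 2024, 1), (20240210, "2024-02-10", 2024, 2)],
)
conn.executemany(
    "INSERT INTO FactTransacciones VALUES (?, ?, ?, ?)",
    [(1, 20240105, 1, 45.5), (2, 20240105, 2, 12.0), (3, 20240210, 1, 30.0)],
)

# Typical star-schema query: the fact table joined to its dimensions,
# then aggregated by dimension attributes.
rows = conn.execute("""
SELECT f.anio, f.mes, c.nombre, SUM(t.monto) AS total
FROM FactTransacciones AS t
JOIN DimFecha AS f ON f.fecha_id = t.fecha_id
JOIN DimCategoria AS c ON c.categoria_id = t.categoria_id
GROUP BY f.anio, f.mes, c.nombre
ORDER BY f.anio, f.mes, c.nombre
""").fetchall()

for row in rows:
    print(row)
```

Keeping descriptive attributes in the dimensions means the fact table stores only keys and measures, so a new grouping (by account, card, or device) needs only one more join.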


Machine Learning

DataNest includes machine learning pipelines for automatic transaction categorization.

Models explored:

  • Random Forest
  • Gradient Boosting
  • LightGBM
  • Neural Networks (TensorFlow)

Example workflow:

  1. Transaction description preprocessing
  2. Feature extraction
  3. Model training
  4. Hyperparameter tuning
  5. Prediction integration into the analytics layer
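
Steps 1 to 3 of the workflow above can be sketched with scikit-learn. This is an illustrative example rather than the project's actual model: the sample descriptions, the category labels, and the choice of TF-IDF features with a Random Forest are assumptions, and hyperparameter tuning is left out for brevity.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical labeled transactions; real training data would come
# from the warehouse.
descriptions = [
    "supermercado compra semanal",
    "supermercado frutas y verduras",
    "uber viaje aeropuerto",
    "uber viaje oficina",
    "netflix suscripcion mensual",
    "spotify suscripcion mensual",
]
labels = [
    "Supermercado",
    "Supermercado",
    "Transporte",
    "Transporte",
    "Suscripciones",
    "Suscripciones",
]

# Preprocessing and feature extraction (lowercasing, TF-IDF over unigrams
# and bigrams) chained with the classifier in a single pipeline.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(descriptions, labels)

# Predict categories for unseen transaction descriptions.
new_descriptions = ["uber viaje centro", "supermercado compra mensual"]
predictions = model.predict(new_descriptions)

for text, category in zip(new_descriptions, predictions):
    print(f"{text!r} -> {category}")
```

Wrapping the vectorizer and the classifier in one pipeline ensures that the same text preprocessing is applied at training and prediction time.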

Dashboards

The analytics layer is built using Plotly Dash.

Example dashboards include:

  • Spending Monitoring
  • Consumption Pattern Analysis
  • Budget Control
  • Energy Consumption by Device

These dashboards allow interactive filtering and exploration of the data warehouse.


Use Cases

DataNest demonstrates how raw personal data exports can be turned into a structured, queryable analytics environment.

Example insights:

  • Monthly spending patterns
  • Budget compliance tracking
  • Device-level electricity consumption
  • Transaction category predictions

Future Improvements

Planned improvements include:

  • automated data ingestion connectors
  • streaming pipelines
  • advanced forecasting models
  • anomaly detection for financial transactions
  • cloud deployment

License

This project is licensed under the Apache 2.0 License.


Author

Gabriel León
Analytics Engineer
