I build data tools that make complex systems legible — for the people who live and work inside them.
My work sits at the intersection of data engineering, machine learning, and civic accountability, with a focus on the areas that shape South African life: mining, banking, and social and economic accountability.
| Project | What it does |
|---|---|
| SA Bank Trust Score | Scores South Africa's six major banks on trust, transparency & consumer fairness |
| Who Really Gets the Money? | A public interest analysis of R69.5B in SA development finance — inequality, ML prediction, and anomaly detection across IDC and NEF funding |
Languages & tools
Python · SQL · Streamlit · Scikit-learn · XGBoost · Pandas · Plotly
Infrastructure & DevOps
GitHub Actions · Azure Pipelines · Docker · Azure Key Vault · Bicep
Data sources I work with
Open government data · SARB · Industry research · Public financial records
South Africa has no shortage of data. What it needs are people who can turn that data into something useful: tools that give communities, consumers, and decision-makers a clearer picture of the systems they're part of. That's the work I show up for.
- 🔭 Currently working on: A SQL + Tableau mining production intelligence dashboard using Stats SA data on Google BigQuery
- 🌱 Currently learning: SQL on Google BigQuery, Tableau Public, cloud-based data engineering, AZ-400 DevOps Engineering (Microsoft certification in progress), Azure infrastructure as code with Bicep, and application monitoring with Azure Application Insights
- 👯 Open to collaborating on: Data science projects with real-world social or industrial impact — particularly South African crime, mining, public health, or music and culture analytics
- 🤝 Looking for help with: Becoming a full-stack data scientist
- ⚡ Fun fact: I love music, I'm passionate about social justice, and I have a very sharp sense of humour
| Project | Description | Live |
|---|---|---|
| 💎 Diamonds Are Forever | Full-stack ML study on the global diamond industry — price forecasting (R²=0.9997), 5 buyer archetypes, 73.8% lab-grown price collapse quantified | 🚀 Dashboard |
| ⛏️ Mining Quality Intelligence | Predicts silica quality failures in an iron ore plant hours before they occur — XGBoost + Neural Network on 737K sensor readings | 🚀 Dashboard |
| 🏦 SA Bank Trust Score | Data-driven consumer intelligence dashboard scoring South Africa's six major banks across complaints, regulatory sanctions and public sentiment — live multi-page Streamlit app | 🚀 Dashboard |
| 🔍 SA Crime Intelligence Report | Interrogates Q3 2025/26 SAPS crime stats against SAMRC femicide data — ARIMA forecasting, OLS regression, anomaly detection | 📓 Notebook |
| 🎵 Tyla Grammy Sentiment Analysis | NLP study of Tyla's historic Grammy triumph — 82% positive sentiment across 20+ countries via VADER + TextBlob dual-model validation | 📓 Notebook |
| Project | Description | Pipelines |
|---|---|---|
| 🏦 SA Bank Trust Score — DevOps Pipeline | Production-grade CI/CD pipeline on the SA Bank Trust Score app — 4 automated GitHub Actions workflows covering app health, data quality, scheduled freshness monitoring and Docker build checks | ⚙️ View Pipelines |
| 🎵 Tyla Sentiment Tracker | Automated fortnightly sentiment tracking pipeline — YouTube comments analysed with VADER & TextBlob on a scheduled GitHub Actions workflow, tracking public sentiment trends without manual intervention | ⚙️ View Pipelines |
© 2026 Lindiwe Songelwa