tarekmasryo/README.md

AI/ML Engineer building production-ready ML and Generative AI systems across modeling, serving, evaluation, and monitoring.
From validated data pipelines → model evaluation → deployed APIs → decision-ready dashboards.

Kaggle Datasets Grandmaster · Kaggle Notebooks Master

GitHub · Website · LinkedIn · Kaggle

Hugging Face · Streamlit · Repos


🧭 What I build

| Area | What you can expect |
| --- | --- |
| Production ML systems | Leak-safe evaluation, calibrated models, threshold policies, reproducible artifacts, and decision-ready outputs |
| Generative AI workflows | RAG evaluation, retrieval attribution, tool-calling agents, structured outputs, and quality-aware rollout review |
| APIs & serving | Dockerized FastAPI services, strict request/response schemas, versioned artifacts, and CI-friendly delivery |
| Monitoring & decision support | Telemetry, observability, drift and risk review, latency and cost trade-offs, triage logic, and operational dashboards |
| Applied NLP & CV | Text classification, semantic search, threshold tuning, explainability, and practical computer vision apps |

🌟 Featured projects

| Project | Focus | Links |
| --- | --- | --- |
| Fraud Detection Dashboard | Streamlit analytics UI + FastAPI inference + threshold tuning + cost-aware fraud review | Repo |
| LLM Production Telemetry | Multi-table telemetry → routing policies, drift review, triage thresholds, and decision artifacts | Repo · Data |
| RAG QA Logs & Corpus | Retrieval attribution, failure taxonomy, KPI baselines, trade-off analysis, and rollout gating | Repo · Data |
| Road Accident Risk | Residual-boosted ensemble, stable OOF evaluation, interpretable risk features, and calibrated safety scoring | Kaggle |
| EV Charging Analytics | Geospatial analytics, fast-DC allocation optimizer, market slices, and scenario-based network planning | Repo |
| Cancer Risk Analysis | Clean tabular data, validation, leakage-aware benchmarking, and interpretable risk modeling | Repo · Data |

🧠 Selected NLP & CV work

| Project | Focus | Links |
| --- | --- | --- |
| Advanced ML Sentiment Lab | Streamlit + Plotly sentiment analysis lab with TF-IDF, ROC/PR evaluation, threshold tuning, error analysis, and live prediction | Repo |
| Text Sentiment Analysis | IMDB sentiment pipeline with calibrated TF-IDF baselines, threshold tuning, explainability, and a BiLSTM baseline | Repo |
| SMS Spam Detection | Dual TF-IDF pipeline with calibrated Linear SVM, nested CV, threshold tuning, explainability, and robustness checks | Repo |
| Old Photo Restorer | Gradio computer vision app for old-photo restoration with GFPGAN, optional upscaling, and batch ZIP export | Repo |

📦 Selected data products

| Dataset | What it’s for | Links |
| --- | --- | --- |
| Cancer Risk Factors | Clean health, lifestyle, environmental, and genetic features for EDA and risk modeling | Kaggle |
| Global EV Infrastructure | Standardized EV charging infrastructure data for geospatial analytics, planning, and network modeling | Kaggle |
| YouTube Shorts & TikTok Trends 2025 | Short-form content analytics, trend exploration, and virality analysis | Kaggle |
| Digital Lifestyle & Mental Wellness | Behavioral signals for wellbeing analytics and predictive workflows | Kaggle |

🛠️ Tech stack

| Category | Tools |
| --- | --- |
| Languages & Core | Python, SQL, C++, Bash, Git, Linux |
| Data & Analytics | NumPy, Pandas, Polars, DuckDB, Jupyter |
| ML / DL | scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow |
| NLP / CV / LLM | Hugging Face Transformers, OpenCV, LangChain, LlamaIndex, LangGraph, FAISS, pgvector, Ollama, vLLM |
| Apps & Dashboards | Streamlit, Plotly, Matplotlib, Seaborn, Gradio, React, PyDeck |
| APIs & Serving | FastAPI, Pydantic, SQLAlchemy, Alembic, ONNX, Docker, Postgres, Redis, RQ |
| Monitoring & Quality | MLflow, OpenTelemetry, Prometheus, GitHub Actions, pytest, Ruff, mypy |

🤝 Collaboration

  • 🚀 Production ML & Generative AI systems: FastAPI services, Dockerized delivery, evaluation-first workflows, and decision-ready outputs
  • 🧠 RAG & LLM reliability: retrieval attribution, grounded outputs, guardrails, and regression-friendly review
  • 🗂️ Datasets & reproducible assets: validated schemas, clear documentation, reusable notebooks, and ML-ready data products
  • 📊 Decision-ready dashboards: interactive analytics, threshold tuning, monitoring, and operational insights

Best contact: LinkedIn

If you find the work useful, a ⭐ helps more people discover it.

Pinned

  1. fraud-detection-dashboard Public

    Decision-first fraud screening: Streamlit analytics UI + FastAPI inference, artifact-locked RF/XGB models, and threshold policy controls.

    Python 5

  2. advanced-ml-sentiment-lab Public

    Advanced Streamlit + Plotly sentiment analysis lab: TF-IDF (word+char), multi-model training, ROC/PR AUC evaluation, cost-aware threshold tuning, error analysis, and live prediction.

    Python 8

  3. rag-qa-logs-corpus-data Public

    Synthetic multi-table RAG QA telemetry benchmark (corpus→chunks→retrieval→eval): labels for correctness/faithfulness/hallucination + cost/latency for RAG evaluation and dashboards.

    Python 2

  4. llm-system-ops-production-telemetry-sft-data Public

    Production-grade synthetic dataset for LLMOps: interaction-level telemetry (latency/cost/tokens), failure RCA, tool-use analytics, user feedback, plus 1:1 aligned SFT samples.

    Python 1

  5. pima-diabetes-pipeline Public

    End-to-end diabetes risk prediction pipeline (Pima): EDA → feature engineering → calibration + cost-aware threshold → deployable artifacts.

    Jupyter Notebook 8

  6. tarekmasryo.github.io Public

    Tarek Masryo — AI/ML Engineer Portfolio

    JavaScript 2