sainikhilp/README.md

Hi, I'm Sai Nikhil Pillai 👋

I’m an MS in Information Science ’26 student at UT Austin with 3 years of experience at FHLB Dallas and Accenture. I have a strong interest in data engineering, cloud platforms, and AI/ML, and I enjoy building end-to-end data systems that turn raw data into reliable, analytics-ready insights.

About Me

  • MSIS ’26 at The University of Texas at Austin.
  • Background in data engineering, cloud automation, and applied AI.
  • Hands-on experience in Azure, Databricks, PySpark, SQL, Delta Lake, and streaming architectures.
  • Interested in building scalable, production-style data pipelines, intelligent workflows, and ML systems.

Technical Skills

  • Programming & Libraries: Python, Pandas, scikit-learn, PyTorch, SQL, Power Automate, Power Apps, Power Query
  • Cloud & Data Engineering: PySpark, dbt, Azure Data Factory, Azure Data Lake Storage, Azure Event Hubs, Delta Live Tables, Lakeflow Jobs, OutSystems
  • Data Visualization & Reporting: Matplotlib, Seaborn, Power BI, Tableau, Excel
  • Applied AI: Prompt Engineering, LLMs, RAG Pipelines, Vector Search, Semantic Similarity
  • Tools: Git, VS Code, Jupyter Notebook, Visio, monday.com

Projects

  • movielens-dbt-elt-pipeline
    Layered dbt project on Databricks transforming 20M+ MovieLens ratings into clean star-schema dimensions and facts for analytics, with robust data quality tests, SCD Type 2 snapshots for user tag history, seed-based movie enrichment, and interactive dbt docs showcasing end-to-end DAG lineage.

  • RideStream Data Pipeline
    Built an end-to-end Azure lakehouse pipeline for ride-hailing analytics, combining batch ingestion, real-time event streaming, Databricks transformations, and dimensional modeling into a silver OBT and gold star schema.

  • SoundWave Azure Medallion Pipeline
    Developed a medallion-architecture pipeline using Azure Data Factory and Databricks to ingest, transform, and organize music data for analytics and reporting.

  • pi-level-rag-curriculum-mapper
    Designed a multi-method NLP system that maps BSN nursing syllabi to AACN competency domains using LDA, NER, BioWordVec, BERT embeddings, FAISS, and PI-level RAG to deliver interpretable, evidence-backed curriculum alignment.

The full code for these projects is available in the pinned repositories below.

Career Interests

I’m interested in roles in: Data Engineering, Analytics Engineering, Cloud Data Platforms, AI / ML Engineering, Applied Data Science

Contact

Pinned

  1. movielens-dbt-elt-pipeline Public

    Layered dbt project on Databricks transforming 20M+ MovieLens ratings into clean star-schema dimensions and facts for analytics, with robust data quality tests, SCD Type 2 snapshots for user tag hi…

    Jupyter Notebook

  2. ridestream-data-pipeline Public

    RideStream is a scalable Azure-based lakehouse project designed for ride-hailing analytics. It combines batch ingestion from HTTP/internal sources with streaming booking events from Event Hubs, pro…

    Python

  3. spotify-azure-medallion-pipeline Public

    This project implements an end‑to‑end medallion architecture on Azure using metadata‑driven pipelines for incremental ingestion, streaming transformations in Databricks, and a Type 2 SCD gold layer…

    Python

  4. pi-level-rag-curriculum-mapper Public

    This repository contains an end-to-end NLP project that automates curriculum mapping for the University of Texas at Austin School of Nursing. The goal is to map BSN course syllabi to the American A…

    Jupyter Notebook