I’m an MS in Information Science ’26 student at UT Austin with 3 years of experience at FHLB Dallas and Accenture. I have a strong interest in data engineering, cloud platforms, and AI/ML. I enjoy building end-to-end data systems that turn raw data into reliable, analytics-ready insights.
- MSIS ’26 at The University of Texas at Austin.
- Background in data engineering, cloud automation, and applied AI.
- Hands-on experience in Azure, Databricks, PySpark, SQL, Delta Lake, and streaming architectures.
- Interested in building scalable, production-style data pipelines, intelligent workflows, and ML systems.
- Programming & Libraries: Python, Pandas, scikit-learn, PyTorch, SQL, Power Automate, Power Apps, Power Query
- Cloud & Data Engineering: PySpark, dbt, Azure Data Factory, Azure Data Lake Storage, Azure Event Hubs, Delta Live Tables, Lakeflow Jobs, OutSystems
- Data Visualization & Reporting: Matplotlib, Seaborn, Power BI, Tableau, Excel
- Applied AI: Prompt Engineering, LLMs, RAG Pipelines, Vector Search, Semantic Similarity
- Tools: Git, VS Code, Jupyter Notebook, Visio, monday.com
- movielens-dbt-elt-pipeline: Layered dbt project on Databricks transforming 20M+ MovieLens ratings into clean star-schema dimensions and facts for analytics, with robust data quality tests, SCD Type 2 snapshots for user tag history, seed-based movie enrichment, and interactive dbt docs showcasing end-to-end DAG lineage.
- RideStream Data Pipeline: Built an end-to-end Azure lakehouse pipeline for ride-hailing analytics, combining batch ingestion, real-time event streaming, Databricks transformations, and dimensional modeling into a silver OBT and gold star schema.
- SoundWave Azure Medallion Pipeline: Developed a medallion-architecture pipeline using Azure Data Factory and Databricks to ingest, transform, and organize music data for analytics and reporting.
- pi-level-rag-curriculum-mapper: Designed a multi-method NLP system that maps BSN nursing syllabi to AACN competency domains using LDA, NER, BioWordVec, BERT embeddings, FAISS, and PI-level RAG to deliver interpretable, evidence-backed curriculum alignment.
The full code for these projects is available in the pinned repositories below.
I’m interested in roles in Data Engineering, Analytics Engineering, Cloud Data Platforms, AI/ML Engineering, and Applied Data Science.
- LinkedIn: https://www.linkedin.com/in/sai-nikhil-p/
- Email: csainikhil123@gmail.com
- GitHub: https://github.com/sainikhilp
