vishal-labade/nyc_tlc_github

NYC TLC Demand Forecasting — Product Data Science Case Study

This repository presents a product-oriented demand forecasting system built on NYC TLC FHVHV trip data. The focus is not just modeling accuracy, but decision-ready forecasts, clear evaluation, and product-relevant tradeoffs.

See the readouts for details on EDA, modeling, and data engineering.

For infrastructure setup, review the repo for the Spark-Iceberg-MinIO setup.

1. Product Framing

Product question:

How much ride-hailing demand should we expect by area and time, and how reliable are those forecasts for operational decisions?

This project treats forecasting as a decision support problem, not a pure ML exercise.

Key product considerations:

  • Forecasts must be stable, interpretable, and horizon-aware
  • Evaluation must reflect how forecasts are consumed
  • Models must support multiple planning cadences (daily vs hourly)

2. End-to-End Analytical Flow


Raw Trip Events
        ↓
Validated & Cleaned Trips
        ↓
Time Alignment (Date / Hour)
        ↓
Spatial Hierarchy
  (Zone → Borough → Cluster)
        ↓
Weather Context
  (Rain / Snow / Temperature)
        ↓
Aggregated Demand Signals
        ↓
Decision-Ready Model Inputs
   ├─ Daily Planning (Prophet)
   └─ Intraday Ops (LightGBM)
        ↓
Forecasts + OOS Scorecards
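
The middle of this flow (time alignment, spatial roll-up, aggregation) can be illustrated with a minimal, standard-library sketch. It is not the repository's ETL code: the `ZONE_TO_CLUSTER` mapping, the `hourly_cluster_demand` helper, and the sample trips are all hypothetical, and dropping trips with an unknown zone merely stands in for the real validation step.

```python
from collections import Counter
from datetime import datetime

# Hypothetical zone -> cluster mapping (assumption for illustration only).
ZONE_TO_CLUSTER = {"Z1": "C_A", "Z2": "C_A", "Z3": "C_B"}


def hourly_cluster_demand(trips):
    """Count trips per (cluster, hour bucket).

    `trips` is an iterable of (pickup_datetime, pickup_zone) tuples.
    Trips whose zone is not in the mapping are dropped.
    """
    counts = Counter()
    for pickup_ts, zone in trips:
        cluster = ZONE_TO_CLUSTER.get(zone)
        if cluster is None:
            continue  # stand-in for validation / cleaning
        # Time alignment: truncate the timestamp to the start of its hour.
        hour_bucket = pickup_ts.replace(minute=0, second=0, microsecond=0)
        counts[(cluster, hour_bucket)] += 1
    return counts


# Made-up sample events, not real TLC data.
trips = [
    (datetime(2024, 1, 1, 8, 5), "Z1"),
    (datetime(2024, 1, 1, 8, 40), "Z2"),
    (datetime(2024, 1, 1, 9, 10), "Z3"),
    (datetime(2024, 1, 1, 9, 15), "Z9"),  # unknown zone, dropped
]
demand = hourly_cluster_demand(trips)
```

Daily signals would follow the same pattern with the timestamp truncated to the date instead of the hour.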

3. Repository Structure

.
├── src/
│   └── nyc_tlc/
│       ├── etl/
│       │   ├── populate_basedata_base.py
│       │   ├── populate_basedata_rog.py
│       │   └── populate_daily_summary.py
│       │
│       ├── helpers/
│       │   ├── interactive_maps.py
│       │   └── static_maps.py
│       │
│       ├── model_pipeline/
│       │   ├── daily_borough_prophet.py
│       │   ├── daily_cluster_prophet.py
│       │   ├── hourly_cluster_gbm_cv.py
│       │   └── hourly_cluster_gbm_ff.py
│       │
│       └── utils/
│           ├── extract_zone_weather.py
│           ├── loaders.py
│           ├── weather_downloads.py
│           └── weather2.py
│
├── notebooks/
│   ├── exploration/              -- Basic Trends, Spatial Demand Analysis, Trip Metrics, Fare Economics/Surge, Weather
│   ├── models_prototyping/       -- Model building and pipeline prototyping
│   ├── models_evals/             -- Model Evaluation and OOS Testing
│   └── weather_spatial_data/     -- Notebook to download and append weather based on zone centroids
│
│
├── docs/                -- All .md format files for readouts
│   ├── basic_trends/             -- Basic Demand Trends
│   ├── data_engineering/         -- End-to-end data journey with detailed data flow
│   ├── fares_pricing/            -- Fare Economics and Surge Index
│   ├── modeling/                 -- Prophet and LightGBM models
│   ├── one_pager/                -- EDA One Pager
│   ├── spatial_demand_analysis/  -- Analyze Trip flows across Manhattan
│   ├── trip_metrics/             -- Trip Distance, Trip Duration and Trip Speed Analysis
│   └── weather_effects/          -- Impact of Precipitation and Snowfall
│
│
├── readouts/            -- PDF docs meant for internal readouts and reviews
│   ├── data_engg/                -- Data Engineering readout
│   ├── eda/                      -- Exploratory Data Analysis and Results
│   └── modeling/                 -- Readouts for Daily and Hourly Models
│
│
└── README.md

4. Data Organization (Conceptual)

raw/            → immutable trip events
reference/      → zones, clusters, weather
processed/      → cleaned & enriched trips
model_inputs/   → forecast-ready aggregates
forecasts/      → predictions + evaluation

5. Modeling Strategy (Product-Driven)

Daily Forecasts — Prophet

Used for: capacity planning, staffing, and medium-term trend visibility

  • Granularity: Borough and Cluster

  • Strengths:

    • Interpretable trends and seasonality
    • Stable multi-week forecasts
  • Tradeoff:

    • Lower responsiveness to sudden intraday shocks

Evaluation:

  • Rolling cross-validation
  • Horizon-specific error (7 / 14 / 28 days)
  • Metrics reported as percentage errors that stakeholders can read directly (MAPE / WAPE)
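
To make the two metrics concrete, here is a small sketch of MAPE and WAPE using their standard definitions, plus one possible reading of "horizon-specific error": the error over the first 7, 14, or 28 days of a forecast path. The function names and the sample numbers are illustrative assumptions, not taken from the repository's evaluation code.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: every period counts equally.

    Periods with zero actual demand are skipped to avoid division by zero.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def wape(actual, forecast):
    """Weighted absolute percentage error: total absolute error / total volume."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / sum(abs(a) for a in actual)


def wape_by_horizon(actual, forecast, horizons=(7, 14, 28)):
    """WAPE over the first h steps of a forecast path, for each horizon h."""
    return {h: wape(actual[:h], forecast[:h]) for h in horizons if h <= len(actual)}


# A large miss on a low-volume day dominates MAPE but barely moves WAPE.
low_high_actual = [100, 1000]
low_high_forecast = [50, 1000]
mape_value = mape(low_high_actual, low_high_forecast)  # 0.25
wape_value = wape(low_high_actual, low_high_forecast)  # 50 / 1100

# Synthetic 28-day path whose error grows with the horizon.
daily_actual = [100] * 28
daily_forecast = [100] * 7 + [110] * 7 + [120] * 14
by_horizon = wape_by_horizon(daily_actual, daily_forecast)
```

Because WAPE divides total error by total volume, it is already trip-weighted, which is why it tends to track operational impact more closely than MAPE.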

Hourly Forecasts — LightGBM

Used for: intraday operations and near-term adjustments

  • Granularity: Cluster × Hour

  • Signals:

    • Lagged demand
    • Rolling demand context
    • Weather conditions
    • Calendar effects
  • Strengths:

    • High short-term accuracy
    • Better reaction to transient demand changes
  • Tradeoff:

    • Less interpretable than additive time-series models
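
The lagged and rolling demand signals listed above could be built roughly as follows. This is a hedged sketch in plain Python: the `lag_features` helper, the lag choices (1, 24, and 168 hours), and the 24-hour window are assumptions for illustration, and the actual pipeline may use different lags and a DataFrame library.

```python
def lag_features(series, lags=(1, 24, 168), window=24):
    """Build one feature row per hour from an hourly demand list.

    Every feature uses only values strictly before hour t, so the
    target never leaks into its own inputs.
    """
    start = max(max(lags), window)  # first hour with full history
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        past = series[t - window:t]  # the `window` hours before t
        row[f"roll_mean_{window}"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


# Synthetic hourly demand for a single cluster (0, 1, 2, ...).
hourly = list(range(200))
rows = lag_features(hourly)
first = rows[0]
```

Weather and calendar signals would then be added as extra columns on each row, and rows would be built per cluster so that lags never cross cluster boundaries.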

Evaluation:

  • Short horizons (1–24h, 1–48h)
  • Trip-weighted errors to reflect real impact

6. Evaluation Philosophy (Product-First)

This project intentionally avoids reporting a single aggregate accuracy number.

Instead:

  • Metrics are computed at fixed, decision-relevant horizons
  • Errors are weighted by trip volume
  • Out-of-sample periods are clearly separated and reported

This mirrors how forecasts are actually reviewed in product, ops, and planning forums.
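
One common way to keep out-of-sample periods cleanly separated is rolling-origin evaluation with an expanding training window. The generator below is a minimal sketch of that idea; the name `rolling_origin_splits` and the parameter values are hypothetical and may not match the splits used in this repository.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test window starts right after its training window ends, so no
    test observation is ever available at fit time.
    """
    cutoff = initial_train
    while cutoff + horizon <= n_obs:
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += step


# 100 daily observations, first fit on 60 days, 14-day horizon, 14-day step.
splits = list(rolling_origin_splits(n_obs=100, initial_train=60, horizon=14, step=14))
```

Fixed-horizon metrics can then be computed per split and averaged, so each reported number corresponds to a single, well-defined forecast horizon.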

7. Key Product Insights Enabled

  • Where demand is predictable vs inherently volatile
  • How weather systematically shifts demand distribution
  • When daily forecasts are sufficient vs when hourly models add value
  • Tradeoffs between forecast stability and responsiveness

8. What This Demonstrates as a Product Data Scientist

This repository showcases:

  • Translating ambiguous product questions into measurable models
  • Designing features that reflect real user and marketplace behavior
  • Evaluating models the way decisions are made, not the way libraries default
  • Communicating model limitations and tradeoffs clearly

9. Status

✅ EDA complete with product-relevant hypotheses

✅ Feature engineering finalized

✅ Daily (Prophet) and Hourly (LightGBM) models finalized

✅ Out-of-sample scorecards complete

10. Next Enhancements (Explicitly Product-Scoped)

  • Pricing and surge sensitivity modeling
  • ETA prediction as a downstream consumer metric
  • Scenario simulations (weather, holidays, shocks)

License

This project is licensed under the MIT License.

About

Deep exploratory analysis of NYC TLC trip data to understand demand patterns, zone-level variability, seasonality, and revenue distribution. Conducted structured EDA on spatial heterogeneity, temporal trends, skew, feature correlations, and lag effects. Built Prophet and LightGBM models.
