Convert GTFS RT Day Map Grouping Models to Microbatch#5009

Merged
lauriemerrell merged 8 commits into main from 4880-rt-grouping-microbatch
Apr 8, 2026
lauriemerrell commented Mar 31, 2026

Description

Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.

Resolves #4880

  • Converts the GTFS-RT trip/day/map grouping and related models to microbatch strategy
  • Makes the GTFS RT start date variable conditional based on what environment you're in, to avoid rerunning a ton of history outside of prod
  • Fixes the docs for int_gtfs_rt__vehicle_positions_trip_stop_day_map_grouping which got messed up
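
The environment-dependent start date in the second bullet could be sketched as follows. This is a minimal Python illustration of the idea, not the PR's implementation, which presumably lives in dbt configuration. The function name, the "prod" target name, the prod start date, and the 7-day non-prod window are all hypothetical placeholders; the PR does not state the actual values.

```python
from datetime import date, timedelta

# Hypothetical values: the PR does not state the real dates or window length.
PROD_START_DATE = date(2021, 1, 1)  # placeholder for the full-history start date
NON_PROD_DAYS_BACK = 7              # placeholder for a short dev/staging window


def gtfs_rt_start_date(target_name: str, today: date) -> date:
    """Return the earliest date the RT models should process for a target.

    Production keeps the full history; every other target only looks back a
    few days, so a development or staging run does not rebuild years of data.
    """
    if target_name == "prod":  # assumed name of the production target
        return PROD_START_DATE
    return today - timedelta(days=NON_PROD_DAYS_BACK)
```

With these placeholder values, a staging run on 2026-03-31 would start from 2026-03-24, while a prod run would still start from the fixed historical date.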

Note: I reviewed this comment: #4880 (comment), and I don't think that model actually requires anything different. The lookback here should always give full coverage for each date.

See: #4865 (comment). Additional follow-up will be needed for the trip_stop_day_map_grouping models; a meeting is scheduled for next Monday.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Include commands/logs/screenshots as relevant.

If making changes to dbt models, make sure they were created or updated on Staging. Please run the commands uv run dbt run -s CHANGED_MODEL --target staging and uv run dbt test -s CHANGED_MODEL --target staging, then include the output in this section of the PR.

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)
  • Delete the int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig model since it is no longer needed

github-actions Bot commented Mar 31, 2026

Terraform plan in iac/cal-itp-data-infra-staging/composer/us

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1783

github-actions Bot commented Mar 31, 2026

Terraform plan in iac/cal-itp-data-infra/airflow/us

Plan: 0 to add, 9 to change, 1 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~  update in-place
-   destroy

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-composer-catalog will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-catalog" {
!~      content             = (sensitive value)
!~      crc32c              = "jhMH+Q==" -> (known after apply)
!~      detect_md5hash      = "qtXThqXroNOKCzfI4zVD+w==" -> "different hash"
!~      generation          = 1775592803392195 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/target/catalog.json"
!~      md5hash             = "qtXThqXroNOKCzfI4zVD+w==" -> (known after apply)
        name                = "data/warehouse/target/catalog.json"
#        (16 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["dbt_project.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "Ld463A==" -> (known after apply)
!~      detect_md5hash      = "269m3s3Kkyg8A0xb+xzBoQ==" -> "different hash"
!~      generation          = 1775253846706729 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/dbt_project.yml"
!~      md5hash             = "269m3s3Kkyg8A0xb+xzBoQ==" -> (known after apply)
        name                = "data/warehouse/dbt_project.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/_int_gtfs.yaml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "TsZB1w==" -> (known after apply)
!~      detect_md5hash      = "fds7Vt0nT/l3yyrIHlrJ3Q==" -> "different hash"
!~      generation          = 1773704938589194 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
!~      md5hash             = "fds7Vt0nT/l3yyrIHlrJ3Q==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "mnwW3A==" -> (known after apply)
!~      detect_md5hash      = "GzLGh5rkqTYmwcv7bqASvg==" -> "different hash"
!~      generation          = 1751416668483719 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"
!~      md5hash             = "GzLGh5rkqTYmwcv7bqASvg==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "9fHK4A==" -> (known after apply)
!~      detect_md5hash      = "mDb58oK954Sbpbg2S3Ijvg==" -> "different hash"
!~      generation          = 1751416666591665 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"
!~      md5hash             = "mDb58oK954Sbpbg2S3Ijvg==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "IHcdjA==" -> (known after apply)
!~      detect_md5hash      = "oPPbxAkvYjwC9kA28z2AyQ==" -> "different hash"
!~      generation          = 1754523290751485 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"
!~      md5hash             = "oPPbxAkvYjwC9kA28z2AyQ==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "hWCcAw==" -> (known after apply)
!~      detect_md5hash      = "2LbPWRsgLo7USH5eff6hzw==" -> "different hash"
!~      generation          = 1766020786749375 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"
!~      md5hash             = "2LbPWRsgLo7USH5eff6hzw==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql"] will be destroyed
  # (because key ["models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql"] is not in for_each map)
-   resource "google_storage_bucket_object" "calitp-composer-dags" {
-       bucket              = "calitp-composer" -> null
-       content_type        = "text/plain; charset=utf-8" -> null
-       crc32c              = "eVEnww==" -> null
-       detect_md5hash      = "XaA8d7tbWPWp1dYXRGXK2w==" -> null
-       event_based_hold    = false -> null
-       generation          = 1773439453844388 -> null
-       id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       md5hash             = "XaA8d7tbWPWp1dYXRGXK2w==" -> null
-       md5hexhash          = "5da03c77bb5b58f5a9d5d6174465cadb" -> null
-       media_link          = "https://storage.googleapis.com/download/storage/v1/b/calitp-composer/o/data%2Fwarehouse%2Fmodels%2Fintermediate%2Fgtfs%2Fint_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql?generation=1773439453844388&alt=media" -> null
-       metadata            = {} -> null
-       name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       output_name         = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       self_link           = "https://www.googleapis.com/storage/v1/b/calitp-composer/o/data%2Fwarehouse%2Fmodels%2Fintermediate%2Fgtfs%2Fint_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       source              = "../../../../warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       storage_class       = "STANDARD" -> null
-       temporary_hold      = false -> null
#        (6 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs/_mart_gtfs_fcts.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "P4eAwA==" -> (known after apply)
!~      detect_md5hash      = "2MPto/410bTo191rzm8sNw==" -> "different hash"
!~      generation          = 1773704940461014 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/mart/gtfs/_mart_gtfs_fcts.yml"
!~      md5hash             = "2MPto/410bTo191rzm8sNw==" -> (known after apply)
        name                = "data/warehouse/models/mart/gtfs/_mart_gtfs_fcts.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-manifest will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-manifest" {
!~      content             = (sensitive value)
!~      crc32c              = "7jidww==" -> (known after apply)
!~      detect_md5hash      = "6FW6YAnPh2liSL9ZNCUG2w==" -> "different hash"
!~      generation          = 1775592804790347 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/target/manifest.json"
!~      md5hash             = "6FW6YAnPh2liSL9ZNCUG2w==" -> (known after apply)
        name                = "data/warehouse/target/manifest.json"
#        (16 unchanged attributes hidden)
    }

Plan: 0 to add, 9 to change, 1 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1783

github-actions Bot commented Mar 31, 2026

Terraform plan in iac/cal-itp-data-infra-staging/airflow/us

Plan: 0 to add, 7 to change, 1 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~  update in-place
-   destroy

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-staging-composer-dags["dbt_project.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "Ld463A==" -> (known after apply)
!~      detect_md5hash      = "269m3s3Kkyg8A0xb+xzBoQ==" -> "different hash"
!~      generation          = 1775253844571559 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/dbt_project.yml"
!~      md5hash             = "269m3s3Kkyg8A0xb+xzBoQ==" -> (known after apply)
        name                = "data/warehouse/dbt_project.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/_int_gtfs.yaml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "TsZB1w==" -> (known after apply)
!~      detect_md5hash      = "fds7Vt0nT/l3yyrIHlrJ3Q==" -> "different hash"
!~      generation          = 1773700029900596 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
!~      md5hash             = "fds7Vt0nT/l3yyrIHlrJ3Q==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/_int_gtfs.yaml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "mnwW3A==" -> (known after apply)
!~      detect_md5hash      = "GzLGh5rkqTYmwcv7bqASvg==" -> "different hash"
!~      generation          = 1749663113691683 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"
!~      md5hash             = "GzLGh5rkqTYmwcv7bqASvg==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "9fHK4A==" -> (known after apply)
!~      detect_md5hash      = "mDb58oK954Sbpbg2S3Ijvg==" -> "different hash"
!~      generation          = 1749663114665236 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"
!~      md5hash             = "mDb58oK954Sbpbg2S3Ijvg==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__service_alerts_trip_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "IHcdjA==" -> (known after apply)
!~      detect_md5hash      = "oPPbxAkvYjwC9kA28z2AyQ==" -> "different hash"
!~      generation          = 1754523275792214 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"
!~      md5hash             = "oPPbxAkvYjwC9kA28z2AyQ==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "hWCcAw==" -> (known after apply)
!~      detect_md5hash      = "2LbPWRsgLo7USH5eff6hzw==" -> "different hash"
!~      generation          = 1766020789279275 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"
!~      md5hash             = "2LbPWRsgLo7USH5eff6hzw==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__trip_updates_trip_stop_day_map_grouping.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql"] will be destroyed
  # (because key ["models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql"] is not in for_each map)
-   resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
-       bucket              = "calitp-staging-composer" -> null
-       content_type        = "text/plain; charset=utf-8" -> null
-       crc32c              = "eVEnww==" -> null
-       detect_md5hash      = "XaA8d7tbWPWp1dYXRGXK2w==" -> null
-       event_based_hold    = false -> null
-       generation          = 1773439452516391 -> null
-       id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       md5hash             = "XaA8d7tbWPWp1dYXRGXK2w==" -> null
-       md5hexhash          = "5da03c77bb5b58f5a9d5d6174465cadb" -> null
-       media_link          = "https://storage.googleapis.com/download/storage/v1/b/calitp-staging-composer/o/data%2Fwarehouse%2Fmodels%2Fintermediate%2Fgtfs%2Fint_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql?generation=1773439452516391&alt=media" -> null
-       metadata            = {} -> null
-       name                = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       output_name         = "data/warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       self_link           = "https://www.googleapis.com/storage/v1/b/calitp-staging-composer/o/data%2Fwarehouse%2Fmodels%2Fintermediate%2Fgtfs%2Fint_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       source              = "../../../../warehouse/models/intermediate/gtfs/int_gtfs_rt__vehicle_positions_trip_day_map_grouping_orig.sql" -> null
-       storage_class       = "STANDARD" -> null
-       temporary_hold      = false -> null
#        (6 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/mart/gtfs/_mart_gtfs_fcts.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "P4eAwA==" -> (known after apply)
!~      detect_md5hash      = "2MPto/410bTo191rzm8sNw==" -> "different hash"
!~      generation          = 1773700031274807 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/mart/gtfs/_mart_gtfs_fcts.yml"
!~      md5hash             = "2MPto/410bTo191rzm8sNw==" -> (known after apply)
        name                = "data/warehouse/models/mart/gtfs/_mart_gtfs_fcts.yml"
#        (17 unchanged attributes hidden)
    }

Plan: 0 to add, 7 to change, 1 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #1783

github-actions Bot commented Mar 31, 2026

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For modified incremental models (or incremental models whose parents are modified), does the PR description identify whether a full refresh is needed for these tables?

Changed incremental models 🔀

calitp_warehouse.mart.gtfs.fct_observed_trips

calitp_warehouse.mart.gtfs.fct_service_alerts_messages_unnested

calitp_warehouse.mart.gtfs.fct_stop_time_metrics

calitp_warehouse.mart.gtfs.fct_trip_updates_stop_metrics

calitp_warehouse.mart.gtfs.fct_trip_updates_trip_metrics

calitp_warehouse.mart.gtfs.fct_trip_updates_trip_summaries

calitp_warehouse.mart.gtfs.fct_vehicle_locations

calitp_warehouse.mart.gtfs.fct_vehicle_locations_path

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_trip_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__trip_updates_trip_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__trip_updates_trip_stop_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__vehicle_positions_trip_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__vehicle_positions_trip_stop_day_map_grouping

DAG

Legend (in order of precedence)

| Resource type | Indicator | Resolution |
| --- | --- | --- |
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |

lauriemerrell commented Apr 1, 2026

@tiffanychu90 what is the intention for int_gtfs_rt__trip_updates_trip_stop_day_map_grouping -- I see it's part of dbt_manual -- are we ok converting it to microbatch? If so, what should its lookback period be? If it's set to the dbt_all lookback period, it will only run 5 days of history by default. This can be overridden by passing --event-time-start and --event-time-end arguments to process a specific range.
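
To make the lookback question concrete, here is a rough Python sketch of which daily batches a run would cover. It is an illustration under stated assumptions, not dbt's actual code: the function name is hypothetical, a lookback of N is assumed to mean the current day's batch plus the N days before it, and the end of an explicit range is assumed to be exclusive.

```python
from datetime import date, timedelta
from typing import List, Optional


def daily_batches(
    run_date: date,
    lookback: int,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[date]:
    """List the daily batches one run would process (hypothetical helper).

    If both `start` and `end` are given, they override the lookback window,
    mirroring the --event-time-start / --event-time-end override mentioned
    above. `end` is treated as exclusive (an assumption).
    """
    if start is not None and end is not None:
        days = (end - start).days
        return [start + timedelta(days=i) for i in range(days)]
    # Default window: the run date's batch plus `lookback` earlier batches.
    first = run_date - timedelta(days=lookback)
    return [first + timedelta(days=i) for i in range(lookback + 1)]
```

Under these assumptions, a run on 2026-04-01 with a lookback of 5 would cover 2026-03-27 through 2026-04-01, and anything older would need an explicit start and end range. At roughly 1 TB per daily batch, the window size directly drives the bytes processed.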

It looks like this guy processes ~1 TB per day 😅

Edit: Deferring this until after Monday meeting

@lauriemerrell lauriemerrell merged commit 2437ff5 into main Apr 8, 2026
19 checks passed
@lauriemerrell lauriemerrell deleted the 4880-rt-grouping-microbatch branch April 8, 2026 21:45
Development

Successfully merging this pull request may close these issues.

Migrate GTFS RT day / trip grouping models to microbatch strategy

2 participants