Better bucket regexes #259
Merged
TheTechromancer merged 100 commits into dev on May 4, 2026
Dev -> Stable 9.1.0
Dev -> Stable 9.1.0
Dev -> Stable 9.1.2
Dev -> Stable 9.1.3
Dev -> Stable 9.2.0
Readme updates + daily sig update
Better publishing
merge using pr
better workflow
just push stable
Summary
Switch storage bucket regexes from positional capture groups to named groups, and surface region info where the hostname carries it.
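As a rough illustration of the switch to named groups, the sketch below parses S3-style hostnames and returns only the groups that matched. The patterns and the `parse_bucket` helper are hypothetical and written for this example; the regexes shipped in `cloud_providers_v3.json` may differ.

```python
import re

# Hypothetical patterns for illustration only; not the project's actual regexes.
S3_PATTERNS = [
    # Region-bearing shapes: bucket.s3-us-west-2.amazonaws.com (dash-style)
    # and bucket.s3.eu-central-1.amazonaws.com (dot-style).
    re.compile(
        r"^(?P<name>[a-z0-9][a-z0-9.-]*?)\.s3[.-]"
        r"(?P<region>[a-z]{2}-[a-z]+-\d)\.amazonaws\.com$"
    ),
    # Region-less shape: bucket.s3.amazonaws.com (no region group defined).
    re.compile(r"^(?P<name>[a-z0-9][a-z0-9.-]*?)\.s3\.amazonaws\.com$"),
]


def parse_bucket(hostname):
    """Return a dict of named groups for the first matching pattern, else None."""
    for pattern in S3_PATTERNS:
        match = pattern.match(hostname.lower())
        if match:
            # Consumers read keys by name instead of relying on group positions.
            return {k: v for k, v in match.groupdict().items() if v is not None}
    return None


print(parse_bucket("bucket.s3.amazonaws.com"))
print(parse_bucket("bucket.s3-us-west-2.amazonaws.com"))
print(parse_bucket("bucket.s3.eu-central-1.amazonaws.com"))
```

A consumer written this way needs no per-provider knowledge of group order: it checks for a `region` key and falls back to `name` alone when the hostname carries no region.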
Previously, every `STORAGE_BUCKET_HOSTNAME` regex used a fixed contract: group 1 = bucket name, group 2 = trailing domain. That breaks down once some providers expose region in the URL and others don't: consumers can't tell which group means what without per-provider logic. With named groups, consumers just call `match.groupdict()` and read `name` / `region` (the latter only present when meaningful).

This is a breaking change for downstream consumers indexing by group number, so the artifact is published as `cloud_providers_v3.json` and the package version is bumped to `10.0.0`. `cloud_providers_v2.json` is left in place, frozen, so legacy consumers keep resolving until they migrate.

Changes
- Regexes now use `(?P<name>…)` (and `(?P<region>…)` where applicable):
  - S3-style hostnames: region-less (`bucket.s3.amazonaws.com`), dash-style (`bucket.s3-us-west-2.amazonaws.com`), and dot-style (`bucket.s3.eu-central-1.amazonaws.com`).
  - Region codes (`nyc3`, `sfo2`, …) broken out of the trailing domain.
  - `region` (`fsn1`, `nbg1`, `hel1`); the previous regex matched the wrong URL shape.
  - `name`-only (no region in their bucket URLs).
- `cloud_providers_v2.json` → `cloud_providers_v3.json` across `cloudcheck_update`, `src/lib.rs` (signature URL + cache path), `scripts/update_readme_table.py`, the daily-update workflow, and the README.
- `pyproject.toml` and `Cargo.toml` → `10.0.0`.
- Tests in `test_cloudcheck.py` exercising real hostnames per provider and verifying `groupdict()`, plus a sweep that asserts every provider's regexes compile and expose a `name` group.

Test plan
- `uv run pytest test_cloudcheck.py -k "regex or import_provider"`: the new + existing regex tests pass.
- `uv run python -m cloudcheck_update.cli`: regenerates `cloud_providers_v3.json` with the new regex strings round-tripped intact.
- `cloud_providers_v3.json` to `stable`; the network-dependent `test_lookup_*` tests resume passing once the file is live at the new URL.

Relevant: