Add Iceberg tag time travel support #4211
Add support for reading Iceberg tables using snapshot tags. This enables
time travel queries using Iceberg snapshot tags with the AT(ICEBERG_TAG)
syntax.
Changes:
- Add iceberg_tag field to TimeTravelConfig NamedTuple
- Update validate_and_normalize_params to handle iceberg_tag (only works
with time_travel_mode='at')
- Update generate_sql_clause to produce AT(ICEBERG_TAG => 'tag_name')
- Add iceberg_tag parameter to Table.__init__, Session.table(), and
DataFrameReader.table()
- Add TAG and ICEBERG_TAG options mapping in DataFrameReader
- Add unit tests for iceberg_tag time travel functionality
- Update AST proto with iceberg_tag field
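As a rough illustration of the validation and clause-generation changes listed above, here is a minimal, self-contained sketch. It is not the code from this PR: `TimeTravelSketch`, `validate`, and `sql_clause` are hypothetical stand-ins for `TimeTravelConfig`, `validate_and_normalize_params`, and `generate_sql_clause`, and both the mode normalization and the single-quote escaping are assumptions added for the example.

```python
from typing import NamedTuple, Optional


class TimeTravelSketch(NamedTuple):
    """Hypothetical stand-in for TimeTravelConfig, reduced to two fields."""

    mode: str                          # "at" or "before"
    iceberg_tag: Optional[str] = None  # Iceberg snapshot tag name, if any


def validate(mode: str, iceberg_tag: Optional[str] = None) -> TimeTravelSketch:
    """Normalize the mode and allow iceberg_tag only with mode 'at'."""
    normalized = mode.lower()
    if normalized not in ("at", "before"):
        raise ValueError(f"unsupported time_travel_mode: {mode!r}")
    if iceberg_tag is not None and normalized != "at":
        raise ValueError("iceberg_tag requires time_travel_mode='at'")
    return TimeTravelSketch(normalized, iceberg_tag)


def sql_clause(config: TimeTravelSketch) -> str:
    """Build the time travel clause for the tag case only."""
    clause = config.mode.upper()
    if config.iceberg_tag is not None:
        # Assumption: double any single quotes so the tag stays one SQL literal.
        escaped = config.iceberg_tag.replace("'", "''")
        clause += f"(ICEBERG_TAG => '{escaped}')"
    return clause
```

With these stand-ins, `sql_clause(validate("at", "v1"))` yields the `AT(ICEBERG_TAG => 'v1')` form described above, while `validate("before", "v1")` is rejected.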
Usage:
# Direct parameter
session.table("my_iceberg_table", time_travel_mode="at", iceberg_tag="v1")
# Via DataFrameReader option (Spark-compatible)
session.read.option("tag", "v1").table("my_iceberg_table")
Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov Report ❌
@@            Coverage Diff             @@
##             main    #4211      +/-   ##
==========================================
- Coverage   95.42%   93.95%   -1.47%
==========================================
  Files         171      171
  Lines       43835    43857      +22
  Branches     7513     7520       +7
==========================================
- Hits        41829    41207     -622
- Misses       1226     1859     +633
- Partials      780      791      +11
google.protobuf.StringValue time_travel_mode = 7;
Expr timestamp = 8;
google.protobuf.StringValue timestamp_type = 9;
google.protobuf.StringValue iceberg_tag = 10;
Is the client change generated from the monorepo AST definition?
As I recall, a client AST change isn't strictly required right now; if it is needed, it should be derived from the monorepo updates. There is a step-by-step doc for the AST monorepo updates.
@sfc-gh-heshah what's your recommendation here? Can we leave the AST change out of this PR and do it later?
elif self.stream is not None:
    clause += f"(STREAM => '{self.stream}')"
elif self.iceberg_tag is not None:
    clause += f"(ICEBERG_TAG => '{self.iceberg_tag}')"
I can't find this feature in the public docs.
What is the release status of the Iceberg tag feature?
"TIMESTAMP_TYPE": "timestamp_type",
"STREAM": "stream",
"ICEBERG_TAG": "iceberg_tag",
"TAG": "iceberg_tag",
When using tag as an alias for iceberg_tag, is that mapping defined in the Snowflake Iceberg spec, or is it a Spark convention?
Snowflake has its own "tag" concept. Will that cause a conflict later in the context of the DataFrameReader?
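For context on the alias question: with the mapping in the diff above, both option keys resolve to the same parameter. The following is a minimal sketch of such a lookup, not the DataFrameReader code from this PR. The helper `resolve_time_travel_options` is hypothetical, and the case-insensitive key matching and the conflict check are assumptions made for illustration.

```python
from typing import Any, Dict

# Subset of the option-to-parameter mapping shown in the diff.
OPTION_TO_PARAM = {
    "TIMESTAMP_TYPE": "timestamp_type",
    "STREAM": "stream",
    "ICEBERG_TAG": "iceberg_tag",
    "TAG": "iceberg_tag",
}


def resolve_time_travel_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: translate reader options into keyword parameters."""
    params: Dict[str, Any] = {}
    for key, value in options.items():
        # Assumption: option keys are matched case-insensitively.
        param = OPTION_TO_PARAM.get(key.upper())
        if param is None:
            continue  # not a time travel option
        if param in params and params[param] != value:
            # Assumption: "tag" and "iceberg_tag" must agree if both are set.
            raise ValueError(f"conflicting values for {param!r}")
        params[param] = value
    return params
```

In this sketch, `{"tag": "v1"}` and `{"iceberg_tag": "v1"}` both resolve to `{"iceberg_tag": "v1"}`, so any other meaning of a `tag` option on the reader would need a different key.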