-
Notifications
You must be signed in to change notification settings - Fork 757
[GH-2886] Recognize Box2D columns as GeoParquet bbox covering columns #2921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1021,6 +1021,53 @@ class geoparquetIOTests extends TestBaseScala with BeforeAndAfterAll { | |
| } | ||
| } | ||
|
|
||
| it("GeoParquet supports writing covering metadata from a Box2D column") { | ||
| // User-provided Box2D column referenced via the geoparquet.covering option. | ||
| val df = sparkSession | ||
| .range(0, 100) | ||
| .toDF("id") | ||
| .withColumn("id", expr("CAST(id AS DOUBLE)")) | ||
| .withColumn("geometry", expr("ST_Point(id, id + 1)")) | ||
| .withColumn("test_cov", expr("ST_Box2D(geometry)")) | ||
| val geoParquetSavePath = geoparquetoutputlocation + "/gp_with_box2d_covering.parquet" | ||
| df.write | ||
| .format("geoparquet") | ||
| .option("geoparquet.covering.geometry", "test_cov") | ||
| .mode("overwrite") | ||
| .save(geoParquetSavePath) | ||
|
Comment on lines
+1032
to
+1037
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The hard-coded path under |
||
| validateGeoParquetMetadata(geoParquetSavePath) { geo => | ||
| implicit val formats: org.json4s.Formats = org.json4s.DefaultFormats | ||
| val coveringJsValue = geo \ "columns" \ "geometry" \ "covering" | ||
| val covering = coveringJsValue.extract[Covering] | ||
| assert(covering.bbox.xmin == Seq("test_cov", "xmin")) | ||
| assert(covering.bbox.ymin == Seq("test_cov", "ymin")) | ||
| assert(covering.bbox.xmax == Seq("test_cov", "xmax")) | ||
| assert(covering.bbox.ymax == Seq("test_cov", "ymax")) | ||
| } | ||
| } | ||
|
|
||
| it("GeoParquet auto populates covering metadata for a Box2D <geom>_bbox column") { | ||
| // Auto-detect path: when a column named <geom>_bbox is a Box2D, reuse it as the | ||
| // covering column instead of synthesizing a separate float64 struct. | ||
| val df = sparkSession | ||
| .range(0, 100) | ||
| .toDF("id") | ||
| .withColumn("id", expr("CAST(id AS DOUBLE)")) | ||
| .withColumn("geometry", expr("ST_Point(id, id + 1)")) | ||
| .withColumn("geometry_bbox", expr("ST_Box2D(geometry)")) | ||
| val geoParquetSavePath = geoparquetoutputlocation + "/gp_box2d_auto_covering.parquet" | ||
| df.write.format("geoparquet").mode("overwrite").save(geoParquetSavePath) | ||
|
Comment on lines
+1058
to
+1059
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as the previous comment — sticking with the existing hard-coded |
||
| validateGeoParquetMetadata(geoParquetSavePath) { geo => | ||
| implicit val formats: org.json4s.Formats = org.json4s.DefaultFormats | ||
| val coveringJsValue = geo \ "columns" \ "geometry" \ "covering" | ||
| val covering = coveringJsValue.extract[Covering] | ||
| assert(covering.bbox.xmin == Seq("geometry_bbox", "xmin")) | ||
| assert(covering.bbox.ymin == Seq("geometry_bbox", "ymin")) | ||
| assert(covering.bbox.xmax == Seq("geometry_bbox", "xmax")) | ||
| assert(covering.bbox.ymax == Seq("geometry_bbox", "ymax")) | ||
| } | ||
| } | ||
|
|
||
| it("GeoParquet auto populates covering metadata for single geometry column") { | ||
| val df = sparkSession | ||
| .range(0, 100) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 776c5b9 — bound the case to
udt: Box2DUDTand useudt.sqlTypewith amatchonStructType(no allocation, noasInstanceOf, clear failure mode if the sqlType shape ever changes).On the second suggestion (generalize to any
UserDefinedTypewhosesqlTypeis aStructType): I would push back on that. Other UDTs may have struct-shapedsqlTypes that are not valid bbox covering columns — e.g., a future ImageUDT or RasterMetadataUDT could have a struct of metadata fields, and the call tovalidateField("xmin")insidecoveringColumnTypeToCoveringwould throw a confusing error instead of the clear "not a struct type" one. Keeping the match specific toBox2DUDTmakes the contract explicit; we can add other UDTs case-by-case as they appear.