Skip to content

[GH-2927] Add geometry ↔ Box2D Catalyst cast#2952

Merged
jiayuasu merged 4 commits into
apache:masterfrom
jiayuasu:feature/geometry-box2d-cast
May 15, 2026
Merged

[GH-2927] Add geometry ↔ Box2D Catalyst cast#2952
jiayuasu merged 4 commits into
apache:masterfrom
jiayuasu:feature/geometry-box2d-cast

Conversation

@jiayuasu
Copy link
Copy Markdown
Member

@jiayuasu jiayuasu commented May 13, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Spark's Cast.canCast rejects arbitrary UDT-to-UDT casts, so today users have to call ST_Box2D(geom) / ST_GeomFromBox2D(box) explicitly. This PR teaches Catalyst to handle the conversions as ordinary SQL casts:

  • SQL CAST(geom AS box2d) and CAST(box AS geometry) parse and execute.
  • DataFrame col.cast(Box2DUDT) and col.cast(GeometryUDT()) work the same way.

How it works

A new analyzer rule Box2DCastResolutionRule rewrites Cast nodes during resolution, before CheckAnalysis runs:

Cast Rewritten to
Cast(geom, Box2DUDT) ST_Box2D(geom)
Cast(box, GeometryUDT) ST_GeomFromBox2D(box)

Because the rewrite happens before CheckAnalysis, Catalyst never observes the rejected Cast — the downstream optimizer and codegen path see the same expression tree that the explicit ST_Box2D(geom) / ST_GeomFromBox2D(box) constructs already produce. No new code-gen path, no canCast change, no extra null/eval handling.

The rule is registered via SparkSessionExtensions.injectResolutionRule from SedonaSqlExtensions, so it activates whenever users wire Sedona through spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions (the standard install).

SQL type keyword

SedonaSqlAstBuilder.visitPrimitiveDataType already recognized GEOMETRY as a type keyword. This PR adds BOX2D alongside it, across all supported Spark versions (3.4 / 3.5 / 4.0 / 4.1). Without it, the SQL parser would reject CAST(... AS box2d) before any analyzer rule could fire.

Scope notes

Implicit type coercion in function dispatch (e.g. passing a Geometry directly into a Box2D-typed function argument without an explicit cast) is intentionally out of scope here — it requires hooking into Catalyst's type coercion rules and is tracked as a follow-up.

How was this patch tested?

  • Box2DCastResolutionRuleSuite (spark/common): unit test of the rewrite for both directions, plus a no-op assertion for unrelated casts.
  • Box2DCastSuite (spark-3.5): end-to-end tests covering SQL CAST (both directions), DataFrame .cast(Box2DUDT) / .cast(GeometryUDT()), the round-trip Geometry → Box2D → Geometry, and NULL propagation.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public SQL API documentation surface in isolation. The new cast syntax is covered by the consolidated Phase 1+2+3 Box2D docs update.

jiayuasu added 3 commits May 12, 2026 23:33
Spark's `Cast.canCast` rejects arbitrary UDT-to-UDT casts, so today users
have to call `ST_Box2D(geom)` / `ST_GeomFromBox2D(box)` explicitly. This
PR teaches Catalyst to handle the conversions as ordinary SQL casts.

Implementation:

- New `Box2DCastResolutionRule` analyzer rule rewrites
  `Cast(geom, Box2DUDT)` → `ST_Box2D(geom)` and
  `Cast(box, GeometryUDT)` → `ST_GeomFromBox2D(box)`. It runs before
  `CheckAnalysis`, so the downstream optimizer/codegen see the rewritten
  expression and never observe the rejected Cast.
- Registered via `SparkSessionExtensions.injectResolutionRule` from
  `SedonaSqlExtensions`.
- `SedonaSqlAstBuilder.visitPrimitiveDataType` now recognizes `BOX2D` as
  a type keyword across all supported Spark versions, so SQL
  `CAST(... AS box2d)` parses to `Box2DUDT`.

Tests:
- Unit test of the rule (rewrite for both directions, no-op for
  unrelated casts).
- SQL end-to-end suite under spark-3.5 covering SQL CAST in both
  directions, DataFrame `.cast(Box2DUDT)` / `.cast(GeometryUDT())`,
  round-trip Geometry → Box2D → Geometry, and NULL propagation.

Closes apache#2927.
Two follow-ups on the cast suite:

- The SQL `CAST(... AS box2d)` / `CAST(... AS geometry)` syntax requires
  Sedona's `SedonaSqlAstBuilder` to be the active parser. The test base
  randomizes `spark.sedona.enableParserExtension` across CI runs, so the
  SQL-level tests now `assume(parserExtensionEnabled)` and are skipped
  (rather than failed) when the stock Spark parser is in effect. The
  DataFrame `.cast(...)` tests run unconditionally because the
  resolution rule is always injected.
- The polygon vertex order produced by `ST_GeomFromBox2D` walks the
  envelope as (xmin,ymin) → (xmin,ymax) → (xmax,ymax) → (xmax,ymin); the
  expected WKTs are corrected to match.
- The suite is now replicated across spark-3.4 / 3.5 / 4.0 / 4.1 so each
  supported Spark version exercises the SQL + DataFrame surfaces of the
  geometry↔Box2D cast.
@jiayuasu jiayuasu marked this pull request as draft May 13, 2026 22:16
The previous gate read `spark.sedona.enableParserExtension` from
`sparkSession.conf`, but `SparkContext` is JVM-singleton, so the
session-level config in this suite can differ from the value
`SedonaSqlExtensions` actually saw when it decided whether to inject
`SedonaSqlAstBuilder`. CI repeatedly hit a mismatch where the assume
returned true while the active parser was Spark's stock one, causing
the SQL CAST tests to fail with ParseException.

Switch to a behavioral probe: try to parse a tiny `CAST(... AS box2d)`
SELECT once and cache the outcome. This matches what the SQL tests
actually depend on regardless of which config layer holds the truth.
DataFrame `.cast(...)` tests remain unconditional.

Verified locally for spark-3.4 / 3.5 across both parser states
(succeeded 8, canceled 0) and (succeeded 4, canceled 4).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Teaches Catalyst to handle CAST between GeometryUDT and Box2DUDT (in both directions) so users can write CAST(geom AS box2d) / CAST(box AS geometry) (SQL) or col.cast(Box2DUDT) / col.cast(GeometryUDT()) (DataFrame) instead of explicit ST_Box2D / ST_GeomFromBox2D calls.

Changes:

  • New analyzer rule Box2DCastResolutionRule rewrites the two Cast shapes into the existing Sedona expressions, registered via SedonaSqlExtensions.injectResolutionRule.
  • SedonaSqlAstBuilder.visitPrimitiveDataType now also accepts BOX2D as a primitive type keyword, across spark-3.4/3.5/4.0/4.1.
  • Adds a unit test for the rule and an end-to-end Box2DCastSuite (replicated across all four Spark version modules) covering SQL cast, DataFrame .cast, round-trip, and NULL propagation.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
spark/common/.../optimization/Box2DCastResolutionRule.scala New analyzer rule rewriting Cast↔UDT pairs into ST_Box2D / ST_GeomFromBox2D.
spark/common/.../SedonaSqlExtensions.scala Injects the new resolution rule.
spark/common/.../expressions/Constructors.scala Updates ST_GeomFromBox2D doc to mention the new cast form.
spark/common/.../Box2DCastResolutionRuleSuite.scala Unit tests for the rewrite (both directions + no-op).
spark/spark-{3.4,3.5,4.0,4.1}/.../parser/SedonaSqlAstBuilder.scala Recognize BOX2D keyword in CAST type position.
spark/spark-{3.4,3.5,4.0,4.1}/.../Box2DCastSuite.scala End-to-end cast tests (DataFrame + SQL, NULL handling, round-trip).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

@jiayuasu jiayuasu marked this pull request as ready for review May 15, 2026 06:32
@jiayuasu jiayuasu added this to the sedona-1.9.1 milestone May 15, 2026
@jiayuasu jiayuasu merged commit b8eeb95 into apache:master May 15, 2026
42 checks passed
@jiayuasu jiayuasu deleted the feature/geometry-box2d-cast branch May 15, 2026 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implicit Catalyst cast: geometry -> Box2D

2 participants