Honor caller-provided rowType in HiveViewExpander.expandView#607
Honor caller-provided rowType in HiveViewExpander.expandView#607simbadzina wants to merge 1 commit into
Conversation
6d5714a to
55e0f7a
Compare
HiveViewExpander.expandView() previously discarded the rowType argument and returned whatever the re-parsed view body produced. When the caller-recorded row type and the expanded RelNode's row type differed in field order, downstream consumers that rely on positional access could land on the wrong RelNode-output position. Prefix-named sibling columns (one name a prefix of another) are particularly prone to swapping under such mismatches. This change wraps the expanded RelRoot in a LogicalProject that re-aligns fields to the caller's rowType by name (case-insensitive). The wrapper is a no-op when the orderings already match. If a caller-expected field is missing from the expanded body, or if arities differ, the method falls back to returning the original RelRoot unchanged. Adds: - Unit tests for alignToRowType covering aligned, case-only-difference, reorder, prefix-pair, missing-field, and arity-mismatch cases. - An integration test that drives the full expandView path with a deliberately reversed caller-provided rowType.
55e0f7a to
26a65b6
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes Hive view expansion so expanded view outputs can be reordered to match the caller-provided row type, preventing positional column mismatches in downstream view-on-view scenarios.
Changes:
- Adds
HiveViewExpander.alignToRowTypeto reorder expanded view fields by case-insensitive name when needed. - Adds unit coverage for no-op, reorder, prefix-sibling, missing-field, and arity fallback cases.
- Adds an integration fixture and test for the full
expandViewpath.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/HiveViewExpander.java |
Adds row-type alignment after view query conversion. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveViewExpanderTest.java |
Adds unit tests for alignment behavior. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/HiveToRelConverterTest.java |
Adds integration regression test for caller row-type ordering. |
coral-hive/src/test/java/com/linkedin/coral/hive/hive2rel/TestUtils.java |
Adds Hive test table/view fixtures used by the integration test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Wrap {@code root} in a {@link LogicalProject} that re-orders its output | ||
| * fields to match {@code expected} by name (case-insensitive). No-op when the | ||
| * orderings already agree. If a name is missing from {@code root} or arities | ||
| * differ, returns {@code root} unchanged and logs a warning. | ||
| */ |
There was a problem hiding this comment.
Is not the right behavior to throw an exception? There should not be a disagreement between the caller expectation and the view structure in the first place.
There was a problem hiding this comment.
Do you mean for when there is a mismatch like an missing field, or even when there is an ordering issue?
For the ordering, I think we can self-heal drift with the alignment code since there is a complete info.
I'm assuming a change like a new column could show up a mismatch, missing field.
Automatically taking the root would expose the column to consumers, whereas an error would break the view. I think for select * we'd want the new column to seamless appear.
There was a problem hiding this comment.
The source of the divergence I observed was in another system, outside of Coral. During view creation, the storage columns were being explicitly set in the storageDescriptor of the view in a different order than in the view text. This particular instance of divergence has been fixed in that system.
To guard against similar broken views Coral can throw an exception as you suggested, or at the least log a warning.
Summary
When a downstream view does
SELECT * FROM upstreamandupstream's HMS-recorded row type disagrees with what its re-parsed body produces in field order, the downstream's positional access can land on the wrong field. The case that surfaces this most painfully is prefix-name sibling pairs:id(INTEGER)idHash(VARCHAR)idHash(VARCHAR)id(INTEGER)A case-insensitive lookup of
idagainst the body's row type can matchidHash— the resolver gets confused when a row type contains both a short name and a longer name that has it as a prefix. Every consumer of the view then silently reads values from the wrong source column.The root cause is that
HiveViewExpander.expandView()was discarding itsrowTypeargument and returning whatever the re-parsed view body produced, violating theViewExpandercontract that the returnedRelRoot's row type must match the caller-providedrowType.Fix
After
convertQuery, wrap the returnedRelRootin aLogicalProjectthat re-aligns fields to the caller-providedrowTypeby name (case-insensitive):flowchart LR C["Caller's rowType<br/>[id INTEGER, idHash VARCHAR]"] --> E B["Re-parsed body's rowType<br/>[idHash VARCHAR, id INTEGER]"] --> E[alignToRowType] E --> Q{Arity match<br/>and field names<br/>aligned by order?} Q -->|Yes| K[Return original RelRoot] Q -->|All names present<br/>but different order| W["Wrap in LogicalProject<br/>that re-binds by name<br/>(case-insensitive)"] Q -->|Arity differs<br/>or a name is missing| L[Log warn, return original]The safe-fallback paths preserve pre-fix behavior for callers whose expanded body genuinely produces a different shape, while making such cases visible in logs.
Test plan
alignToRowTypeinHiveViewExpanderTestcovering: already-aligned (no-op fast path), case-only differences (no-op), reordered fields (wraps inProject), single prefix-sibling pair (id/idHash), multiple prefix-sibling pairs in one row type (id/idHash,email/emailVerified,country/countryCodesimultaneously — mirrors the realistic failure shape at production scale), missing field (safe fallback), and different arity (safe fallback).testExpandViewAlignsToCallerRowTypeinHiveToRelConverterTestthat drives the fullexpandViewpath with a deliberately reversed caller-provided rowType.alignToRowTypetoreturn root;causes exactly the tests targeting the fix to fail (testReorderedFieldsWrapInProject,testPrefixSiblingsReorderedCorrectly,testMultiplePrefixSiblingsAllReorderedCorrectly,testExpandViewAlignsToCallerRowType); the safe-fallback and no-op tests still pass. Restoring the fix turns the suite green.:coral-hive:testand:coral-trino:testsuites pass with no regressions.Manual verification
Validated end-to-end against a real production view-on-view chain (Avro-backed base table → intermediate Hive view → downstream view) using internal tooling that drives the full Coral translation path. Confirmed:
Also characterized the failure surface by varying the upstream view shape: the bug fires specifically when the upstream lists the longer prefix-extension name before its shorter sibling (e.g.
idHashbeforeid).SELECT *upstreams and shorter-name-first explicit projections do not trigger the bug.🤖 Generated with Claude Code