fix(parquet): strip repetition_type from root SchemaElement during serialization #723

Open
hcrosse wants to merge 2 commits into apache:main from hcrosse:fix/strip-root-repetition-type
@hcrosse hcrosse commented Mar 18, 2026

Rationale

The Parquet spec says the root of the schema doesn't have a repetition_type. arrow-go writes REPEATED for the root SchemaElement in the Thrift footer, which breaks interop with consumers like Snowflake. #722 has the full writeup and cross-implementation comparison.

What changes are included in this PR?

ToThrift() now nils out RepetitionType on the root element before returning, stripping it from the serialized output. This matches how parquet-java and arrow-rs handle the root.

The in-memory representation and WithRootRepetition API are unaffected. FromParquet already tolerates a nil root repetition type, so this is backwards-compatible for both readers and writers.
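The stripping step can be sketched in isolation. This is a minimal illustration, not the code from this PR: `SchemaElement`, `FieldRepetitionType`, and `stripRootRepetition` are hypothetical stand-ins for the generated Thrift types and the logic inside `ToThrift()`. It assumes the flattened schema list puts the root first, and it copies the root so the caller's in-memory value keeps its repetition type.

```go
package main

import "fmt"

// FieldRepetitionType mirrors the enum values in parquet.thrift.
type FieldRepetitionType int32

const (
	Required FieldRepetitionType = 0
	Optional FieldRepetitionType = 1
	Repeated FieldRepetitionType = 2
)

// SchemaElement is a hypothetical, trimmed-down stand-in for the
// generated Thrift struct; only the fields needed here are included.
type SchemaElement struct {
	Name           string
	RepetitionType *FieldRepetitionType // nil means "not serialized"
	NumChildren    *int32
}

// stripRootRepetition returns a copy of the flattened schema list in which
// the first (root) element has no repetition type. The input is not mutated.
func stripRootRepetition(elems []*SchemaElement) []*SchemaElement {
	out := make([]*SchemaElement, len(elems))
	copy(out, elems)
	if len(out) > 0 && out[0] != nil {
		root := *out[0] // shallow copy so the caller's root is untouched
		root.RepetitionType = nil
		out[0] = &root
	}
	return out
}

func main() {
	// Exercise all three root repetition variants.
	for _, rep := range []FieldRepetitionType{Required, Optional, Repeated} {
		rootRep, childRep, n := rep, Optional, int32(1)
		schema := []*SchemaElement{
			{Name: "schema", RepetitionType: &rootRep, NumChildren: &n},
			{Name: "id", RepetitionType: &childRep},
		}
		out := stripRootRepetition(schema)
		fmt.Println(
			out[0].RepetitionType == nil,      // root stripped in the output
			*out[1].RepetitionType == Optional, // child keeps its repetition
			schema[0].RepetitionType != nil,    // in-memory root unchanged
		)
	}
}
```

Copying the root rather than mutating it is one way to keep the in-memory schema intact while changing only what gets serialized.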

Are these changes tested?

Updated the existing TestNestedExample and added TestToThriftRootRepetitionStripped, which checks that the root's repetition_type is stripped for all three repetition variants (REQUIRED, OPTIONAL, REPEATED) and that non-root elements keep theirs.

Closes #722

…rialization

The Parquet spec states the root of the schema does not have a
repetition_type. arrow-go was writing REPEATED for the root
SchemaElement in the Thrift footer, which is non-standard and breaks
interoperability with consumers like Snowflake.

Strip RepetitionType from the root element in ToThrift(), matching
parquet-java and arrow-rs behavior. The in-memory representation and
WithRootRepetition API are unaffected. FromParquet already tolerates
a nil root repetition type, so this is backwards-compatible for both
readers and writers.

Closes apache#722
@hcrosse hcrosse requested a review from zeroshade as a code owner March 18, 2026 19:08
…ping

Stripping the root SchemaElement's repetition_type removes 2 bytes
from the serialized Thrift footer (field header + enum value).
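
The 2-byte figure can be checked by hand. The sketch below is a hand-rolled, partial encoder written for illustration, not the Thrift library or arrow-go code. It assumes the footer uses the Thrift compact protocol, that `repetition_type` is field 3 and `name` is field 4 of SchemaElement, and that the root sets neither field 1 nor field 2. All function names are hypothetical.

```go
package main

import "fmt"

const (
	compactI32    = 5 // compact-protocol type id for i32 (enums are written as i32)
	compactBinary = 8 // compact-protocol type id for binary/string
)

// zigzag maps a signed 32-bit value onto an unsigned one so small
// magnitudes stay small when varint-encoded.
func zigzag(v int32) uint32 { return uint32((v << 1) ^ (v >> 31)) }

// appendVarint appends v as a ULEB128 varint.
func appendVarint(b []byte, v uint32) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendFieldHeader writes the short-form field header, which packs the
// field-id delta and the type into one byte. Valid only for deltas 1..15.
func appendFieldHeader(b []byte, lastID, id int16, typ byte) []byte {
	return append(b, byte(id-lastID)<<4|typ)
}

// encodeRootPrefix encodes only repetition_type (if set) and name of a
// root element; the remaining fields are omitted from this sketch.
func encodeRootPrefix(name string, repetition *int32) []byte {
	var b []byte
	var last int16
	if repetition != nil {
		b = appendFieldHeader(b, last, 3, compactI32)
		b = appendVarint(b, zigzag(*repetition))
		last = 3
	}
	b = appendFieldHeader(b, last, 4, compactBinary)
	b = appendVarint(b, uint32(len(name)))
	return append(b, name...)
}

func main() {
	repeated := int32(2) // REPEATED in parquet.thrift
	with := encodeRootPrefix("schema", &repeated)
	without := encodeRootPrefix("schema", nil)
	fmt.Printf("with:    % x\n", with)
	fmt.Printf("without: % x\n", without)
	fmt.Println("bytes saved:", len(with)-len(without))
}
```

Under these assumptions the field header for `name` stays one byte either way, because only its delta changes (1 with the repetition field present, 4 without), so the saving is exactly the header byte plus the one-byte enum value.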
Development

Successfully merging this pull request may close these issues.

Root schema element written with REPEATED repetition breaks interoperability
