Skip to content

API: Harden variant binary parsing against malformed input#16568

Open
nssalian wants to merge 1 commit into
apache:mainfrom
nssalian:fix-variant-malformed-parsing
Open

API: Harden variant binary parsing against malformed input#16568
nssalian wants to merge 1 commit into
apache:mainfrom
nssalian:fix-variant-malformed-parsing

Conversation

@nssalian
Copy link
Copy Markdown
Contributor

@nssalian nssalian commented May 26, 2026

Closes #16334
Addresses #16455

Summary

Validates malformed Variant binary input so adversarial buffers fail fast with IllegalArgumentException instead of OOM, StackOverflowError, or raw JVM exceptions. Adds bounds checks across header, count, offset table,
primitive payload, and short-string length, plus a 500-level nesting depth cap threaded through object/array recursive descent.

The approach mirrors Steve Loughran's parquet-java hardening work in apache/parquet-java#3562. Pulled in the changes from @steveloughran's #16335

Changes

  • SerializedMetadata, SerializedArray, SerializedObject: header / count / offset-table bounds, lazy per-element offset and field-id checks, long arithmetic to prevent int overflow.
  • SerializedPrimitive: payload bounds, BIN/STR size-field bounds, non-negative size.
  • SerializedShortString: length bounds.
  • VariantUtil: MAX_VARIANT_DEPTH = 500 and fromBuffer dispatch threading depth through recursive descent.
  • TestMalformedVariant: 20 attack-payload tests covering each new check.

Test plan

  • ./gradlew :iceberg-api:test --tests 'org.apache.iceberg.variants.*'

Co-authored-by: Steve Loughran <stevel@cloudera.com>
@github-actions github-actions Bot added the API label May 26, 2026
@nssalian nssalian marked this pull request as ready for review May 26, 2026 16:57
return from(metadata, value, header, 0);
}

static SerializedArray from(VariantMetadata metadata, ByteBuffer value, int header, int depth) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be private?

Preconditions.checkArgument(
offsetTableEnd <= value.remaining(),
"Invalid variant array: element count %s exceeds buffer",
numElements);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realised that this and the parquet-java hardening don't worry about leftover data. "don't do that" is implicit the policy there, being as it is useless.

I wonder what the rust parquet reader does.


private VariantUtil() {}

static VariantValue fromBuffer(VariantMetadata metadata, ByteBuffer value, int depth) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a javadoc, mention that the function includes validation, including depth limits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Harden Variant Reading

2 participants