Skip to content

Enhance Avro handling in Result package with decimal support and union extraction#98

Open
amol-patel wants to merge 2 commits intomainfrom
fix/avro-decimal-type-unmarshal
Open

Enhance Avro handling in Result package with decimal support and union extraction#98
amol-patel wants to merge 2 commits intomainfrom
fix/avro-decimal-type-unmarshal

Conversation

@amol-patel
Copy link
Collaborator

  • Introduced extractUnionValue and decodeDecimal functions for better handling of Avro union types and decimal values.
  • Updated FromAvro method to utilize new functions for processing records.
  • Enhanced Column struct to include Scale and AvroTypeName for improved metadata extraction during JSON unmarshalling.
  • Refined Type unmarshalling to support complex types and improve error handling.

…n extraction

- Introduced `extractUnionValue` and `decodeDecimal` functions for better handling of Avro union types and decimal values.
- Updated `FromAvro` method to utilize new functions for processing records.
- Enhanced `Column` struct to include `Scale` and `AvroTypeName` for improved metadata extraction during JSON unmarshalling.
- Refined `Type` unmarshalling to support complex types and improve error handling.
Copilot AI review requested due to automatic review settings March 4, 2026 12:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Avro result decoding in the result package by unwrapping goavro union values more robustly and adding support for Avro logical decimal types, while enhancing schema/type unmarshalling to better handle unions and complex type objects.

Changes:

  • Updated FromAvro to unwrap union-encoded values via extractUnionValue and to decode logical decimals via decodeDecimal.
  • Added JSON unmarshalling logic to column.Column to capture Avro metadata needed for decoding (e.g., decimal scale, type name).
  • Refined column.Type JSON unmarshalling to handle union arrays and complex type objects more safely.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pkg/result/result.go Adds union unwrapping + decimal decoding during Avro row extraction.
pkg/result/column/type.go Improves Type unmarshalling for unions and complex types.
pkg/result/column/column.go Adds custom JSON unmarshalling to extract decimal/type metadata from Avro schema.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@amol-patel
Copy link
Collaborator Author

amol-patel commented Mar 4, 2026

Test Plan

Bug

Querying tables with DECIMAL columns via Heimdall's SparkSQL/SparkEKS plugins fails with:

json: cannot unmarshal object into Go struct field Column.fields.type of type string

When results are returned, decimal columns appear as base64-encoded byte arrays instead of numeric values.

Root Cause

  1. Column.Type.UnmarshalJSON only handled simple strings ("string") and string arrays (["null", "string"]). Avro represents DECIMAL(38,2) as complex JSON objects ({"type": "fixed", "logicalType": "decimal", ...}) or nullable variants (["null", {"type": "fixed", ...}]), which caused the unmarshal crash.
  2. The Avro data extraction logic only unwrapped union maps for primitive types. For fixed decimal types, the union map key is the full Avro name (e.g. topLevelRecord.sp_spend.fixed), not just "fixed", so values were returned as raw maps.
  3. No decimal byte decoding existed — raw Avro fixed bytes were serialized as base64 in the JSON response.

Changes

File Change
pkg/result/column/type.go Handle JSON object types and arrays containing objects in UnmarshalJSON
pkg/result/column/column.go Add Scale, AvroTypeName fields; custom UnmarshalJSON to extract decimal metadata from Avro schema
pkg/result/result.go Add extractUnionValue (handles both primitive and named complex type union maps) and decodeDecimal (converts Avro fixed/bytes to float64 using two's complement + scale)

Automated Tests

  • TestUnmarshalJSON — 12 cases covering simple strings, nullable arrays, complex objects (decimal, record, array, map), nullable complex types, and error cases
  • TestIsPrimitive — Validates primitive vs non-primitive type classification
  • TestColumnUnmarshalJSON — End-to-end schema parsing with mixed types (long, nullable string, decimal, nullable array); validates Scale and IsDecimal()
  • TestSensorAggBrandAdExpansionTacosSchema — Parses the exact Avro schema for stage.sensor_agg_brand_ad_expansion_tacos (16 fields, 8 decimal columns); validates Type, Scale, AvroTypeName for each field
  • TestDecodeDecimal — Validates byte-to-float64 conversion for zero, positive (123.45), and negative (-123.45) decimal values
  • TestDecodeDecimalNil — Nil input returns nil
  • TestDecodeDecimalNonBytes — Non-byte input passes through unchanged
  • TestExtractUnionValue — Validates union map unwrapping for primitive types, named fixed types (AvroTypeName lookup), and fallback behavior
go test ./pkg/result/... -v    # 20/20 passing

Manual Tests

  • Built and deployed Heimdall locally via docker compose up --build -d (all Go tests pass during Docker build)
  • Submitted SparkEKS job against stage.sensor_agg_brand_ad_expansion_tacos with return_result: true
  • Verified job completes successfully (previously crashed with unmarshal error)
  • Verified result columns show "type": "fixed" for decimal fields (previously crashed)
  • Verified decimal data values are returned as numeric floats (previously returned as base64-encoded byte maps)

No changes to Snowflake, Trino, ClickHouse, or other plugin result handling

@amol-patel
Copy link
Collaborator Author

image

result.json

Copy link
Contributor

@hladush hladush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for your changes
Why do you need these fields if you fully ignore them?

@amol-patel
Copy link
Collaborator Author

  1. Column.Type.UnmarshalJSON only handled simple strings ("string") and string arrays (["null", "string"]). Avro represents DECIMAL(38,2) as complex JSON objects ({"type": "fixed", "logicalType": "decimal", ...}) or nullable variants (["null", {"type": "fixed", ...}]), which caused the unmarshal crash.

Column.Type.UnmarshalJSON only handled simple strings ("string") and string arrays (["null", "string"]). Avro represents DECIMAL(38,2) as complex JSON objects ({"type": "fixed", "logicalType": "decimal", ...}) or nullable variants (["null", {"type": "fixed", ...}]), which caused the unmarshal crash.

Manual Tests

  • Built and deployed Heimdall locally via docker compose up --build -d (all Go tests pass during Docker build)
  • Submitted SparkEKS job against stage.sensor_agg_brand_ad_expansion_tacos with return_result: true
  • Verified job completes successfully (previously crashed with unmarshal error)
  • Verified result columns show "type": "fixed" for decimal fields (previously crashed)
  • Verified decimal data values are returned as numeric floats (previously returned as base64-encoded byte maps)

Tests

type_test.txt (14 tests) — Type unmarshaling for all Avro type formats, Column decimal metadata extraction, full schema test matching the production table
result_test.txt (7 tests) — Decimal byte decoding (positive, negative, zero, nil, scale 0), union value extraction (primitive, named fixed, fallback)

result_test.txt
type_test.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants