add native union type casting support#9544

Open
friendlymatthew wants to merge 1 commit into apache:main from pydantic:friendlymatthew/cast-union

Conversation

@friendlymatthew
Contributor

Which issue does this PR close?

Rationale for this change

This PR adds native support for casting Union arrays in the cast kernel. Previously, can_cast_types and cast_with_options had no handling for union types at all.

@github-actions github-actions bot added the arrow (Changes to the arrow crate) label on Mar 12, 2026
Contributor Author

@friendlymatthew friendlymatthew left a comment

self review

Comment on lines +1209 to +1211
(_, Union(_, _)) => Err(ArrowError::CastError(format!(
    "Casting from {from_type} to {to_type} not supported"
))),
Contributor Author

This PR does not support casting scalar types into a Union type.

It's not a feature that we currently need, and it will probably take more complex work to get right. That said, I think it's a cool and valid feature. I can file an issue if you'd like.

Comment on lines +2316 to +2333
let type_ids = array.type_ids().clone();
let offsets = array.offsets().cloned();

let new_children: Vec<ArrayRef> = from_fields
    .iter()
    .map(|(from_id, _from_field)| {
        let (_, to_field) = to_fields
            .iter()
            .find(|(to_id, _)| *to_id == from_id)
            .ok_or_else(|| {
                ArrowError::CastError(format!(
                    "Cannot cast union: type_id {from_id} not found in target union"
                ))
            })?;
        let child = array.child(from_id);
        cast_with_options(child.as_ref(), to_field.data_type(), cast_options)
    })
    .collect::<Result<_, _>>()?;
Contributor Author

Lookups like this can go from O(n^2) to O(n log n) overall if we implement #8937, since sorting by type_id would give us O(log n) searching.
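
To make the complexity point concrete, here is a minimal stand-alone sketch. It models a union's fields as plain `(type_id, name)` pairs instead of arrow's `UnionFields`, and `find_linear` and `find_sorted` are hypothetical helpers, not functions from this PR. The sorted variant assumes the target fields are already ordered by type_id, which is the ordering this comment expects from #8937.

```rust
/// Linear scan: O(n) per lookup, so O(n^2) when repeated for every source field.
/// Hypothetical helper operating on a simplified (type_id, name) model.
fn find_linear<'a>(fields: &[(i8, &'a str)], id: i8) -> Option<&'a str> {
    fields
        .iter()
        .find(|(to_id, _)| *to_id == id)
        .map(|(_, name)| *name)
}

/// Binary search: O(log n) per lookup, so O(n log n) across all source fields.
/// Only valid if `fields` is sorted by type_id (an assumption of this sketch).
fn find_sorted<'a>(fields: &[(i8, &'a str)], id: i8) -> Option<&'a str> {
    fields
        .binary_search_by_key(&id, |(to_id, _)| *to_id)
        .ok()
        .map(|idx| fields[idx].1)
}
```

Both helpers return the same result on sorted input; the difference only shows up in the per-lookup cost as the number of union fields grows.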

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/cast-union branch from 26859a6 to 9c19cb1 on March 12, 2026 17:45
@friendlymatthew
Contributor Author

cc @Jefffrey @alamb curious to get your thoughts

This is a feature we use in production at Pydantic, and we actively maintain a vendored version that we're eager to upstream.

@alamb
Contributor

alamb commented Mar 16, 2026

Thanks -- I hope to go through the arrow PRs more carefully starting tomorrow.

Contributor

@alamb alamb left a comment

thank you @friendlymatthew

I like where this is headed.

I left some suggestions - let me know if they make sense.

array: &UnionArray,
from_fields: &UnionFields,
to_fields: &UnionFields,
_to_mode: UnionMode,
Contributor

Why is the _to_mode parameter ignored?


#[test]
fn test_cast_union_prefers_exact_type_match() {
// Union(Int32, Int64) → Int64: Int64 is an exact match, so only
Contributor

Can you please document the expected union casting behavior?

Specifically, somewhere here:

https://docs.rs/arrow/latest/arrow/compute/fn.cast_with_options.html#durations-and-intervals

That will make it easier to understand the intended semantics for union casting, as well as to verify that this code implements those semantics correctly.

Field::new("i", DataType::Int32, false),
Field::new("s", DataType::Utf8, false),
]),
UnionMode::Dense,
Contributor

Can you please update these tests to:

  1. Not replicate the logic for can_cast_types. Instead, add a call to can_cast_types to the same tests that call cast. This will make it easier to review coverage, since every array that can be cast will also have the corresponding can_cast_types coverage.
  2. Cover both Dense and Sparse unions.
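
One way to picture the requested test shape is the toy sketch below. It deliberately does not use the arrow crate: `Mode`, `can_cast`, `cast`, and `assert_cast_consistent` are hypothetical stand-ins for `UnionMode`, `can_cast_types`, `cast`, and a shared test helper, and unions are reduced to lists of type_ids. The only point is the structure, where a single helper checks that the predicate and the cast agree and runs the check for both modes.

```rust
/// Hypothetical stand-in for arrow's `UnionMode`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Dense,
    Sparse,
}

/// Hypothetical stand-in for `can_cast_types`:
/// every source type_id must exist in the target union.
fn can_cast(from_ids: &[i8], to_ids: &[i8]) -> bool {
    from_ids.iter().all(|id| to_ids.contains(id))
}

/// Hypothetical stand-in for `cast`. The toy model does not distinguish
/// modes; a real test would build Dense and Sparse arrays here.
fn cast(from_ids: &[i8], to_ids: &[i8], _mode: Mode) -> Result<Vec<i8>, String> {
    for id in from_ids {
        if !to_ids.contains(id) {
            return Err(format!("type_id {id} not found in target union"));
        }
    }
    Ok(from_ids.to_vec())
}

/// Shared helper: asserts that the predicate and the cast agree in both
/// modes, then returns whether the cast is supported.
fn assert_cast_consistent(from_ids: &[i8], to_ids: &[i8]) -> bool {
    let expected = can_cast(from_ids, to_ids);
    for mode in [Mode::Dense, Mode::Sparse] {
        let ok = cast(from_ids, to_ids, mode).is_ok();
        assert_eq!(ok, expected, "can_cast and cast disagree for {mode:?}");
    }
    expected
}
```

With a helper of this shape, each test case exercises the predicate and the cast together, so a missing can_cast_types case or a missing mode shows up as a failing assertion rather than a coverage gap.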

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

Add Union type casting support to arrow-cast

2 participants