add native union type casting support#9544
add native union type casting support#9544friendlymatthew wants to merge 1 commit intoapache:mainfrom
Conversation
friendlymatthew
left a comment
There was a problem hiding this comment.
self review
arrow-cast/src/cast/mod.rs
Outdated
| (_, Union(_, _)) => Err(ArrowError::CastError(format!( | ||
| "Casting from {from_type} to {to_type} not supported" | ||
| ))), |
There was a problem hiding this comment.
This PR does not support casting scalar types into a Union type
It's not a feature that we currently need and will probably need more complex work to get it correct. Though, I think this is a cool/valid feature. I can file an issue if liked
| let type_ids = array.type_ids().clone(); | ||
| let offsets = array.offsets().cloned(); | ||
|
|
||
| let new_children: Vec<ArrayRef> = from_fields | ||
| .iter() | ||
| .map(|(from_id, _from_field)| { | ||
| let (_, to_field) = to_fields | ||
| .iter() | ||
| .find(|(to_id, _)| *to_id == from_id) | ||
| .ok_or_else(|| { | ||
| ArrowError::CastError(format!( | ||
| "Cannot cast union: type_id {from_id} not found in target union" | ||
| )) | ||
| })?; | ||
| let child = array.child(from_id); | ||
| cast_with_options(child.as_ref(), to_field.data_type(), cast_options) | ||
| }) | ||
| .collect::<Result<_, _>>()?; |
There was a problem hiding this comment.
look ups like this can go from n^2 to nlogn if we implement #8937, since sorting by type_id will give us logn searching
26859a6 to
9c19cb1
Compare
9c19cb1 to
8dbe39f
Compare
|
Thanks -- I hope to go through arrow prs more carefully starting tomorrow |
alamb
left a comment
There was a problem hiding this comment.
thank you @friendlymatthew
I like where this is headed.
I left some suggestions - let me know if it makes sense
| array: &UnionArray, | ||
| from_fields: &UnionFields, | ||
| to_fields: &UnionFields, | ||
| _to_mode: UnionMode, |
There was a problem hiding this comment.
why is the _to_mode parameter ignored?
|
|
||
| #[test] | ||
| fn test_cast_union_prefers_exact_type_match() { | ||
| // Union(Int32, Int64) → Int64: Int64 is an exact match, so only |
There was a problem hiding this comment.
Can you please document the expected union casting behavior in the documentation?
Specifically, somewhere here:
https://docs.rs/arrow/latest/arrow/compute/fn.cast_with_options.html#durations-and-intervals
That will make it easier to understand the intended semantics for union casting as well as verify that this code implements the semantics correctly
| Field::new("i", DataType::Int32, false), | ||
| Field::new("s", DataType::Utf8, false), | ||
| ]), | ||
| UnionMode::Dense, |
There was a problem hiding this comment.
Can you please update these tests for:
- DOn't replicate the logic for
can_cast_types-- instead add a call tocan_cast_typesinto the same tests that callscast. This will make it easier to review coverage, as the arrays that can be cast will also have the correspondingcan_cast_typescoverage - Ensure the tests cover both Dense and Sparse unions
Which issue does this PR close?
arrow-cast#9225Rationale for this change
This PR adds native support for casting Union arrays in the cast kernel. Previously,
can_cast_typesandcast_with_optionshad no handling for union types at all