[Variant] Support bulk-appends in cast_to_variant #8323

@scovich

Re: "My biggest comment / suggestion is to consider making the API vectorized (convert the entire Arrow Array) but I think we can do that as a follow on PR"

And #8299 (comment) -- that run-end encoding could be handled more easily in a vectorized API.

And #8299 (comment) that suggests an append_all_rows() method.

And #8299 (comment) that also wonders about vectorization.

I'll try to give one response that covers them all:

I think it's reasonable to consider adding a bulk-append API, but we have to be cognizant of the limitations and challenges it will face:

  • We will need a new trait that knows how to create (and finish!) variant builder instances
  • Variant building is inherently row-based. Any builder that ultimately needs to produce a variant array or variant object as its output has to recursively build up the fields/elements of the variant it creates, so its append_all_rows would be trivial: it just calls append_row in a loop (like today).
  • The API would be very nice for converting primitive arrays to variant, because they don't need to recurse on anything. It would also be nice because we could potentially define a specialized impl just for VariantArrayBuilder, so we don't have to deal with that new variant builder create+finish trait.
  • Casting a list of primitive values is an interesting intermediate case, where one should be able to append all the elements of a given list in one shot. But that might require the new create+finish trait? Or maybe it just needs a second specialization for ListBuilder?
  • Maybe instead of a no-arg append_all_rows(), we should consider a ranged append_many_rows(start..end)? One could always pass .. to request encoding of all rows.
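
To make the trade-offs above concrete, here is a minimal sketch of what a ranged bulk append could look like. Everything in it is hypothetical: `RowAppender`, `Int32Column`, `PairColumn`, and `resolve` are illustration-only names, and a plain `Vec<String>` stands in for a real variant builder. It does not use the actual arrow-rs or parquet-variant types, and it does not model the create+finish trait at all. It only shows the shape of the idea: a default `append_many_rows` that loops over `append_row`, plus a primitive specialization that handles the whole range in one shot.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical stand-in for a row-oriented Arrow-to-variant converter.
/// `Vec<String>` plays the role of the variant builder's output.
trait RowAppender {
    /// Number of rows in the underlying input column.
    fn num_rows(&self) -> usize;

    /// Append a single row (the existing row-at-a-time path).
    fn append_row(&self, row: usize, out: &mut Vec<String>);

    /// Ranged bulk append. The default simply loops over `append_row`,
    /// which is all a nested (object/list) converter can do anyway.
    fn append_many_rows<R: RangeBounds<usize>>(&self, rows: R, out: &mut Vec<String>) {
        let (start, end) = resolve(rows, self.num_rows());
        for row in start..end {
            self.append_row(row, out);
        }
    }
}

/// Turn any range (`2..5`, `..`, `1..=3`) into a checked `[start, end)` pair.
fn resolve<R: RangeBounds<usize>>(rows: R, len: usize) -> (usize, usize) {
    let start = match rows.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match rows.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    assert!(start <= end && end <= len, "row range out of bounds");
    (start, end)
}

/// Primitive column: no recursion needed, so the bulk path is overridden.
struct Int32Column(Vec<i32>);

impl RowAppender for Int32Column {
    fn num_rows(&self) -> usize {
        self.0.len()
    }

    fn append_row(&self, row: usize, out: &mut Vec<String>) {
        out.push(self.0[row].to_string());
    }

    // Specialized: convert the whole slice without per-row dispatch.
    fn append_many_rows<R: RangeBounds<usize>>(&self, rows: R, out: &mut Vec<String>) {
        let (start, end) = resolve(rows, self.num_rows());
        out.extend(self.0[start..end].iter().map(|v| v.to_string()));
    }
}

/// Nested column: each output value is built from its children row by row,
/// so it keeps the default looping `append_many_rows`.
struct PairColumn {
    a: Int32Column,
    b: Int32Column,
}

impl RowAppender for PairColumn {
    fn num_rows(&self) -> usize {
        self.a.num_rows()
    }

    fn append_row(&self, row: usize, out: &mut Vec<String>) {
        let mut fields = Vec::new();
        self.a.append_row(row, &mut fields);
        self.b.append_row(row, &mut fields);
        out.push(format!("{{a: {}, b: {}}}", fields[0], fields[1]));
    }
}
```

With this shape, a caller would write something like `column.append_many_rows(.., &mut out)` to encode every row, or pass `start..end` for a slice. Only the primitive case gains anything from the bulk call, which matches the concern above that nested builders would end up with a trivial loop.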

Originally posted by @scovich in #8299 (comment)
