[SPARK-52428] Add DataFrame::to_local_iterator() for batch-by-batch results #32

Open
rafafrdz wants to merge 1 commit into apache:master from rafafrdz:add-to-local-iterator
Conversation

@rafafrdz
Summary

  • Add SparkConnectClient::to_arrow_batches(), which returns the result as a Vec<RecordBatch> without concatenating the batches
  • Add DataFrame::to_local_iterator(), which returns the batches individually to avoid OOM on large datasets
  • Unlike collect(), which concatenates all batches into one RecordBatch, to_local_iterator() preserves batch boundaries
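
To illustrate the difference between the two behaviours, here is a minimal std-only sketch. It is not the code in this PR: `Batch` is a hypothetical stand-in for Arrow's `RecordBatch`, and `concat_batches` and `local_iterator` are illustrative names that only mimic what `collect()` and `to_local_iterator()` are described as doing above.

```rust
/// Hypothetical stand-in for arrow's `RecordBatch`: a single column of i64 values.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    values: Vec<i64>,
}

/// `collect()`-style behaviour (assumed): copy every batch into one merged batch.
/// The merged batch needs memory for all rows at once, on top of the inputs.
fn concat_batches(batches: &[Batch]) -> Batch {
    let total: usize = batches.iter().map(|b| b.values.len()).sum();
    let mut values = Vec::with_capacity(total);
    for b in batches {
        values.extend_from_slice(&b.values);
    }
    Batch { values }
}

/// `to_local_iterator()`-style behaviour (assumed): hand batches back one at a
/// time, keeping the original batch boundaries and skipping the merge copy.
fn local_iterator(batches: Vec<Batch>) -> impl Iterator<Item = Batch> {
    batches.into_iter()
}

fn main() {
    let batches = vec![
        Batch { values: vec![1, 2, 3] },
        Batch { values: vec![4, 5] },
    ];

    // One merged batch: boundaries between the two inputs are lost.
    let merged = concat_batches(&batches);
    println!("merged rows: {}", merged.values.len());

    // Batch-by-batch: each batch can be processed and dropped in turn.
    for (i, b) in local_iterator(batches).enumerate() {
        println!("batch {i}: {} rows", b.values.len());
    }
}
```

In this sketch the caller of `local_iterator` can drop each batch after processing it, whereas `concat_batches` always allocates room for every row.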

Test plan

  • cargo build passes
  • cargo fmt -- --check passes

@rafafrdz force-pushed the add-to-local-iterator branch from 1a0a75b to 85d03d7 on March 29, 2026 08:08