
Support for Dask, PySpark, and Ray via Fugue #328

Open
spolisar wants to merge 17 commits into main from feat/anydataframe


spolisar commented Mar 19, 2026

Add support for working with distributed dataframes via Fugue. Supports Dask, PySpark, and Ray.
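Fugue's approach is to let a plain function written against a single partition of data run on whichever engine (Dask, PySpark, or Ray) backs the dataframe; its documented entry point for this is `fugue.transform`. The standard-library sketch below only illustrates that per-partition pattern and does not use Fugue itself. `split_partitions`, `transform_partitions`, and `double_y` are hypothetical helpers, not part of Fugue or of this PR.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def split_partitions(rows: List[Row], npartitions: int) -> List[List[Row]]:
    """Hypothetical helper: split rows into `npartitions` contiguous chunks
    whose sizes differ by at most one."""
    if npartitions < 1:
        raise ValueError("npartitions must be >= 1")
    size, extra = divmod(len(rows), npartitions)
    parts: List[List[Row]] = []
    start = 0
    for i in range(npartitions):
        # the first `extra` partitions each take one additional row
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def transform_partitions(
    partitions: List[List[Row]], func: Callable[[List[Row]], List[Row]]
) -> List[Row]:
    """Hypothetical helper: apply `func` to each partition independently,
    then concatenate the per-partition results in order."""
    out: List[Row] = []
    for part in partitions:
        out.extend(func(part))
    return out


def double_y(part: List[Row]) -> List[Row]:
    # written against one partition; it never sees rows from other partitions
    return [{**row, "y": row["y"] * 2} for row in part]


rows = [{"y": i} for i in range(5)]
parts = split_partitions(rows, 2)
print([len(p) for p in parts])  # [3, 2]
print([r["y"] for r in transform_partitions(parts, double_y)])  # [0, 2, 4, 6, 8]
```

Because `double_y` only ever receives one partition, the same function could in principle be scheduled by any engine; that separation between the user function and the execution engine is what makes a single code path usable for Dask, PySpark, and Ray.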

A repartitioning workaround is used for an issue where single-partition Dask dataframes produce an output dataframe that raises "ValueError: Cannot repartition on divisions with unknown divisions" when compute() or repartition() is called on it.
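Dask records the index boundaries of its partitions as "divisions"; when those boundaries are unknown they are stored as a tuple of `None`, and Dask cannot repartition onto explicit new boundaries, which is the kind of failure the error above describes. The standard-library model below only illustrates why that error arises. `divisions_of` and `repartition_on` are hypothetical helpers, not Dask's API, and the PR does not say which repartitioning step its workaround performs.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Partition = List[int]  # each partition holds sorted index values
Divisions = Tuple[Optional[int], ...]


def divisions_of(partitions: Sequence[Partition], known: bool) -> Divisions:
    """Hypothetical model of Dask-style divisions: the first index of each
    partition plus the last index of the final partition, or all None when
    the boundaries are unknown."""
    if not known:
        return (None,) * (len(partitions) + 1)
    return tuple([p[0] for p in partitions] + [partitions[-1][-1]])


def repartition_on(
    partitions: Sequence[Partition],
    divisions: Divisions,
    new_divisions: Sequence[int],
) -> List[Partition]:
    """Hypothetical model of repartitioning onto explicit index boundaries."""
    if any(d is None for d in divisions):
        # mirrors the error quoted in the PR description
        raise ValueError("Cannot repartition on divisions with unknown divisions")
    out: List[Partition] = [[] for _ in range(len(new_divisions) - 1)]
    for part in partitions:
        for v in part:
            if not new_divisions[0] <= v <= new_divisions[-1]:
                raise ValueError(f"index {v} falls outside the new divisions")
            # bucket i covers [new_divisions[i], new_divisions[i + 1]);
            # the last bucket also includes its upper bound
            i = min(bisect_right(new_divisions, v) - 1, len(out) - 1)
            out[i].append(v)
    return out


single = [list(range(10))]
print(repartition_on(single, divisions_of(single, known=True), (0, 5, 9)))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

With `known=False` the same call raises the ValueError, since there are no recorded boundaries to check the new divisions against.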

The CI run of the distributed tests is limited to a single worker to avoid an out-of-memory failure.

TODO:

  • add examples to mkdocs.yml

spolisar marked this pull request as ready for review March 20, 2026 18:07
spolisar requested a review from AzulGarza March 20, 2026 18:07
