Problem: ParquetSet is neither discoverable nor interoperable
The object returned by sinan.download() is a ParquetSet, but its current API makes it unnecessarily hard to use and violates common Python and data ecosystem conventions.
Current issues
The ParquetSet object:
- Prints a filesystem path via __str__(), which misleads users into assuming it is a path-like object
- Is not iterable, breaking standard Python expectations for a “set”-like container
- Does not expose any explicit path attributes (.path, .paths, .files)
- Is not compatible with pandas or polars readers
- Does not document the correct way to load the underlying parquet data
As a result, users are forced to reverse-engineer the object behavior, effectively turning them into testers.
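To make the current behavior concrete, here is a minimal reproduction using a hypothetical stand-in class. It assumes, as described above, that ParquetSet defines __str__() returning a directory path and nothing else relevant; the real class in the sinan package may hold more state, and the constructor, the `_root` attribute and the path `/tmp/sinan/data` are illustrative only.

```python
import os


class ParquetSet:
    """Hypothetical stand-in mimicking the behavior described in this issue."""

    def __init__(self, root):
        self._root = root  # assumed: directory holding the downloaded parquet files

    def __str__(self):
        # Printing the object shows what looks like a filesystem path...
        return self._root


files = ParquetSet("/tmp/sinan/data")  # hypothetical location
print(files)  # looks like an ordinary path

# ...but the object is neither path-like nor iterable.
for probe in (os.fspath, iter):
    try:
        probe(files)
    except TypeError as exc:
        print(f"{probe.__name__}: {exc}")
```

Both probes raise TypeError even though print(files) looks like a plain path, which is the mismatch described in the list above.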
Violated principles
- Principle of Least Surprise
- Self-describing API
- Interoperability with the Python data ecosystem
Proposed solution
Implement the filesystem path protocol (os.PathLike, PEP 519) by adding __fspath__ to ParquetSet:
```python
class ParquetSet:
    def __fspath__(self):
        # Return the same path that __str__() already prints
        return str(self)
```
This small change would immediately enable native compatibility with:
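```python
import pandas as pd
import polars as pl

# `files` is the ParquetSet returned by sinan.download()
pd.read_parquet(files)
pl.read_parquet(files)
pl.scan_parquet(files)
```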
No breaking changes, no new abstractions, and no additional documentation burden.
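A minimal sketch of the proposed change, again using a hypothetical stand-in for ParquetSet (the constructor and the `_root` attribute are assumptions, not the sinan implementation). It relies only on the standard library: os.PathLike recognizes any class that defines __fspath__, so os.fspath(), pathlib.Path() and os.listdir() accept the object without further changes. The system temp directory stands in for a real download location.

```python
import os
import pathlib
import tempfile


class ParquetSet:
    """Hypothetical stand-in for sinan's ParquetSet, plus the proposed method."""

    def __init__(self, root):
        self._root = root  # assumed: directory containing the downloaded parquet files

    def __str__(self):
        return self._root

    def __fspath__(self):
        # PEP 519: expose the same path that __str__() prints
        return str(self)


# The system temp directory stands in for the downloaded data.
files = ParquetSet(tempfile.gettempdir())

print(isinstance(files, os.PathLike))  # True: PathLike's subclass hook looks for __fspath__
print(os.fspath(files))                # the plain str path
print(pathlib.Path(files).is_dir())    # pathlib accepts any os.PathLike
print(len(os.listdir(files)))          # so do os functions that take a path
```

Functions that normalize their argument through os.fspath(), the standard way to accept os.PathLike inputs, receive a plain string from this object; that is the mechanism the pandas and polars calls above would depend on.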
Benefits
- Restores expected Python behavior
- Enables seamless integration with pandas and polars
- Reduces API surface and user confusion
- Eliminates the need for helper methods such as to_dataframe()
- Improves usability without altering internal design
This change optimizes developer experience while preserving the original intent of ParquetSet.