Description
The Python bindings (pyiceberg-core) do not support GCS-backed Iceberg tables when used with the DataFusion table provider. The underlying iceberg-storage-opendal crate already has full GCS support via the opendal-gcs feature flag, and the OpenDalStorageFactory::Gcs variant exists in crates/storage/opendal/src/lib.rs — but it is not wired up in the Python bindings.
This means anyone using pyiceberg + DataFusion with tables stored on gs:// hits a runtime error:
RuntimeError: Unsupported storage scheme: gs
Steps to Reproduce
from pyiceberg.catalog import load_catalog
from datafusion import SessionContext
catalog = load_catalog("my_catalog") # REST catalog pointing to GCS-backed warehouse
table = catalog.load_table("my_namespace.my_table")
ctx = SessionContext()
ctx.register_table("my_table", table) # <-- fails here
Error:
RuntimeError: Unsupported storage scheme: gs
The call path is:
- ctx.register_table() calls table.__datafusion_table_provider__()
- pyiceberg constructs IcebergDataFusionTable and calls its __datafusion_table_provider__()
- Rust-side storage_factory_from_path() in bindings/python/src/datafusion_table_provider.rs does not match gs or gcs schemes
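The failure mode can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the binding's code: `pick_factory`, `CURRENT_SCHEMES`, and `PROPOSED_SCHEMES` are hypothetical names, the variant names are plain strings, and extracting the scheme with `urlparse` is an assumption about how the Rust side derives it from the table location.

```python
from urllib.parse import urlparse

# Schemes covered by the match arms in storage_factory_from_path(), as quoted in this issue.
CURRENT_SCHEMES = {"file": "Fs", "": "Fs", "s3": "S3", "s3a": "S3", "memory": "Memory"}

# Hypothetical post-fix table: the same arms plus the proposed "gs" | "gcs" arm.
PROPOSED_SCHEMES = {**CURRENT_SCHEMES, "gs": "Gcs", "gcs": "Gcs"}


def pick_factory(location: str, schemes: dict[str, str]) -> str:
    """Return the factory variant name for a location, or raise like the binding does."""
    # Assumption: the scheme is whatever precedes "://"; a bare path yields "".
    scheme = urlparse(location).scheme
    if scheme not in schemes:
        raise RuntimeError(f"Unsupported storage scheme: {scheme}")
    return schemes[scheme]
```

With `CURRENT_SCHEMES`, `pick_factory("gs://bucket/warehouse/tbl", CURRENT_SCHEMES)` raises the same message reported above, while the same call with `PROPOSED_SCHEMES` resolves to the GCS variant.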
Proposed Changes
1. Enable opendal-gcs feature in bindings/python/Cargo.toml
- iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
+ iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory", "opendal-gcs"] }
2. Add gs/gcs match arms in bindings/python/src/datafusion_table_provider.rs
In storage_factory_from_path():
  let factory: Arc<dyn StorageFactory> = match scheme {
      "file" | "" => Arc::new(OpenDalStorageFactory::Fs),
      "s3" | "s3a" => Arc::new(OpenDalStorageFactory::S3 {
          configured_scheme: scheme.to_string(),
          customized_credential_load: None,
      }),
      "memory" => Arc::new(OpenDalStorageFactory::Memory),
+     "gs" | "gcs" => Arc::new(OpenDalStorageFactory::Gcs),
      _ => {
          return Err(PyRuntimeError::new_err(format!(
              "Unsupported storage scheme: {scheme}"
          )));
      }
  };
Context
- The OpenDalStorageFactory::Gcs variant and gcs_config_parse() already exist in crates/storage/opendal/src/lib.rs
- The opendal-gcs feature flag is defined in crates/storage/opendal/Cargo.toml and is included in opendal-all
- GCS storage support was added to iceberg-rust in feat: support for gcs storage #520, with gs:// / gcs:// scheme support in fix: support both gs and gcs schemes for google cloud storage #845 and OAuth support in feat: add gcp oauth support #654
- This is a common use case for anyone using Google BigLake Metastore with pyiceberg and DataFusion