Add GCS (Google Cloud Storage) support to Python bindings #2236

@elliotsteene-swap

Description

The Python bindings (pyiceberg-core) do not support GCS-backed Iceberg tables when used with the DataFusion table provider. The underlying iceberg-storage-opendal crate already has full GCS support via the opendal-gcs feature flag, and the OpenDalStorageFactory::Gcs variant exists in crates/storage/opendal/src/lib.rs, but neither the feature nor the variant is wired up in the Python bindings.

This means anyone using pyiceberg + DataFusion with tables stored on gs:// hits a runtime error:

RuntimeError: Unsupported storage scheme: gs

Steps to Reproduce

from pyiceberg.catalog import load_catalog
from datafusion import SessionContext

catalog = load_catalog("my_catalog")  # REST catalog pointing to GCS-backed warehouse
table = catalog.load_table("my_namespace.my_table")

ctx = SessionContext()
ctx.register_table("my_table", table)  # <-- fails here

Error:

RuntimeError: Unsupported storage scheme: gs

The call path is:

  1. ctx.register_table() calls table.__datafusion_table_provider__()
  2. pyiceberg constructs IcebergDataFusionTable and calls its __datafusion_table_provider__()
  3. Rust-side storage_factory_from_path() in bindings/python/src/datafusion_table_provider.rs does not match gs or gcs schemes
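
To check ahead of time whether a given table location would hit this error, the scheme dispatch in step 3 can be mirrored in a few lines of Python. This is only a sketch: the set of handled schemes is copied from the existing match arms of storage_factory_from_path() quoted in this issue, while the helper names (location_scheme, would_fail_today) are hypothetical. It also assumes the Rust side derives the scheme the same way urlparse does, which this issue does not confirm.

```python
from urllib.parse import urlparse

# Schemes matched by storage_factory_from_path() before the proposed change,
# as quoted in this issue. "" stands for a plain filesystem path.
CURRENTLY_HANDLED = {"", "file", "s3", "s3a", "memory"}

# Schemes the proposed match arm would add.
PROPOSED_GCS = {"gs", "gcs"}


def location_scheme(location: str) -> str:
    """Return the lowercased URL scheme of a location ("" if there is none).

    Assumption: the Rust bindings extract the scheme equivalently.
    """
    return urlparse(location).scheme


def would_fail_today(location: str) -> bool:
    """True if the scheme falls through to the 'Unsupported storage scheme' arm."""
    return location_scheme(location) not in CURRENTLY_HANDLED


def would_work_after_change(location: str) -> bool:
    """True if the scheme is handled once the gs/gcs arm is added."""
    return location_scheme(location) in CURRENTLY_HANDLED | PROPOSED_GCS


if __name__ == "__main__":
    for loc in ("gs://bucket/warehouse/db/tbl", "s3://bucket/db/tbl", "/tmp/warehouse/tbl"):
        print(loc, "fails today:", would_fail_today(loc))
```

Passing a table's storage location (for example a gs://... warehouse path) to would_fail_today() gives a quick indication of whether ctx.register_table() is likely to raise before the fix lands. Which exact path the bindings inspect is not stated here, so treat the result as a hint rather than a guarantee.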

Proposed Changes

1. Enable opendal-gcs feature in bindings/python/Cargo.toml

- iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
+ iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory", "opendal-gcs"] }

2. Add gs/gcs match arms in bindings/python/src/datafusion_table_provider.rs

In storage_factory_from_path():

  let factory: Arc<dyn StorageFactory> = match scheme {
      "file" | "" => Arc::new(OpenDalStorageFactory::Fs),
      "s3" | "s3a" => Arc::new(OpenDalStorageFactory::S3 {
          configured_scheme: scheme.to_string(),
          customized_credential_load: None,
      }),
      "memory" => Arc::new(OpenDalStorageFactory::Memory),
+     "gs" | "gcs" => Arc::new(OpenDalStorageFactory::Gcs),
      _ => {
          return Err(PyRuntimeError::new_err(format!(
              "Unsupported storage scheme: {scheme}"
          )));
      }
  };

Context
