
Add StorageFactory::with_metadata #2240

@CTTY

Description


Is your feature request related to a problem or challenge?

Users who want to work with multiple storages can use the resolving storage that we are about to add in #2231. However, the path still has to be resolved inside the Storage on every operation, which can be costly for some users.

We want to add an API that resolves the Storage when it is initialized, so users don't have to resolve paths during the actual Storage operations. This especially benefits RestCatalog, where a file_io is loaded for each table.
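To make the trade-off concrete, here is a minimal, self-contained sketch that contrasts resolving the storage on every operation with resolving it once at initialization. The `Storage` trait, the `Named` storage, the `scheme_of` helper, and the `PerOpResolving` / `PreResolved` types are hypothetical stand-ins for illustration, not the iceberg-rust API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a Storage backend.
trait Storage {
    fn name(&self) -> &str;
}

/// Toy storage that only knows its own name.
struct Named(&'static str);

impl Storage for Named {
    fn name(&self) -> &str {
        self.0
    }
}

/// Extract the scheme ("s3", "gs", ...) from a location like "s3://bucket/key".
fn scheme_of(location: &str) -> Option<&str> {
    location.split_once("://").map(|(scheme, _)| scheme)
}

/// Per-operation resolution: every call parses the path and looks up the map.
struct PerOpResolving {
    scheme_to_storage: HashMap<String, Arc<dyn Storage>>,
}

impl PerOpResolving {
    fn storage_for(&self, path: &str) -> Option<Arc<dyn Storage>> {
        let scheme = scheme_of(path)?;
        self.scheme_to_storage.get(scheme).cloned()
    }
}

/// Init-time resolution: resolve once from the table location, then reuse.
struct PreResolved {
    storage: Arc<dyn Storage>,
}

impl PreResolved {
    fn new(
        scheme_to_storage: &HashMap<String, Arc<dyn Storage>>,
        table_location: &str,
    ) -> Option<Self> {
        let scheme = scheme_of(table_location)?;
        let storage = scheme_to_storage.get(scheme)?.clone();
        Some(Self { storage })
    }

    fn storage_for(&self, _path: &str) -> Arc<dyn Storage> {
        // No parsing or map lookup on the hot path.
        Arc::clone(&self.storage)
    }
}
```

The sketch assumes all files of a table live under the same scheme as its location, which is what makes resolving once per table sufficient.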

Describe the solution you'd like

This can be achieved by adding a new API, with_metadata(metadata: TableMetadata), to the StorageFactory trait. Suppose a user wants to rely on the metadata location to resolve the Storage; they can do that by implementing a custom storage factory:

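```rust
// Sketch only: exact signatures, StorageConfig and resolve_scheme are placeholders.
struct ResolvingStorageFactory {
    scheme_to_storage: HashMap<String, Arc<dyn Storage>>,
    metadata: Option<TableMetadata>,
}

impl StorageFactory for ResolvingStorageFactory {
    // Return a copy of this factory with the table metadata attached.
    fn with_metadata(&self, metadata: TableMetadata) -> Arc<dyn StorageFactory> {
        Arc::new(Self { scheme_to_storage: self.scheme_to_storage.clone(), metadata: Some(metadata) })
    }

    // Resolve the Storage once, from the scheme of the table's metadata location.
    fn build(&self, config: &StorageConfig) -> Result<Arc<dyn Storage>> {
        let metadata = self.metadata.as_ref().expect("with_metadata must be called first");
        let scheme = resolve_scheme(metadata.location());
        self.scheme_to_storage.get(&scheme).cloned().ok_or_else(|| {
            Error::new(ErrorKind::FeatureUnsupported, format!("no storage for scheme {scheme}"))
        })
    }
}
```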
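A compilable model of the factory above can be written with simplified, hypothetical stand-ins for TableMetadata, Storage, and the StorageFactory trait (errors are plain strings and build takes no config, to keep it short). It assumes with_metadata takes &self and returns a new factory, so it stays callable through Arc<dyn StorageFactory>; the real trait may end up with a different shape.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in: only the field this sketch needs.
#[derive(Clone)]
struct TableMetadata {
    location: String,
}

/// Hypothetical, simplified Storage trait.
trait Storage {
    fn name(&self) -> &str;
}

struct Named(&'static str);

impl Storage for Named {
    fn name(&self) -> &str {
        self.0
    }
}

/// Hypothetical shape of the proposed trait.
trait StorageFactory {
    /// Return a factory with the table metadata attached.
    fn with_metadata(&self, metadata: TableMetadata) -> Arc<dyn StorageFactory>;
    /// Build the Storage for the attached metadata.
    fn build(&self) -> Result<Arc<dyn Storage>, String>;
}

#[derive(Clone)]
struct ResolvingStorageFactory {
    scheme_to_storage: HashMap<String, Arc<dyn Storage>>,
    metadata: Option<TableMetadata>,
}

impl StorageFactory for ResolvingStorageFactory {
    fn with_metadata(&self, metadata: TableMetadata) -> Arc<dyn StorageFactory> {
        let mut factory = self.clone();
        factory.metadata = Some(metadata);
        Arc::new(factory)
    }

    fn build(&self) -> Result<Arc<dyn Storage>, String> {
        let metadata = self
            .metadata
            .as_ref()
            .ok_or("with_metadata must be called before build")?;
        // The scheme is resolved once here, not on every Storage operation.
        let scheme = metadata
            .location
            .split_once("://")
            .map(|(scheme, _)| scheme)
            .ok_or_else(|| format!("no scheme in {}", metadata.location))?;
        self.scheme_to_storage
            .get(scheme)
            .cloned()
            .ok_or_else(|| format!("no storage registered for scheme {scheme}"))
    }
}
```

Returning a fresh factory instead of mutating in place lets one shared base factory be specialized per table without any locking.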

In RestCatalog::load_file_io, we can then have:

```rust
let factory = self
    .storage_factory
    .clone()
    .ok_or_else(|| {
        Error::new(
            ErrorKind::Unexpected,
            "StorageFactory must be provided for RestCatalog. Use `with_storage_factory` to configure it.",
        )
    })?
    .with_metadata(metadata); // always attach metadata for RestCatalog's storage initialization
```
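The call-site ordering can be checked with a small self-contained model: the optional factory is unwrapped (or turned into an error) first, and the metadata is attached to the resulting factory. Catalog, load_factory, RecordingFactory, and the trimmed-down traits are hypothetical, and the error is a plain String rather than iceberg's Error type.

```rust
use std::sync::Arc;

#[derive(Clone)]
struct TableMetadata {
    location: String,
}

/// Hypothetical, trimmed-down factory trait.
trait StorageFactory {
    fn with_metadata(&self, metadata: TableMetadata) -> Arc<dyn StorageFactory>;
    /// Simplified: report which location the factory would build storage for.
    fn target(&self) -> Option<String>;
}

/// Toy factory that just remembers the attached metadata.
struct RecordingFactory {
    metadata: Option<TableMetadata>,
}

impl StorageFactory for RecordingFactory {
    fn with_metadata(&self, metadata: TableMetadata) -> Arc<dyn StorageFactory> {
        Arc::new(RecordingFactory { metadata: Some(metadata) })
    }

    fn target(&self) -> Option<String> {
        self.metadata.as_ref().map(|m| m.location.clone())
    }
}

/// Hypothetical catalog holding an optional factory, like RestCatalog.
struct Catalog {
    storage_factory: Option<Arc<dyn StorageFactory>>,
}

impl Catalog {
    fn load_factory(&self, metadata: TableMetadata) -> Result<Arc<dyn StorageFactory>, String> {
        let factory = self
            .storage_factory
            .clone()
            .ok_or_else(|| {
                "StorageFactory must be provided for RestCatalog. \
                 Use `with_storage_factory` to configure it."
                    .to_string()
            })?
            // Attach the table metadata once, before any Storage is built.
            .with_metadata(metadata);
        Ok(factory)
    }
}
```

Calling with_metadata on the Option itself would not compile, which is why the unwrap-or-error step comes first in the chain.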

Willingness to contribute

I can contribute to this feature independently
