Add support for external table creation#1301
aholyoke wants to merge 1 commit into googleapis:main
Code Review
This pull request introduces valuable support for creating external tables, including those with Hive partitioning. The implementation is solid, leveraging SQLAlchemy's DDL compilation hooks effectively. The changes in sqlalchemy_bigquery/base.py to handle external_data_configuration are well-designed, particularly the dynamic handling of format-specific options. The new documentation in README.rst is clear, and the tests in tests/unit/test_table_options.py are comprehensive. I have one suggestion to make the type validation for source_uris more robust.
    if not isinstance(external_config.source_uris, (list, str)):
        raise TypeError(
            "External table source_uris must be a list of strings"
            " (or a single string for Bigtable)"
        )
The current type check for source_uris is not strict enough. It allows a list containing non-string elements (e.g., [1, 2, 3]), which would lead to invalid SQL being generated as the uris option in BigQuery expects an array of strings. It would be more robust to also validate the types of the elements when a list is provided.
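To make the gap concrete, here is a minimal standalone sketch. The helper names `loose_check`, `strict_check`, and `raises_type_error` are hypothetical and not part of the PR; `loose_check` mirrors the condition quoted in the diff, while `strict_check` adds the per-element validation described in this comment.

```python
def loose_check(source_uris):
    """Mirror of the condition in the diff: only the container type is checked."""
    if not isinstance(source_uris, (list, str)):
        raise TypeError(
            "External table source_uris must be a list of strings"
            " (or a single string for Bigtable)"
        )


def strict_check(source_uris):
    """Also reject lists that contain non-string elements."""
    if isinstance(source_uris, list):
        if not all(isinstance(uri, str) for uri in source_uris):
            raise TypeError(
                "When providing a list for source_uris, all elements must be strings."
            )
    elif not isinstance(source_uris, str):
        raise TypeError(
            "External table source_uris must be a list of strings"
            " (or a single string for Bigtable)"
        )


def raises_type_error(check, value):
    """Return True if check(value) raises TypeError, False otherwise."""
    try:
        check(value)
    except TypeError:
        return True
    return False


# A list of integers slips through the loose check but not the strict one.
print(raises_type_error(loose_check, [1, 2, 3]))
print(raises_type_error(strict_check, [1, 2, 3]))
```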
    if isinstance(external_config.source_uris, list):
        if not all(isinstance(uri, str) for uri in external_config.source_uris):
            raise TypeError(
                "When providing a list for source_uris, all elements must be strings."
            )
    elif not isinstance(external_config.source_uris, str):
        raise TypeError(
            "External table source_uris must be a list of strings"
            " (or a single string for Bigtable)"
        )

Add bigquery_external_data_configuration dialect option for creating external tables backed by files in Google Cloud Storage.

Supports:
- ExternalConfig with source format and URIs
- HivePartitioningOptions for hive-partitioned external tables
- Format-specific options (CSV, Parquet, Avro)

Includes README documentation with usage examples.
2ce3800 to e88f0b8
@chalmerlowe @Linchin For review when you have a chance, thanks!
Implements creation of externally backed tables, including Hive partitioning support. #372 🦕
Borrows the convention used for time and range partitioning, where we create an instance of a struct from the core Google Cloud SDK and pass it as a SQLAlchemy BigQuery dialect option.
Example usage:
Will produce this query: