Skip to content

proxy automatically refreshes schemas#159

Open
brandtnewton wants to merge 2 commits into
GoogleCloudPlatform:mainfrom
brandtnewton:proxy/auto-refresh
Open

proxy automatically refreshes schemas#159
brandtnewton wants to merge 2 commits into
GoogleCloudPlatform:mainfrom
brandtnewton:proxy/auto-refresh

Conversation

@brandtnewton
Copy link
Copy Markdown
Collaborator

@brandtnewton brandtnewton commented Feb 12, 2026

Adds an optional (but on by default) automatic refresh of all schemas that the Cassandra Proxy owns. The default interval is 30s. Before this feature, an instance needed to be restarted to pick up new changes that were made by another instance.

The most correct way to solve this problem would be to have instances of the proxy communicate change evens with each other but we don't have any such system in place so a timing based refresh should suffice.

Also fixes a minor bug where tables with collection type columns were always seen as having changed thus sending unnecessary changed events. This fix was made to metadata/table_schema.go.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @brandtnewton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Cassandra-Bigtable proxy by adding an automatic schema refresh mechanism, improving its ability to adapt to schema changes in Bigtable. Concurrently, it rectifies a bug in schema comparison logic, ensuring that tables containing collection types are accurately evaluated, thereby preventing spurious schema change notifications and promoting more reliable schema synchronization.

Highlights

  • Automatic Schema Refresh: Introduced a new configuration option, metadataRefreshInterval, allowing the proxy to automatically refresh schemas from Bigtable at a specified interval. This ensures the proxy's understanding of the database schema remains up-to-date without manual intervention.
  • Accurate CQL Type Comparison: Implemented a robust CqlTypesEqual function to correctly compare CQL data types, including complex collection types like maps, lists, sets, and frozen types. This resolves an issue where tables with collection type columns were incorrectly identified as having changed, leading to unnecessary schema change events.
Changelog
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/config.yaml
    • Added metadataRefreshInterval configuration option with a default value of 60 seconds.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/example_config.yaml
    • Included metadataRefreshInterval in the example configuration.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/global/types/config_types.go
    • Added MetadataRefreshInterval field to CliArgs struct.
    • Added MetadataRefreshInterval field to BigtableConfig struct.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/global/types/cql_types.go
    • Implemented CqlTypesEqual function to perform deep comparison of CqlDataType instances, handling scalar, map, list, set, and frozen types, and considering TEXT and VARCHAR as equivalent.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/global/types/cql_types_test.go
    • Added a new test file with comprehensive unit tests for the CqlTypesEqual function, covering various type comparisons including nested and nil types.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/metadata/metadata_store.go
    • Imported the time package.
    • Integrated a refreshLoop goroutine into the Initialize method to periodically reload keyspace schemas if MetadataRefreshInterval is greater than 0.
    • Modified ReloadKeyspaceSchemas to log the number of schema changes detected after syncing.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/metadata/table_schema.go
    • Updated SameSchema method to use the new types.CqlTypesEqual function for comparing CQLType in primary keys.
    • Updated SameSchema method to use types.CqlTypesEqual for comparing CQLType in regular columns.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/metadata/table_schema_test.go
    • Added a test case to Test_SameTableSchema to verify correct schema comparison for tables containing collection types.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/metadata/table_schemas.go
    • Changed the signature of SyncKeyspace to return an integer representing the count of changes.
    • Modified SyncKeyspace to return the number of events (changes) generated.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/third_party/datastax/proxy/config/defaults.go
    • Added DefaultMetadataRefreshInterval constant with a value of 30 seconds.
    • Applied DefaultMetadataRefreshInterval if MetadataRefreshInterval is not explicitly set in the configuration.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/third_party/datastax/proxy/config/setup.go
    • Added MetadataRefreshInterval field to rawCliArgs struct with a default value of 30.
    • Mapped parsed.MetadataRefreshInterval to result.MetadataRefreshInterval in ParseCliArgs.
    • Mapped args.MetadataRefreshInterval to bigtableConfig.MetadataRefreshInterval in maybeParseQuickStartArgs.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/third_party/datastax/proxy/config/types.go
    • Added MetadataRefreshInterval field to yamlBigtable struct with a default value of 30.
  • cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/third_party/datastax/proxy/config/utils.go
    • Mapped l.Bigtable.MetadataRefreshInterval to bigtableConfig.MetadataRefreshInterval in loadListenerConfig.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable feature for automatically refreshing schemas and also fixes a bug related to schema comparison for collection types. The implementation of the schema refresh is robust, using a standard ticker and context cancellation pattern. The new CqlTypesEqual function for deep type comparison is a great addition and is well-tested.

I've identified a couple of issues related to the new configuration handling. One is a bug that prevents disabling the auto-refresh feature via YAML, and the other is a minor typo in a struct tag. I've provided specific comments with suggestions for fixes. Once these are addressed, the pull request will be in excellent shape.

@brandtnewton
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an automatic, time-based schema refresh mechanism to keep the proxy's schema cache synchronized. It also includes a fix for an issue where tables with collection-type columns were incorrectly detected as changed.

The implementation of the schema refresh is well-done, using a goroutine with a ticker and context for cancellation. The fix for collection type comparison is correct and is supported by a comprehensive set of new unit tests. The new configuration options for the refresh interval are correctly integrated.

I have one suggestion to improve the robustness of the refresh loop by parallelizing the schema reloads for different keyspaces. Overall, this is a solid contribution that improves the proxy's functionality and correctness.

@brandtnewton
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an automatic, time-based schema refresh mechanism to keep the proxy's metadata synchronized with Bigtable. It also includes a fix to correctly compare CQL collection types, preventing unnecessary schema change events from being triggered. The implementation of the schema refresh is solid, using a configurable interval and handling concurrent refreshes for multiple keyspaces. The fix for type comparison is also correct and includes good test coverage. I have one suggestion regarding the configuration handling for the new refresh interval to improve clarity and prevent unexpected behavior for users.

@brandtnewton
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable feature for automatically refreshing schemas on a timer, which is a practical solution for keeping proxy instances in sync. The implementation with a background goroutine and ticker is well-done. Additionally, the fix for comparing CQL collection types is crucial for schema stability and correctly implemented using a recursive equality function. The inclusion of comprehensive integration tests for the new refresh feature in both Go and Java is excellent and greatly increases confidence in the change.

Comment thread cassandra-bigtable-migration-tools/cassandra-bigtable-proxy/config.yaml Outdated
@brandtnewton
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an automatic schema refresh mechanism for the proxy, configured via metadataRefreshInterval. This is a valuable feature for multi-proxy deployments. The implementation includes a background goroutine with a ticker for periodic refreshes, which is a solid approach. The PR also commendably fixes a bug related to comparing collection-type columns by introducing a recursive CqlTypesEqual function, improving schema change detection. The changes are well-supported by new configuration options, comprehensive unit tests, and new integration tests for both Go and Java clients. I have a couple of minor suggestions for improvement in the new Java test and a documentation file.

@brandtnewton
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a periodic metadata refresh feature for the Cassandra-Bigtable proxy, enabling automatic schema updates from Bigtable. Key changes include a configurable refresh interval, a background refresh loop, and a utility for comparing CQL data types. The PR also adds comprehensive compliance and Java client tests. Review feedback identifies a potential panic when the refresh interval is zero, suggests suppressing error logs during graceful shutdown, and recommends adjusting log levels for sync events to reduce log volume.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant