Skip to content

Add coral-benchmark module for cross-dialect translation testing#599

Merged
wmoustafa merged 6 commits into
linkedin:masterfrom
wmoustafa:coral-benachmark
May 10, 2026
Merged

Add coral-benchmark module for cross-dialect translation testing#599
wmoustafa merged 6 commits into
linkedin:masterfrom
wmoustafa:coral-benachmark

Conversation

@wmoustafa
Copy link
Copy Markdown
Contributor

Add coral-benchmark module for cross-dialect translation testing

Summary

Adds a new coral-benchmark module that provides a framework for testing Coral translations end-to-end between any pair of supported dialects (Hive, Spark, Trino). This is the API design — interfaces, data types, and the orchestration layer — ready for implementation to be filled in.

Motivation

Coral's existing tests verify individual dialect converters in isolation (e.g., Hive-to-Trino, Hive-to-Spark). There is no unified way to test arbitrary dialect-to-dialect translations, validate that translated queries are syntactically valid on the target engine, or compare query results across engines for semantic equivalence. This module closes that gap.

Design

The framework is built around two SPIs and three verification levels:

SPIs:

  • DialectPlugin — wraps existing Coral converters (HiveToRelConverter, TrinoToRelConverter, RelToTrinoConverter, CoralSpark) behind a uniform toRelNode() / toDialectSql() interface.
  • EnginePlugin — provides execution capabilities (createTable, loadData, explain, execute) for embedded engines (Spark, Trino, Hive).

Verification levels (escalating):

  1. TRANSLATION — source SQL -> Coral IR -> target SQL completes without error.
  2. EXPLAIN — the translated SQL passes the target engine's EXPLAIN (syntax + planning validation).
  3. RESULT_SET — query results from source and target engines are compared for semantic equivalence, with configurable tolerances for floating-point precision, row ordering, NULL handling, and type widening.

Supporting components:

  • InMemoryCatalog — an in-memory CoralCatalog implementation using the Coral type system (StructType, PrimitiveType, etc.) so tests have no external metastore dependency.
  • RowSet / ResultSet — typed tabular data containers for loading test data and capturing query results.
  • ResultSetComparator with ComparisonConfig — configurable comparison logic for cross-engine result equivalence.
  • TranslationTestSuite — builder-configured orchestrator that reads .sql files from a query directory and runs each through the configured pipeline.
  • TestReport / QueryTestResult — structured reporting with per-query outcomes and aggregate failure categorization (translation error, explain failure, result mismatch).

What's included

  • 15 Java source files across 5 packages (spi, catalog, data, comparison, suite)
  • build.gradle with api dependency on :coral-common
  • The testing spec document (coral-benchmark-spec.md)
  • Module registered in settings.gradle
  • Compiles cleanly via ./gradlew :coral-benchmark:compileJava

What's not included (yet)

  • Concrete DialectPlugin implementations (Hive, Spark, Trino)
  • Concrete EnginePlugin implementations (embedded Spark, Trino)
  • ResultSetComparator.compare() implementation
  • TranslationTestSuite.run() implementation
  • Query corpus (.sql test files)

Copy link
Copy Markdown
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect
*/
RelNode toRelNode(String sql, CoralCatalog catalog);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about passing catalog at construction or via init(CoralCatalog). The SPI methods simplify to toRelNode(String sql) / toDialectSql(RelNode relNode).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with a DialectPluginProvider / DialectPlugin split. The provider is the no-arg ServiceLoader entry point with create(CoralCatalog); the plugin holds the catalog as a final field, so toRelNode(sql) and toDialectSql(rel) no longer need it threaded through. Keeps plugins immutable while still supporting standard SPI discovery.


Values are Java objects matching the Coral type mapping (INT -> Integer, STRING -> String, ARRAY -> List, MAP -> Map, STRUCT -> Object[], etc.).

## 4. Verification Levels
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding lightweight ASCII diagrams (that Claude Code can generate) to make things clearer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Done.

* @param sql a SELECT statement in this dialect's syntax
* @param catalog the catalog providing table metadata for query resolution
* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The orchestrator in TranslationTestSuite.run() will need to catch different exception types at different stages and map them to FailureCategory. What about defining a BenchmarkException hierarchy (TranslationException, EngineException) that plugins are contracted to throw.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considered this. With the assumption that queries are pre-vetted before admission to the benchmark, every failure is attributable to translation gap or to engine result mismatch; there is no “is this the query’s fault?” classification for plugins to make.

That means the orchestrator can map failures to FailureCategory purely from the call site of each SPI invocation:

  • A throw out of toRelNode / toDialectSql is TRANSLATION_ERROR.
  • EXPLAIN_FAILURE and RESULT_MISMATCH already come from result types.

So we don’t think a typed exception hierarchy is adding much beyond the enums. Plugins can let underlying converter exceptions bubble naturally, and the orchestrator can map those to the above failure categories.

@@ -0,0 +1,281 @@
# Coral Benchmark: Cross-Dialect Integration Testing Framework
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add this new module (WIP) to the README.md file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Member

@1fanwang 1fanwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall direction looks good, thanks for continuing this effort!

In-memory catalog, SPI-based dialects, and escalating verification levels (especially RESULT_SET for semantic equivalence against real engines) — I think this is the shape that's been missing on the OSS side for exactly the kinds of subtle cross-engine differences that bite in practice: NULL handling, UNION coercion, timestamp precision. Skeleton-first with a spec doc is a nice way to surface the API before anyone invests in plugin plumbing.

One forward-looking thought: the long-term value of this framework probably lives in the query corpus more than the framework itself — curious if you have a mental model for how the corpus grows over time (in-repo seed vs. per-consumer contributions, which translation-divergence categories to prioritize first). Happy to pick that up in a follow-up.

* @return the Calcite {@link RelNode} representing the Coral intermediate representation
* @throws IllegalArgumentException if the SQL cannot be parsed in this dialect
*/
RelNode toRelNode(String sql, CoralCatalog catalog);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how you're thinking about config/context flowing through the SPI. Looking at the existing converters:

  • RelToTrinoConverter consumes a Map<String, Boolean> of CoralTrinoConfigKeys (e.g. SUPPORT_LEGACY_UNNEST_ARRAY_OF_STRUCT) — exactly the kind of semantic knob a correctness suite would want to toggle per test.
  • CoralSpark.create threads an HMS client through.

Plugins could own those internally, but then the benchmark caller can't exercise a given converter across its config surface. Wondering if something like DialectPlugin<Ctx> or a small per-plugin config object earns its keep once the first real impls land — or whether you'd rather keep the core SPI narrow and handle the variance via plugin-specific sub-interfaces. No strong view, just flagging before it ossifies.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. With the factory split (check the most recent revision of the PR), my current thinking is that dialect-specific config (CoralTrinoConfigKeys, HMS clients, etc.) lives in concrete provider constructors, e.g., new HiveDialectPluginProvider(hmsClient), new TrinoDialectPluginProvider(configKeys) so the SPI itself stays narrow and the config variance lives in the dialect's own provider. That seems to absorb the cases you mentioned without baking a Ctx parameter into DialectPlugin itself.

apply plugin: 'java-library'

dependencies {
api project(path: ':coral-common')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small topology thought — today the module only depends on :coral-common, which keeps it dialect-agnostic. §9 of the spec suggests it'll grow deps on coral-hive, coral-trino, coral-spark once the concrete plugins land.

Have you considered hosting each DialectPlugin in its native module (e.g. coral-hive shipping HiveDialectPlugin with a META-INF/services entry, picked up by the ServiceLoader path you've already sketched)? Keeps this module thin, and a future contributor adding a new dialect wouldn't need to touch coral-benchmark. Defer if you've already weighed it — just wanted to surface while the topology is still fluid.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Trimmed the spec sections that committed to plugin/engine hosting in coral-benchmark. 8 no longer names plugin classes inside this module, and 9 no longer lists future deps on :coral-hive / :coral-trino / :coral-spark. The hosting decision (this module vs. native modules vs. per-dialect bridge modules) is intentionally left open so it can be made against the first real plugin rather than against a sketch. Will pick it up when the first concrete plugin is on the table.

@wmoustafa
Copy link
Copy Markdown
Contributor Author

The overall direction looks good, thanks for continuing this effort!

In-memory catalog, SPI-based dialects, and escalating verification levels (especially RESULT_SET for semantic equivalence against real engines) — I think this is the shape that's been missing on the OSS side for exactly the kinds of subtle cross-engine differences that bite in practice: NULL handling, UNION coercion, timestamp precision. Skeleton-first with a spec doc is a nice way to surface the API before anyone invests in plugin plumbing.

One forward-looking thought: the long-term value of this framework probably lives in the query corpus more than the framework itself — curious if you have a mental model for how the corpus grows over time (in-repo seed vs. per-consumer contributions, which translation-divergence categories to prioritize first). Happy to pick that up in a follow-up.

Thanks. Fully agree the long-term value is in the corpus more than the framework. The framework PR is the first step towards that. Yes, I have some rough ideas. Mainly we want to maximize diversity in the categories we test (e.g., nested queries, CTEs, lateral joins, null handling, etc), achieve high coverage of operators, and address varying shapes of schemas. Happy to discuss further with you and plan the work.

Copy link
Copy Markdown
Member

@1fanwang 1fanwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall, we can get this in and keep iterating on it

wmoustafa added 5 commits May 10, 2026 14:31
Introduces the API design for a benchmark framework that tests Coral
translations end-to-end between any supported dialect pair (Hive, Spark,
Trino). Includes SPIs for dialect translation and engine execution,
in-memory catalog, typed test data, result-set comparison, and a
test suite orchestrator with three verification levels.
Remove SKIP and ERROR statuses that had no backing mechanism in the API.
FailureCategory already captures the reason for failure (translation
error, explain failure, result mismatch).
Remove unused imports and apply eclipse code style formatting.
Arrays.copyOf is used in RowSet.Builder.addRow but spotless incorrectly
flagged the import as unused due to a JDK version mismatch.
Replace bare '>' characters in Javadoc comments (from '->' arrows and
type mappings) that JDK 8 javadoc rejects as malformed HTML.
@wmoustafa wmoustafa force-pushed the coral-benachmark branch 2 times, most recently from 68e0c83 to e783d68 Compare May 10, 2026 21:39
- Refactor DialectPlugin SPI into a provider/plugin split. DialectPluginProvider
  is the no-arg ServiceLoader entry point; DialectPlugin is the catalog-bound
  translator with final fields, removing the catalog parameter from
  toRelNode/toDialectSql. The suite builder accepts providers explicitly or
  resolves them via ServiceLoader by dialect.
- Trim spec sections that prematurely committed to plugin/engine hosting in
  coral-benchmark; the hosting decision is deferred until concrete plugins
  land so it can be evaluated against real code.
- Add ASCII diagrams to the spec: a component overview in section 3 showing
  how the suite, catalog, SPI plugins, and comparator relate; and a unified
  escalation diagram in section 4 showing how the three verification levels
  stack, with a quick-reference table of per-level requirements.
- List coral-benchmark in the root README's Modules section.
@wmoustafa wmoustafa merged commit 8f285d2 into linkedin:master May 10, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants