Currently, engines can be registered with freeform names and versions, but DJ has no way to know what functions are available on each engine. This leads to potential issues:
- Decomposition may generate unsupported SQL, e.g., using hll_sketch_agg on Spark 3.3, which doesn't have it
- No graceful degradation: DJ can't fall back to simpler approaches when advanced features aren't available
- Function translation is fragile, because there is no per-engine mapping of function names to translate against
Examples
| Function | Spark 3.3 | Spark 4.0+ | Druid |
| --- | --- | --- | --- |
| hll_sketch_agg | ❌ | ✅ | ✅ (DS_HLL) |
| theta_sketch_agg | ❌ | ✅ | ✅ (THETA_SKETCH) |
Proposed Solution
Replace freeform engine registration with a curated list of supported engine/dialect combinations:
SUPPORTED_ENGINES = {
    "spark:3.5": SparkDialect35(),
    "spark:4.0": SparkDialect40(),
    "trino:4xx": TrinoDialect(),
    "druid:31": DruidDialect(),
}
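As a rough sketch of what a curated list could mean at registration time, the lookup below rejects any engine/version pair that is not in the mapping. `Dialect`, `UnsupportedEngineError`, and `resolve_dialect` are hypothetical names used for illustration only, not existing DJ APIs, and the placeholder `Dialect` stands in for the dialect classes shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    """Placeholder dialect object (hypothetical, not a DJ class)."""

    name: str


class UnsupportedEngineError(ValueError):
    """Raised when an engine/version pair is not in the curated list."""


# Stand-in for the SUPPORTED_ENGINES mapping, keyed by "engine:version".
SUPPORTED_ENGINES = {
    "spark:3.5": Dialect("spark-3.5"),
    "spark:4.0": Dialect("spark-4.0"),
    "druid:31": Dialect("druid-31"),
}


def resolve_dialect(engine: str, version: str) -> Dialect:
    """Return the dialect for an engine, failing loudly if it is not curated."""
    key = f"{engine.lower()}:{version}"
    try:
        return SUPPORTED_ENGINES[key]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_ENGINES))
        raise UnsupportedEngineError(
            f"Unsupported engine {key!r}; supported: {supported}"
        ) from None
```

Failing at registration, rather than at query time, is one way to surface an unsupported engine before any SQL is generated for it.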
Each dialect would declare:
- Available functions
- Function name mappings for translation
- Valid decomposition strategies
- Type coercions
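
One possible shape for such a declaration is sketched below. Everything here is an assumption made for illustration: the `Dialect` fields, the `pick_strategy` helper, the strategy names (`hll_sketch`, `exact_distinct`), and the sample capability data are hypothetical and do not describe DJ's actual design. The only detail taken from the issue is the `hll_sketch_agg` to `DS_HLL` mapping for Druid.

```python
from dataclasses import dataclass, field


@dataclass
class Dialect:
    """Hypothetical capability declaration for one engine/version pair."""

    name: str
    # Canonical function names this engine supports.
    functions: frozenset = frozenset()
    # Canonical name -> engine-specific name, used for translation.
    function_names: dict = field(default_factory=dict)
    # Decomposition strategies that are valid on this engine.
    strategies: tuple = ()
    # Source type -> type the engine needs values cast to (left empty here).
    type_coercions: dict = field(default_factory=dict)

    def translate(self, function: str) -> str:
        """Map a canonical function name to the engine's own name."""
        if function not in self.functions:
            raise ValueError(f"{function} is not available on {self.name}")
        return self.function_names.get(function, function)


def pick_strategy(dialect: Dialect, preferred: tuple) -> str:
    """Return the first preferred strategy that the dialect declares valid."""
    for strategy in preferred:
        if strategy in dialect.strategies:
            return strategy
    raise ValueError(f"No usable decomposition strategy for {dialect.name}")


# Illustrative data only; real capabilities would come from the curated list.
druid = Dialect(
    name="druid:31",
    functions=frozenset({"hll_sketch_agg", "count"}),
    function_names={"hll_sketch_agg": "DS_HLL"},
    strategies=("hll_sketch", "exact_distinct"),
)
spark_33 = Dialect(
    name="spark:3.3",
    functions=frozenset({"count"}),
    strategies=("exact_distinct",),
)

# Most advanced strategy first, simplest fallback last.
PREFERRED = ("hll_sketch", "exact_distinct")
```

With this data, `pick_strategy(druid, PREFERRED)` selects the sketch-based strategy, while `pick_strategy(spark_33, PREFERRED)` falls back to the simpler one, which is the kind of graceful degradation the issue asks for.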