|
| 1 | +# Multi-Table CLI Support Design |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Add support for multiple `--table` flags so that queries can reference different data source files for different table names. This enables real multi-table joins: |
| 6 | + |
| 7 | +```bash |
| 8 | +logq query --table a:jsonl=data1.jsonl --table b:jsonl=data2.jsonl "SELECT * FROM a, b WHERE a.id = b.id" |
| 9 | +``` |
| 10 | + |
| 11 | +## Approach |
| 12 | + |
| 13 | +Use the **repeated `--table` flag** pattern (like Docker's `-v`, `-e`). Each flag specifies one table-name-to-file mapping. A `DataSourceRegistry` (`HashMap<String, DataSource>`) replaces the single `DataSource` parameter throughout the pipeline. |
| 14 | + |
| 15 | +### Decision Log |
| 16 | + |
| 17 | +| Decision | Choice | Alternatives Considered | |
| 18 | +|----------|--------|------------------------| |
| 19 | +| Multi-table CLI syntax | Repeated `--table` flag | Comma-separated, semicolon-separated | |
| 20 | +| Data source plumbing | `HashMap<String, DataSource>` registry | Single DataSource + optional override map | |
| 21 | +| Unknown table in query | Hard error at planning time | Silent empty results | |
| 22 | +| Duplicate table names | Error | Last-one-wins | |
| 23 | + |
| 24 | +## Design |
| 25 | + |
| 26 | +### 1. CLI Parsing (`main.rs`) |
| 27 | + |
| 28 | +- `cli.yml`: Add `multiple: true` to the `table` arg. |
| 29 | +- `main.rs`: Iterate `values_of("table")` instead of `value_of("table")`. Parse each value with `TABLE_SPEC_REGEX`, collecting into a `HashMap<String, DataSource>`. Error on: |
| 30 | + - Invalid format (not matching regex) |
| 31 | + - Unsupported file format (not in `elb/alb/squid/s3/jsonl`) |
| 32 | + - Duplicate table names across flags |
| 33 | +- Pass the `HashMap<String, DataSource>` (the `DataSourceRegistry`) to `app::run` / `app::explain`. |
| 34 | + |
| 35 | +### 2. DataSourceRegistry Type (`common/types.rs`) |
| 36 | + |
| 37 | +```rust |
| 38 | +pub type DataSourceRegistry = HashMap<String, DataSource>; |
| 39 | +``` |
| 40 | + |
| 41 | +### 3. App Layer (`app.rs`) |
| 42 | + |
| 43 | +- `app::run(query_str, data_sources: DataSourceRegistry, output_mode)` replaces the single `DataSource` param. |
| 44 | +- `app::explain(query_str, data_sources: DataSourceRegistry)` same. |
| 45 | +- `app::run_to_vec` (test helper) same. |
| 46 | +- `app::run_to_records` (bench helper) same. |
| 47 | + |
| 48 | +### 4. ParsingContext (`common/types.rs`) |
| 49 | + |
| 50 | +```rust |
| 51 | +pub(crate) struct ParsingContext { |
| 52 | + pub(crate) data_sources: DataSourceRegistry, |
| 53 | + pub(crate) registry: Arc<FunctionRegistry>, |
| 54 | +} |
| 55 | +``` |
| 56 | + |
| 57 | +Remove the `table_name` and single `data_source` fields. |
| 58 | + |
| 59 | +### 5. Logical Planner (`logical/parser.rs`) |
| 60 | + |
| 61 | +- `parse_query_top` and `parse_query` receive `DataSourceRegistry` instead of `DataSource`. |
| 62 | +- `build_from_node` looks up each table reference's base name in the registry to get its `DataSource`. Returns `ParseError::UnknownTable(name)` if not found. |
| 63 | +- `check_env` validates that every table name in the FROM clause exists in the `DataSourceRegistry`. |
| 64 | +- Subqueries and set operations pass the full registry down. |
| 65 | + |
| 66 | +Add new error variant: |
| 67 | +```rust |
| 68 | +#[error("Unknown table '{0}'. Available tables: {1}")] |
| 69 | +UnknownTable(String, String), |
| 70 | +``` |
| 71 | + |
| 72 | +### 6. Physical Plan Creator (`logical/types.rs`) |
| 73 | + |
| 74 | +`PhysicalPlanCreator` holds `DataSourceRegistry` instead of a single `DataSource`. |
| 75 | + |
| 76 | +### 7. Backward Compatibility |
| 77 | + |
| 78 | +Single-table usage is unchanged — a single `--table it:jsonl=data.jsonl` produces a one-entry registry `{"it" => DataSource::File(...)}`. All existing tests continue to work by constructing one-entry registries. |
| 79 | + |
| 80 | +## Testing |
| 81 | + |
| 82 | +### CLI Argument Parsing Tests |
| 83 | + |
| 84 | +1. Single table: `--table it:jsonl=data.jsonl` produces one-entry registry |
| 85 | +2. Multiple tables: `--table a:jsonl=d1.jsonl --table b:jsonl=d2.jsonl` produces two-entry registry |
| 86 | +3. Invalid format: `--table a:badformat=d.jsonl` errors |
| 87 | +4. Duplicate table names: `--table a:jsonl=d1.jsonl --table a:jsonl=d2.jsonl` errors |
| 88 | +5. Missing table flag: errors with `InvalidTableSpecString` |
| 89 | +6. Unknown table in query: `--table a:jsonl=d.jsonl` with `SELECT * FROM b` errors with `UnknownTable` |
| 90 | + |
| 91 | +### Integration Tests |
| 92 | + |
| 93 | +7. Cross join two different files: `SELECT a.x, b.y FROM a, b` |
| 94 | +8. Left join two different files: `SELECT a.x, b.y FROM a LEFT JOIN b ON a.id = b.id` |
| 95 | +9. All existing ~58 tests pass (one-entry registry) |
| 96 | + |
| 97 | +## Files Changed |
| 98 | + |
| 99 | +| File | Change | |
| 100 | +|------|--------| |
| 101 | +| `src/cli.yml` | Add `multiple: true` to `table` arg | |
| 102 | +| `src/main.rs` | Multi-value table parsing, duplicate detection, build registry | |
| 103 | +| `src/common/types.rs` | Add `DataSourceRegistry` type, update `ParsingContext` | |
| 104 | +| `src/app.rs` | Change signatures to take `DataSourceRegistry` | |
| 105 | +| `src/logical/parser.rs` | Registry-based table lookup in `build_from_node`, new `UnknownTable` error | |
| 106 | +| `src/logical/types.rs` | Update `PhysicalPlanCreator` to hold registry | |
0 commit comments