Skip to content

Commit b7db856

Browse files
committed
Add multi-table design and implementation plan documents
1 parent 04b7ecb commit b7db856

10 files changed

Lines changed: 4625 additions & 0 deletions

docs/plans/2026-04-05-multi-table-design-final.md

Lines changed: 473 additions & 0 deletions
Large diffs are not rendered by default.

docs/plans/2026-04-05-multi-table-design-v2.md

Lines changed: 452 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
# Multi-Table CLI Support Design
2+
3+
## Summary
4+
5+
Add support for multiple `--table` flags so that queries can reference different data source files for different table names. This enables real multi-table joins:
6+
7+
```bash
8+
logq query --table a:jsonl=data1.jsonl --table b:jsonl=data2.jsonl "SELECT * FROM a, b WHERE a.id = b.id"
9+
```
10+
11+
## Approach
12+
13+
Use the **repeated `--table` flag** pattern (like Docker's `-v`, `-e`). Each flag specifies one table-name-to-file mapping. A `DataSourceRegistry` (`HashMap<String, DataSource>`) replaces the single `DataSource` parameter throughout the pipeline.
14+
15+
### Decision Log
16+
17+
| Decision | Choice | Alternatives Considered |
18+
|----------|--------|------------------------|
19+
| Multi-table CLI syntax | Repeated `--table` flag | Comma-separated, semicolon-separated |
20+
| Data source plumbing | `HashMap<String, DataSource>` registry | Single DataSource + optional override map |
21+
| Unknown table in query | Hard error at planning time | Silent empty results |
22+
| Duplicate table names | Error | Last-one-wins |
23+
24+
## Design
25+
26+
### 1. CLI Parsing (`main.rs`)
27+
28+
- `cli.yml`: Add `multiple: true` to the `table` arg.
29+
- `main.rs`: Iterate `values_of("table")` instead of `value_of("table")`. Parse each value with `TABLE_SPEC_REGEX`, collecting into a `HashMap<String, DataSource>`. Error on:
30+
- Invalid format (not matching regex)
31+
- Unsupported file format (not in `elb/alb/squid/s3/jsonl`)
32+
- Duplicate table names across flags
33+
- Pass the `HashMap<String, DataSource>` (the `DataSourceRegistry`) to `app::run` / `app::explain`.
34+
35+
### 2. DataSourceRegistry Type (`common/types.rs`)
36+
37+
```rust
38+
pub type DataSourceRegistry = HashMap<String, DataSource>;
39+
```
40+
41+
### 3. App Layer (`app.rs`)
42+
43+
- `app::run(query_str, data_sources: DataSourceRegistry, output_mode)` replaces the single `DataSource` param.
44+
- `app::explain(query_str, data_sources: DataSourceRegistry)` same.
45+
- `app::run_to_vec` (test helper) same.
46+
- `app::run_to_records` (bench helper) same.
47+
48+
### 4. ParsingContext (`common/types.rs`)
49+
50+
```rust
51+
pub(crate) struct ParsingContext {
52+
pub(crate) data_sources: DataSourceRegistry,
53+
pub(crate) registry: Arc<FunctionRegistry>,
54+
}
55+
```
56+
57+
Remove the `table_name` and single `data_source` fields.
58+
59+
### 5. Logical Planner (`logical/parser.rs`)
60+
61+
- `parse_query_top` and `parse_query` receive `DataSourceRegistry` instead of `DataSource`.
62+
- `build_from_node` looks up each table reference's base name in the registry to get its `DataSource`. Returns `ParseError::UnknownTable(name)` if not found.
63+
- `check_env` validates that every table name in the FROM clause exists in the `DataSourceRegistry`.
64+
- Subqueries and set operations pass the full registry down.
65+
66+
Add new error variant:
67+
```rust
68+
#[error("Unknown table '{0}'. Available tables: {1}")]
69+
UnknownTable(String, String),
70+
```
71+
72+
### 6. Physical Plan Creator (`logical/types.rs`)
73+
74+
`PhysicalPlanCreator` holds `DataSourceRegistry` instead of a single `DataSource`.
75+
76+
### 7. Backward Compatibility
77+
78+
Single-table usage is unchanged — a single `--table it:jsonl=data.jsonl` produces a one-entry registry `{"it" => DataSource::File(...)}`. All existing tests continue to work by constructing one-entry registries.
79+
80+
## Testing
81+
82+
### CLI Argument Parsing Tests
83+
84+
1. Single table: `--table it:jsonl=data.jsonl` produces one-entry registry
85+
2. Multiple tables: `--table a:jsonl=d1.jsonl --table b:jsonl=d2.jsonl` produces two-entry registry
86+
3. Invalid format: `--table a:badformat=d.jsonl` errors
87+
4. Duplicate table names: `--table a:jsonl=d1.jsonl --table a:jsonl=d2.jsonl` errors
88+
5. Missing table flag: errors with `InvalidTableSpecString`
89+
6. Unknown table in query: `--table a:jsonl=d.jsonl` with `SELECT * FROM b` errors with `UnknownTable`
90+
91+
### Integration Tests
92+
93+
7. Cross join two different files: `SELECT a.x, b.y FROM a, b`
94+
8. Left join two different files: `SELECT a.x, b.y FROM a LEFT JOIN b ON a.id = b.id`
95+
9. All existing ~58 tests pass (one-entry registry)
96+
97+
## Files Changed
98+
99+
| File | Change |
100+
|------|--------|
101+
| `src/cli.yml` | Add `multiple: true` to `table` arg |
102+
| `src/main.rs` | Multi-value table parsing, duplicate detection, build registry |
103+
| `src/common/types.rs` | Add `DataSourceRegistry` type, update `ParsingContext` |
104+
| `src/app.rs` | Change signatures to take `DataSourceRegistry` |
105+
| `src/logical/parser.rs` | Registry-based table lookup in `build_from_node`, new `UnknownTable` error |
106+
| `src/logical/types.rs` | Update `PhysicalPlanCreator` to hold registry |

0 commit comments

Comments
 (0)