Overview
This issue tracks research and design discussion for the MacroPipe macro language, originally raised in PR panodata/tikray#51 by @amotl.
The current implementation uses a simple colon-separated positional syntax:
<function>:<arg1>:<arg2>:<arg3>
This works well for URL-based ETL pipelines (e.g. Apprise, CrateDB Toolkit I/O, Kotori, LorryStream, openDAQ, Polars, Proton), for example:
https://guided-path.s3.us-east-1.amazonaws.com/demo_climate_data_export.csv?pipe=json_array_to_wkt_point:coords&pipe=python_to_json:data
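As a rough illustration of how such a URL could be decomposed into pipeline steps, here is a minimal sketch using only the Python standard library. The function name `parse_pipe_params` and the `(function, args)` tuple shape are assumptions made for this example, not the actual tikray implementation.

```python
from urllib.parse import parse_qs, urlsplit


def parse_pipe_params(url):
    """Extract `pipe` query parameters and split each on colons.

    Hypothetical helper for illustration: returns a list of
    (function_name, positional_args) tuples in URL order.
    """
    query = urlsplit(url).query
    steps = []
    # parse_qs keeps repeated parameters in their original order.
    for macro in parse_qs(query).get("pipe", []):
        function, *args = macro.split(":")
        steps.append((function, args))
    return steps


url = (
    "https://guided-path.s3.us-east-1.amazonaws.com/demo_climate_data_export.csv"
    "?pipe=json_array_to_wkt_point:coords&pipe=python_to_json:data"
)
steps = parse_pipe_params(url)
# steps == [("json_array_to_wkt_point", ["coords"]), ("python_to_json", ["data"])]
```

Splitting naively on every colon is what makes the format so compact, and also why arguments containing a colon need escaping.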
Goals
The desired future direction is to:
- Decouple the macro language completely from the MacroPipe engine, allowing multiple language frontends to compile to the same pipeline execution model.
- Maintain a miniature macro language (colon-separated style) for URL-safe, text-only contexts.
- Optionally support a richer grammar variant with context-sensitive parsing, named arguments, and operator expressions.
Discussion Topics
1. Naming the Miniature Language
Candidate names for the colon-separator-based mini language:
- FlatPipe – reflects the doc note that the language is intentionally flat (no nesting, no named args)
- MicroPipe – nice contrast with MacroPipe (micro vs. macro)
- ColonQL – descriptive, references the colon delimiter
- PipeURL / UrlPipe – highlights the URL-parameter use case
- QuickPipe – emphasizes speed and terseness
2. Enriched Grammar with pyparsing
pyparsing (MIT, PEG-style, Python-native) is a strong candidate for implementing a richer grammar variant. Possible syntax ideas:
filter(total_amount > 40) | select(passenger_count, trip_distance, fare_amount)
rename(source=old_name, target=new_name) | scale(column=price, factor=1.1)
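To make the syntax ideas above more concrete, the following is a minimal sketch of a pyparsing grammar that accepts both example lines. It assumes pyparsing 3.x (snake_case API). The grammar, the function name `parse_pipeline`, and the output shape `(name, positional_args, named_args)` are hypothetical choices for illustration, not a proposed final design. All values are kept as strings.

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
number = pp.Regex(r"-?\d+(?:\.\d+)?")
operand = number | identifier
comparator = pp.Regex(r">=|<=|==|!=|>|<")

LPAR, RPAR, COMMA, EQ, PIPE = (pp.Suppress(c) for c in "(),=|")

# Three kinds of argument: key=value, a binary comparison, or a bare value.
named_arg = pp.Group(identifier("key") + EQ + operand("value"))
comparison = pp.Group(operand("left") + comparator("op") + operand("right"))
positional = pp.Group(operand("value"))
argument = named_arg | comparison | positional

arguments = argument + pp.ZeroOrMore(COMMA + argument)
call = pp.Group(identifier("name") + LPAR + pp.Group(pp.Optional(arguments))("args") + RPAR)
pipeline = call + pp.ZeroOrMore(PIPE + call)


def parse_pipeline(text):
    """Parse `f(a, k=v) | g(x > 1)` into (name, positional, named) tuples."""
    steps = []
    for parsed_call in pipeline.parse_string(text, parse_all=True):
        positional_args, named_args = [], {}
        for arg in parsed_call.get("args", []):
            if "key" in arg:
                named_args[arg["key"]] = arg["value"]
            elif "op" in arg:
                positional_args.append((arg["left"], arg["op"], arg["right"]))
            else:
                positional_args.append(arg["value"])
        steps.append((parsed_call["name"], positional_args, named_args))
    return steps


filter_steps = parse_pipeline(
    "filter(total_amount > 40) | select(passenger_count, trip_distance, fare_amount)"
)
rename_steps = parse_pipeline(
    "rename(source=old_name, target=new_name) | scale(column=price, factor=1.1)"
)
```

A real grammar would likely also need quoted strings, nested expressions, and typed literals. This sketch only covers the two example lines.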
Alternatives worth evaluating:
- lark – faster, EBNF-based, good error messages
- textX – grammar + model in one, inspired by Xtext
- parsimonious – lightweight PEG, minimal dependencies
3. Improvements to the Colon-Based Format
Even keeping the minimal format, these improvements could help:
- Quoted string arguments to avoid escape proliferation:
func:"arg with : colon"
- Named/keyword arguments:
func:pos_arg:key=value
- Bracketed multi-value args, instead of comma-in-string:
select:[col1,col2,col3]
- Type-annotated args (already done, worth formalizing):
cast:column:int
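The proposed extensions above can be prototyped without a parser library. The sketch below is one possible interpretation, not the existing parser. The names `split_macro` and `parse_macro` are hypothetical, and the rule that a quoted argument is always treated as a plain positional string is an assumption of this example.

```python
def split_macro(macro):
    """Split on colons that are outside double quotes and square brackets.

    Returns (text, was_quoted) pairs; the quote characters are dropped.
    """
    parts, buf = [], []
    in_quotes, quoted, depth = False, False, 0
    for ch in macro:
        if ch == '"':
            in_quotes = not in_quotes
            quoted = True
            continue
        if not in_quotes:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            elif ch == ":" and depth == 0:
                parts.append(("".join(buf), quoted))
                buf, quoted = [], False
                continue
        buf.append(ch)
    parts.append(("".join(buf), quoted))
    return parts


def parse_macro(macro):
    """Return (function, positional_args, named_args) for one macro string."""
    (function, _quoted), *raw_args = split_macro(macro)
    positional, named = [], {}
    for text, quoted in raw_args:
        if quoted:
            # Quoted arguments are taken literally, even if they contain '='.
            positional.append(text)
        elif text.startswith("[") and text.endswith("]"):
            positional.append([item.strip() for item in text[1:-1].split(",")])
        elif "=" in text:
            key, _sep, value = text.partition("=")
            named[key] = value
        else:
            positional.append(text)
    return function, positional, named
```

For example, `parse_macro('func:"arg with : colon"')` yields `("func", ["arg with : colon"], {})`, and `parse_macro("func:pos_arg:key=value")` yields `("func", ["pos_arg"], {"key": "value"})`. Note that quotes and brackets would need percent-encoding when embedded in a URL query string.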
4. Grammar Decoupling Architecture
A possible layered design:
[Miniature Language Parser]   [Enriched Grammar Parser]
             \                   /
              \                 /
            [Common IR / AST Layer]
                      |
           [MacroPipe Engine (Polars)]
Each frontend parses its syntax and emits a shared intermediate representation (e.g., a list of (function_name, args_dict) tuples), which the engine then resolves and applies.
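A minimal sketch of this layering, under stated assumptions: the IR is a list of `(function_name, args_dict)` tuples as described above, the engine is replaced by a stand-in that operates on plain lists of dicts rather than Polars, and `REGISTRY`, `miniature_frontend`, `run`, and the two transformation functions are hypothetical names invented for this example.

```python
from typing import Any

# Shared IR: one step is (function_name, args_dict).
Step = tuple[str, dict[str, Any]]


def rename(rows, source, target):
    """Rename key `source` to `target` in every row."""
    return [{(target if k == source else k): v for k, v in row.items()} for row in rows]


def scale(rows, column, factor):
    """Multiply `column` by `factor` in every row."""
    return [{**row, column: row[column] * float(factor)} for row in rows]


# Hypothetical registry: name -> (ordered parameter names, implementation).
REGISTRY = {
    "rename": (("source", "target"), rename),
    "scale": (("column", "factor"), scale),
}


def miniature_frontend(macros):
    """Colon-based frontend: map positional args onto parameter names."""
    steps: list[Step] = []
    for macro in macros:
        name, *args = macro.split(":")
        params, _impl = REGISTRY[name]
        steps.append((name, dict(zip(params, args))))
    return steps


def run(steps, rows):
    """Stand-in engine: resolve each step by name and apply it in order."""
    for name, args in steps:
        _params, impl = REGISTRY[name]
        rows = impl(rows, **args)
    return rows


ir = miniature_frontend(["rename:old_name:new_name", "scale:price:2"])
result = run(ir, [{"old_name": "a", "price": 10.0}])
# ir == [("rename", {"source": "old_name", "target": "new_name"}),
#        ("scale", {"column": "price", "factor": "2"})]
# result == [{"new_name": "a", "price": 20.0}]
```

An enriched-grammar frontend would emit the same `Step` tuples, so `run` would not need to know which syntax produced them. Whether positional-to-named mapping belongs in the frontend or in the engine is an open design question; this sketch places it in the frontend only for brevity.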
References