MacroPipe: Research & Design – Macro Language Evolution (miniature format + enriched grammar) #7

@coderabbitai

Overview

This issue tracks the research and design discussion for the MacroPipe macro language, originally raised in PR panodata/tikray#51 by @amotl.

The current implementation uses a simple colon-separator-based positional syntax:

<function>:<arg1>:<arg2>:<arg3>

This works well in URL-based ETL pipelines (e.g. Apprise, CrateDB Toolkit I/O, Kotori, LorryStream, openDAQ, Polars, Proton). For example:

https://guided-path.s3.us-east-1.amazonaws.com/demo_climate_data_export.csv?pipe=json_array_to_wkt_point:coords&pipe=python_to_json:data
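
As a minimal sketch of how such a URL could be taken apart, the snippet below reads every `pipe` query parameter and splits it on colons into a function name and positional arguments. The helper name `parse_pipe_params` and the returned tuple shape are assumptions made for illustration, not the actual MacroPipe API.

```python
from urllib.parse import parse_qs, urlsplit


def parse_pipe_params(url: str) -> list[tuple[str, list[str]]]:
    """Extract `pipe` query parameters and split each one on colons.

    Hypothetical helper: returns (function_name, positional_args) pairs
    in the order the parameters appear in the URL.
    """
    query = parse_qs(urlsplit(url).query)
    steps = []
    for expr in query.get("pipe", []):
        function, *args = expr.split(":")
        steps.append((function, args))
    return steps


url = (
    "https://guided-path.s3.us-east-1.amazonaws.com/demo_climate_data_export.csv"
    "?pipe=json_array_to_wkt_point:coords&pipe=python_to_json:data"
)
steps = parse_pipe_params(url)
# steps == [("json_array_to_wkt_point", ["coords"]), ("python_to_json", ["data"])]
```

Repeated `pipe` parameters keep their order, so the list can be read as the pipeline's execution order.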

Goals

The desired future direction is to:

  1. Decouple the macro language completely from the MacroPipe engine, allowing multiple language frontends to compile to the same pipeline execution model.
  2. Maintain a super miniature macro language (colon-separator style) for URL-safe, text-only contexts.
  3. Optionally support a richer grammar variant with context-sensitive parsing, named arguments, and operator expressions.

Discussion Topics

1. Naming the Miniature Language

Candidate names for the colon-separator-based mini language:

  • FlatPipe – reflects the doc note that the language is intentionally flat (no nesting, no named args)
  • MicroPipe – nice contrast with MacroPipe (micro vs. macro)
  • ColonQL – descriptive, references the colon delimiter
  • PipeURL / UrlPipe – highlights the URL-parameter use case
  • QuickPipe – emphasizes speed and terseness

2. Enriched Grammar with pyparsing

pyparsing (MIT, PEG-style, Python-native) is a strong candidate for implementing a richer grammar variant. Possible syntax ideas:

filter(total_amount > 40) | select(passenger_count, trip_distance, fare_amount)
rename(source=old_name, target=new_name) | scale(column=price, factor=1.1)

Alternatives worth evaluating:

  • lark – faster, EBNF-based, good error messages
  • textX – grammar + model in one, inspired by Xtext
  • parsimonious – lightweight PEG, minimal dependencies
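
To make the syntax ideas in this section more concrete, here is a rough sketch of what a parser for them might produce. It is deliberately a hand-written, standard-library stand-in and not a pyparsing, lark, textX, or parsimonious grammar. The names `split_top_level` and `parse_enriched` are hypothetical, and operator expressions such as `total_amount > 40` are kept as raw strings rather than parsed.

```python
import re

# `name(...)` for a pipeline stage and `key=value` for a named argument.
CALL = re.compile(r"^\s*([A-Za-z_]\w*)\s*\((.*)\)\s*$", re.DOTALL)
NAMED = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$", re.DOTALL)


def split_top_level(text: str, sep: str) -> list[str]:
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_enriched(expression: str) -> list[tuple[str, list[str], dict[str, str]]]:
    """Parse `f(a, k=v) | g(...)` into (name, positional, named) triples."""
    steps = []
    for stage in split_top_level(expression, "|"):
        match = CALL.match(stage)
        if match is None:
            raise ValueError(f"Not a function call: {stage!r}")
        name, body = match.groups()
        positional, named = [], {}
        for arg in filter(None, split_top_level(body, ",")):
            keyword = NAMED.match(arg)
            if keyword:
                named[keyword.group(1)] = keyword.group(2).strip()
            else:
                positional.append(arg)  # e.g. the raw text "total_amount > 40"
        steps.append((name, positional, named))
    return steps


filter_select = parse_enriched(
    "filter(total_amount > 40) | select(passenger_count, trip_distance, fare_amount)"
)
rename_scale = parse_enriched(
    "rename(source=old_name, target=new_name) | scale(column=price, factor=1.1)"
)
```

All argument values stay strings here; a real grammar would presumably also type literals such as `1.1` and build an expression tree for the filter predicate.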

3. Improvements to the Colon-Based Format

Even when keeping the minimal format, these improvements could help:

  • Quoted string arguments to avoid escape proliferation: func:"arg with : colon"
  • Named/keyword arguments: func:pos_arg:key=value
  • Bracketed multi-value args: select:[col1,col2,col3] instead of comma-in-string
  • Type-annotated args: e.g. cast:column:int (already done, worth formalizing)
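
The improvements listed above could be prototyped with a small tokenizer along the following lines. This is only a sketch under assumptions: the function names (`split_colons`, `parse_value`, `parse_step`) are hypothetical, quoting uses double quotes without escape sequences, and bracketed lists are flat.

```python
def split_colons(expr: str) -> list[str]:
    """Split on colons that sit outside double quotes and square brackets."""
    tokens, current, in_quotes, depth = [], [], False, 0
    for char in expr:
        if char == '"':
            in_quotes = not in_quotes
        elif not in_quotes and char == "[":
            depth += 1
        elif not in_quotes and char == "]":
            depth -= 1
        if char == ":" and not in_quotes and depth == 0:
            tokens.append("".join(current))
            current = []
        else:
            current.append(char)
    tokens.append("".join(current))
    return tokens


def parse_value(token: str):
    """Unwrap a quoted string or a bracketed list; pass anything else through."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    if token.startswith("[") and token.endswith("]"):
        inner = token[1:-1]
        return [item.strip() for item in inner.split(",")] if inner.strip() else []
    return token


def parse_step(expr: str):
    """Return (function, positional_args, named_args) for one colon expression."""
    function, *raw_args = split_colons(expr)
    positional, named = [], {}
    for token in raw_args:
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            named[key] = parse_value(value)  # key=value form
        else:
            positional.append(parse_value(token))
    return function, positional, named


quoted = parse_step('func:"arg with : colon"')
keyword = parse_step("func:pos_arg:key=value")
bracketed = parse_step("select:[col1,col2,col3]")
typed = parse_step("cast:column:int")
```

One design caveat: double quotes and square brackets would likely need percent-encoding when such expressions travel inside a URL query string, which is worth weighing against the URL-safety goal of the miniature format.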

4. Grammar Decoupling Architecture

A possible layered design:

[Miniature Language Parser]   [Enriched Grammar Parser]
         \                            /
          \                          /
           [Common IR / AST Layer]
                    |
           [MacroPipe Engine (Polars)]

Each frontend parses its syntax and emits a shared intermediate representation (e.g., a list of (function_name, args_dict) tuples), which the engine then resolves and applies.
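
The layering described above might look roughly like this in code. Everything here is a hypothetical sketch: the registry, the two toy frontends, and the `run` function are illustrative names, and plain lists of dictionaries stand in for Polars data frames so the example stays dependency-free.

```python
import inspect
import re
from typing import Any, Callable

Step = tuple[str, dict[str, Any]]   # shared IR: (function_name, args_dict)
Rows = list[dict[str, Any]]         # stand-in for a Polars DataFrame

REGISTRY: dict[str, Callable[..., Rows]] = {}


def register(func):
    """Make a transformation available to the engine under its own name."""
    REGISTRY[func.__name__] = func
    return func


@register
def rename(rows: Rows, source: str, target: str) -> Rows:
    return [{(target if k == source else k): v for k, v in row.items()} for row in rows]


def miniature_frontend(expr: str) -> list[Step]:
    """Colon syntax: map positional args onto the function's parameter names."""
    function, *args = expr.split(":")
    params = list(inspect.signature(REGISTRY[function]).parameters)[1:]  # skip `rows`
    return [(function, dict(zip(params, args)))]


def enriched_frontend(expr: str) -> list[Step]:
    """Call syntax with named args only, e.g. `rename(source=a, target=b)`."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", expr)
    if match is None:
        raise ValueError(f"Not a function call: {expr!r}")
    name, body = match.groups()
    pairs = (item.split("=", 1) for item in body.split(",") if item.strip())
    return [(name, {key.strip(): value.strip() for key, value in pairs})]


def run(rows: Rows, steps: list[Step]) -> Rows:
    """Engine: resolve each IR step in the registry and apply it in order."""
    for name, args in steps:
        rows = REGISTRY[name](rows, **args)
    return rows


ir_mini = miniature_frontend("rename:old_name:new_name")
ir_rich = enriched_frontend("rename(source=old_name, target=new_name)")
result = run([{"old_name": 1, "other": 2}], ir_mini)
```

Both frontends emit the same IR for equivalent input, which is the property that would let the engine stay unaware of which syntax was used.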
