Datagen is a Deno CLI for building dataset workflows with LLMs.
Instead of one giant prompt, you define a YAML pipeline of stages and run it as a reproducible dataflow:
- generate structured records
- transform existing datasets
- branch conditionally
- delegate complex branches to child workflows
- validate and retry outputs
Datagen writes both:
- dataset output (`.jsonl`)
- full run report (`.report.json`) with traces, warnings, stage statuses, and dependency graph
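
As a rough illustration of consuming the dataset output, the sketch below parses JSONL text (one JSON value per line) into records. The function name `parseJsonl`, the sample records, and the file name in the comment are assumptions made for this example; they are not part of Datagen.

```typescript
// Parse JSONL text into an array of records, skipping blank lines.
// Throws with the 1-based line number when a line is not valid JSON.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      throw new Error(`Invalid JSON on line ${i + 1}: ${(err as Error).message}`);
    }
  }
  return records;
}

// Inline sample data. In practice the text would come from the output file,
// e.g. `await Deno.readTextFile("out.jsonl")` (file name is hypothetical).
const sample = '{"id": 1, "text": "hello"}\n\n{"id": 2, "text": "world"}\n';
const rows = parseJsonl(sample);
console.log(`parsed ${rows.length} records`);
```

The records are typed as `unknown` on purpose: the shape of each row depends on the workflow that produced it, so callers should narrow or validate it themselves.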

Features:
- Strict YAML workflow schema validation
- Stage DAG execution with `dependsOn` and conditional `when`
- Stage modes:
  - `batch`
  - `iter`
  - `record_transform` (`conversation_rewrite`)
  - `workflow_delegate` (run child workflow from a stage)
- Parallel per-item/per-record execution for `iter` and `record_transform`
- Input dataset loading from `json`/`jsonl`
- Input remapping (`prefixed_string_array`, `alpaca`)
- Typed structured output with `constrain`
- Semantic/content validators with retry feedback
- Streaming + resume/checkpoint support for long JSONL rewrite runs (streaming-compatible shape)
- Provider support: `openai`, `ollama`
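
To make the idea of stage DAG execution concrete, here is a minimal sketch of ordering stages so that each one runs after everything it depends on. It uses Kahn's algorithm as a stand-in technique; the `Stage` interface, the `executionOrder` function, and the stage ids are hypothetical and do not reflect Datagen's actual schema or scheduler. Conditional `when` handling is left out.

```typescript
// Hypothetical stage shape for illustration; not Datagen's real schema.
interface Stage {
  id: string;
  dependsOn?: string[];
}

// Return stage ids in an order where every stage comes after its dependencies
// (Kahn's algorithm). Throws on unknown dependencies or cycles.
function executionOrder(stages: Stage[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of stages) {
    indegree.set(s.id, 0);
    dependents.set(s.id, []);
  }
  for (const s of stages) {
    for (const dep of s.dependsOn ?? []) {
      const list = dependents.get(dep);
      if (list === undefined) {
        throw new Error(`Stage "${s.id}" depends on unknown stage "${dep}"`);
      }
      list.push(s.id);
      indegree.set(s.id, (indegree.get(s.id) ?? 0) + 1);
    }
  }
  // Stages with no unmet dependencies are ready to run.
  const ready = stages.filter((s) => indegree.get(s.id) === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== stages.length) {
    throw new Error("Cycle detected in dependsOn graph");
  }
  return order;
}

const order = executionOrder([
  { id: "generate" },
  { id: "rewrite", dependsOn: ["generate"] },
  { id: "score", dependsOn: ["generate"] },
  { id: "merge", dependsOn: ["rewrite", "score"] },
]);
console.log(order.join(" -> "));
```

In this example `rewrite` and `score` both depend only on `generate`, so a scheduler could run them side by side once `generate` finishes; the sketch simply lists them in declaration order.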

Requirements:
- Deno 2.x
- Access to an LLM backend (local Ollama or hosted OpenAI-compatible endpoint)

Quick start:

    deno task start -- example.pipeline.yaml

To install the CLI globally, from repo root:

    deno install -g -n datagen -A main.ts

Then use it anywhere:

    datagen your-workflow.pipeline.yaml

Or install directly from the GitHub URL:

    deno install -g -n datagen -A https://raw.githubusercontent.com/SvalTek/datagen/main/main.ts

Then run:

    datagen your-workflow.pipeline.yaml

Run a sample workflow:

    deno task start -- example.pipeline.yaml

Run with warning-focused console output:

    deno task start -- sharegpt-rewrite.pipeline.yaml --console warnings

Run with full report in terminal:

    deno task start -- llm-arena-rewrite.pipeline.yaml --console full

Path resolution:
- Workflow/data paths are resolved from your current working directory when you run the command.
- In delegated stages (`workflow_delegate`), child `delegate.workflowPath` is resolved relative to the parent workflow file location.
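
The two path rules above can be sketched as follows. This is an illustration of the described behavior, not Datagen's implementation: the helper names `resolveFrom` and `parentDir` and all directory and file names are made up, and the code assumes POSIX-style absolute directories and plain path characters (no `?`, `#`, or `%`).

```typescript
// Resolve `target` against a base directory using the WHATWG URL API.
// Absolute targets are returned unchanged; relative ones are joined to baseDir.
function resolveFrom(baseDir: string, target: string): string {
  const withSlash = baseDir.endsWith("/") ? baseDir : baseDir + "/";
  const base = new URL(`file://${withSlash}`);
  return decodeURIComponent(new URL(target, base).pathname);
}

// Directory part of an absolute POSIX path.
function parentDir(filePath: string): string {
  const idx = filePath.lastIndexOf("/");
  return idx <= 0 ? "/" : filePath.slice(0, idx);
}

// Hypothetical working directory; a real Deno program could use `Deno.cwd()`.
const cwd = "/home/me/project";

// Rule 1: the workflow path given on the command line resolves from the cwd.
const parentWorkflow = resolveFrom(cwd, "workflows/parent.pipeline.yaml");

// Rule 2: a delegated child's workflowPath resolves from the parent file's directory.
const childWorkflow = resolveFrom(parentDir(parentWorkflow), "children/branch.pipeline.yaml");

console.log(parentWorkflow);
console.log(childWorkflow);
```

The practical consequence is that a parent workflow and its delegated children can be moved together as a folder without editing the child paths, while dataset paths still depend on where the command is run.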

Tasks defined in `deno.jsonc`:
- `deno task start -- <workflow.pipeline.yaml> [flags]`
- `deno task test` (runs full test suite under `tests/`)
- `deno task dev` (watch mode for `main.ts`)

Documentation:
- Workflow Patterns: tutorial-style crash course for building workflows
- Branching Workflows: DAG/branching patterns, `dependsOn`, `when`, and delegated branch usage
- Workflow Reference: full workflow schema and field-level behavior
- CLI Reference: flags, precedence rules, progress/report behavior