Datagen runs through:
deno run -A main.ts <workflow.yml> [options]

<workflow.yml>: path to the workflow file

For example:

deno run -A main.ts sharegpt-rewrite.pipeline.yaml

This will:
- load the workflow
- run the configured stages
- write the final dataset JSONL
- write a full report JSON file
--help prints usage and exits.

--model <name>
--endpoint <url>
--provider <name>
These override workflow defaults.
--api-key <token>
--http-referer <url>
--x-title <name>
Use these for hosted OpenAI-compatible providers such as OpenRouter.
--max-tokens <number>
--temperature <number>
--parallelism <number>
These override workflow-level maxTokens and temperature.
--parallelism sets a global worker cap for iter and record_transform.
--parallelism must be an integer >= 1.
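
The integer constraint on --parallelism can be checked before a run starts. The following is a minimal sketch, assuming a hypothetical helper named parsePositiveInt; it is not Datagen's actual validation code, which may differ in wording and behavior.

```typescript
// Hypothetical helper (not part of Datagen): validates a numeric flag value
// such as the one passed to --parallelism.
function parsePositiveInt(flag: string, raw: string): number {
  const text = raw.trim();
  // Accept only plain decimal digits, so "2.5", "1e3", "-1", and "" are rejected.
  if (!/^\d+$/.test(text)) {
    throw new Error(`${flag} must be an integer >= 1, got "${raw}"`);
  }
  const value = Number(text);
  if (!Number.isSafeInteger(value) || value < 1) {
    throw new Error(`${flag} must be an integer >= 1, got "${raw}"`);
  }
  return value;
}
```

Rejecting non-digit input up front avoids silently truncating a value such as 2.5 to 2.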
--out <path>
--output-dir <path>
--resume <path>
--checkpoint-every <n>
Behavior:
- final dataset output still goes to the workflow-selected output location under the resolved output directory
- --out controls the report JSON path
- if --out is omitted, Datagen writes <resolved-output-dir>/<workflow-name>.report.json
- --checkpoint-every must be an integer >= 1
- --checkpoint-every writes checkpoint metadata during streaming runs
- checkpoint path defaults to <output-dir>/<workflow-name>.checkpoint.json
- if --resume is provided, that path is loaded as checkpoint input
- --resume also reuses the same path for checkpoint writes when checkpoint writing is enabled
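
The report and checkpoint path defaults described above can be sketched as two small functions. The PathOptions shape and both function names are assumptions made for illustration, and the sketch joins paths with a plain "/" rather than a path library; Datagen's real code is not shown here.

```typescript
// Hypothetical option shape; field names are illustrative, not Datagen's API.
interface PathOptions {
  outputDir: string;    // the resolved output directory
  workflowName: string; // used as the file name stem
  out?: string;         // value of --out, if given
  resume?: string;      // value of --resume, if given
}

// --out wins; otherwise fall back to <output-dir>/<workflow-name>.report.json.
function resolveReportPath(o: PathOptions): string {
  return o.out ?? `${o.outputDir}/${o.workflowName}.report.json`;
}

// --resume, when given, is used for both checkpoint reads and writes;
// otherwise fall back to <output-dir>/<workflow-name>.checkpoint.json.
function resolveCheckpointPath(o: PathOptions): string {
  return o.resume ?? `${o.outputDir}/${o.workflowName}.checkpoint.json`;
}
```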
Streaming note:
- resume/checkpoint behavior is currently designed for streaming-compatible record_transform runs
- streaming-compatible means: single-stage record_transform + conversation_rewrite + input source pipeline_input
- if that shape is not met, Datagen runs the normal eager path
- delegated child workflows (workflow_delegate) always run eager in v1
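
The streaming-compatible shape can be read as a predicate over the workflow's stages. This is only a sketch: StageSketch and its field names (mode, transform, input) are hypothetical stand-ins, since the real workflow schema is not described in this section.

```typescript
// Hypothetical, simplified view of a stage; real Datagen field names may differ.
interface StageSketch {
  mode: string;       // e.g. "record_transform", "iter", "batch"
  transform?: string; // e.g. "conversation_rewrite"
  input?: string;     // e.g. "pipeline_input"
}

// True only for a single record_transform stage that uses
// conversation_rewrite and reads from pipeline_input.
function isStreamingCompatible(stages: StageSketch[]): boolean {
  const stage = stages.length === 1 ? stages[0] : undefined;
  if (!stage) return false;
  return stage.mode === "record_transform" &&
    stage.transform === "conversation_rewrite" &&
    stage.input === "pipeline_input";
}
```

Any workflow for which this returns false would take the eager path described above.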
--context <json>
--context-file <path>
These inject initial JSON context.
You can pass only one of --context or --context-file.
They are useful for generation workflows that need starting metadata or input context outside a dataset file.
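
The "only one of" rule can be sketched as follows. The function name, the option field names, and the choice to return undefined when neither flag is given are all assumptions; the file reader is passed in so the sketch stays free of I/O.

```typescript
// Hypothetical helper (not Datagen's code): loads the initial JSON context.
function loadInitialContext(
  opts: { context?: string; contextFile?: string },
  readFile: (path: string) => string,
): unknown {
  // The two flags are mutually exclusive.
  if (opts.context !== undefined && opts.contextFile !== undefined) {
    throw new Error("Pass only one of --context or --context-file");
  }
  if (opts.context !== undefined) return JSON.parse(opts.context);
  if (opts.contextFile !== undefined) {
    return JSON.parse(readFile(opts.contextFile));
  }
  // Assumption: no initial context when neither flag is provided.
  return undefined;
}
```

Under Deno, the reader argument could be Deno.readTextFileSync, which requires read permission (granted by -A).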
--console summary
--console warnings
--console quiet
--console full

summary is the default mode.
Shows:
- short run summary
- live progress
- final summary
Does not print the full JSON report to the terminal.
warnings is like summary, plus compact warning lines.
This is useful for long runs where you want to see validator/runtime warnings without the full report dump.
quiet suppresses normal terminal output.
Useful for scripting.
full prints the full JSON report to stdout or stderr.
This is the closest behavior to the old verbose mode.
--progress
--no-progress
Default behavior:
- enabled in summary and warnings
- disabled in quiet and full
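
The per-mode default can be written as a small function. This sketch assumes that an explicit --progress or --no-progress always overrides the mode default; the function name and the tri-state parameter are illustrative only.

```typescript
type ConsoleMode = "summary" | "warnings" | "quiet" | "full";

// progressFlag: true for --progress, false for --no-progress,
// undefined when neither flag was passed.
function progressEnabled(mode: ConsoleMode, progressFlag?: boolean): boolean {
  // Assumption: an explicit flag takes precedence over the mode default.
  if (progressFlag !== undefined) return progressFlag;
  return mode === "summary" || mode === "warnings";
}
```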
Progress behavior:
- iter shows per-item progress
- record_transform shows per-record progress
- batch does not show bar-style iterative progress
--show-thoughts
By default, model reasoning/thought output is suppressed in terminal output.
Use this flag if you explicitly want it printed during the run.
This flag only affects terminal display. It does not request reasoning mode
from the model. Request-side reasoning transport is controlled by workflow
reasoning plus top-level reasoningMode.
Datagen requires a model from one of:
- --model
- workflow model
- DATAGEN_MODEL
- OLLAMA_MODEL
If none are set, the run fails with a usage/config error and writes a failure report.
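
A sketch of the model lookup, assuming the sources are consulted in the order listed and that blank values count as unset. Neither assumption is confirmed here, and resolveModel is a hypothetical name.

```typescript
// Hypothetical helper: picks the first configured model.
// env is passed in explicitly; under Deno it could come from Deno.env.toObject().
function resolveModel(
  cliModel: string | undefined,
  workflowModel: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const candidates = [
    cliModel,              // --model
    workflowModel,         // workflow model
    env["DATAGEN_MODEL"],
    env["OLLAMA_MODEL"],
  ];
  // Assumption: empty or whitespace-only values are treated as unset.
  const model = candidates.find((c) => c !== undefined && c.trim() !== "");
  if (model === undefined) {
    throw new Error(
      "No model configured: set --model, workflow model, DATAGEN_MODEL, or OLLAMA_MODEL",
    );
  }
  return model;
}
```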
Successful runs normally produce:
- final dataset: <resolved-output-dir>/<workflow-name>.jsonl
- run report: <resolved-output-dir>/<workflow-name>.report.json
The report contains:
- run metadata
- per-stage traces
- warnings
- per-stage outputs
Datagen resolves API keys in this order:
- --api-key
- workflow apiKeyEnv
- DATAGEN_OPENAI_API_KEY
- OPENAI_API_KEY
- OPENROUTER_API_KEY
Datagen resolves provider in this order:
- --provider
- workflow provider
- DATAGEN_PROVIDER
- default openai
For HTTP-Referer and X-Title:
- CLI flag
- workflow value
- DATAGEN_HTTP_REFERER / DATAGEN_X_TITLE

Datagen resolves the endpoint in this order:
- --endpoint
- workflow endpoint
- DATAGEN_OPENAI_ENDPOINT
deno run -A main.ts example.pipeline.yaml

deno run -A main.ts sharegpt-rewrite.pipeline.yaml --console warnings

deno run -A main.ts sharegpt-rewrite.pipeline.yaml --console full

deno run -A main.ts llm-arena-rewrite.pipeline.yaml --no-progress

deno run -A main.ts sharegpt-rewrite.pipeline.yaml --temperature 3.0

$env:OPENROUTER_API_KEY="your-key"
deno run -A main.ts example.pipeline.yaml --endpoint https://openrouter.ai/api/ --model openai/gpt-4o-mini

- CLI values override workflow defaults where both exist.
- delegated stages can opt into parent override inheritance via delegate.inheritParentCli.
- with delegate.inheritParentCli: none (default), child workflow runtime config is respected.
- The report file is now the canonical place for full detail; normal terminal output is intentionally concise.
- For rewrite-heavy local workflows, --console warnings is usually the most useful interactive mode.