A benchmark harness that runs the same task through different agentic patterns and measures what actually matters: latency, cost, and quality.
Compare workflow patterns empirically to develop judgment about when to use each one.
Run the comparison:

```
uv run python examples/run_comparison.py
```

| Task | Winner | Why |
|---|---|---|
| Simple Q&A | single_call | 2.4x faster, 2.8x cheaper, same quality |
| Content Gen | evaluator_optimize | 2x faster, hit word target (chain rambled) |
| Research | orchestrator | 2x faster, more focused output |
**Use a single call when:**
- The task is straightforward (Q&A, simple generation)
- Quality doesn't improve with decomposition
- Latency matters
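For concreteness, here is a minimal sketch of the single-call baseline using the Anthropic Python SDK; the `MODEL` id is an assumption, not necessarily what the repo pins:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-5"     # assumption: swap in whatever model the repo uses

def single_call(prompt: str) -> str:
    """One request, one response: the baseline every other pattern must beat."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```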
**Use prompt chaining when:**
- The steps genuinely build on each other
- Intermediate output adds value
- NOT just to "think more carefully" (model already does this)
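A sketch of chaining under the same assumptions, reusing `single_call` from above; the outline/answer split is illustrative:

```python
def chain_call(topic: str) -> str:
    """Two sequential calls where step 2 consumes step 1's output."""
    outline = single_call(f"Write a brief outline for an answer about: {topic}")
    return single_call(f"Using this outline, write the final answer:\n\n{outline}")
```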
**Use evaluator-optimizer when:**
- The output has clear criteria to evaluate against
- Refinement actually improves quality
- One focused revision beats sprawling first drafts
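A sketch of the evaluator-optimizer loop with a single revision pass; the prompts and the `word_target` parameter are illustrative, not the repo's actual implementation:

```python
def evaluator_optimize(brief: str, word_target: int = 300) -> str:
    """Draft once, evaluate against explicit criteria, revise once."""
    draft = single_call(f"Write about {word_target} words on: {brief}")
    critique = single_call(
        f"Critique this draft against two criteria: stays near {word_target} words "
        f"and answers the brief directly, with no filler.\n\nDraft:\n{draft}"
    )
    return single_call(
        f"Revise the draft to address the critique, staying near {word_target} words."
        f"\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
    )
```

The explicit criterion is the point: the evaluator checks the draft against a concrete target instead of hoping another generation step fixes the rambling.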
**Use an orchestrator when:**
- Subtasks need coordination
- Results must be synthesized coherently
- Parallel isn't always faster: coordination overhead trades against unfocused bulk
- Note: my "parallel" ran sequentially for simplicity; true parallelism would be faster but might still produce unfocused bulk
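A sketch of the orchestrator: plan focused subtasks, run a worker per subtask, then synthesize. Having the planner emit a JSON array is my assumption about the decomposition step, and the parse will fail if the model wraps it in prose:

```python
import json

def orchestrator_research(question: str) -> str:
    """Plan subtasks, run a worker per subtask, then synthesize one answer."""
    plan = single_call(
        "Break this research question into two or three focused subtasks. "
        f"Reply with a JSON array of strings only.\n\n{question}"
    )
    subtasks = json.loads(plan)  # assumes the model complied with the format
    findings = [single_call(f"Research this subtask concisely: {t}") for t in subtasks]
    return single_call(
        f"Synthesize these findings into one coherent answer to: {question}\n\n"
        + "\n\n".join(findings)
    )
```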
More patterns != better results.
The orchestrator beat parallel because coordination produces focused output. The evaluator beat chaining because targeted refinement beats hoping more steps help.
Measure before assuming complexity adds value.
```
src/
├── metrics.py       # TrackedClient for token/latency tracking
├── benchmark.py     # Benchmark harness with LLM-as-judge (5x Haiku voting)
└── tasks/
    ├── simple_qa.py    # single_call vs chain_call
    ├── content_gen.py  # chain_generate vs evaluator_optimize
    └── research.py     # parallel_research vs orchestrator_research
```
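A sketch of what TrackedClient in `metrics.py` plausibly does: wrap each API call, accumulate latency and token counts. The field names here are assumptions:

```python
import time
from dataclasses import dataclass, field

import anthropic

@dataclass
class TrackedClient:
    """Wraps the Anthropic client and accumulates latency and token usage."""
    client: anthropic.Anthropic = field(default_factory=anthropic.Anthropic)
    input_tokens: int = 0
    output_tokens: int = 0
    latency_s: float = 0.0

    def create(self, **kwargs):
        start = time.perf_counter()
        response = self.client.messages.create(**kwargs)
        self.latency_s += time.perf_counter() - start
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        return response
```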
- Quality: LLM-as-judge with 5 Haiku votes averaged (reduces noise)
- Cost: Claude Sonnet 4.5 pricing ($3/M input, $15/M output)
- Outputs: saved to `results/outputs/` for manual inspection
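A sketch of the two scoring pieces: averaged Haiku votes for quality, and a cost function from the pricing above. The judge model id, prompt, and 1-10 scale are assumptions; `client` is the plain Anthropic client from the first sketch:

```python
import re

JUDGE_MODEL = "claude-3-5-haiku-latest"  # assumption: any cheap judge model works

def judge_quality(task: str, output: str, votes: int = 5) -> float:
    """Average several independent judge scores to reduce single-vote noise."""
    scores = []
    for _ in range(votes):
        reply = client.messages.create(
            model=JUDGE_MODEL,
            max_tokens=16,
            messages=[{
                "role": "user",
                "content": f"Rate this output from 1 to 10 for the task.\n\n"
                           f"Task: {task}\n\nOutput: {output}\n\n"
                           "Reply with a number only.",
            }],
        )
        match = re.search(r"\d+(?:\.\d+)?", reply.content[0].text)
        scores.append(float(match.group()) if match else 1.0)
    return sum(scores) / len(scores)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Sonnet 4.5 list pricing: $3 per million input tokens, $15 per million output."""
    return (input_tokens * 3 + output_tokens * 15) / 1_000_000
```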