This repository serves as the central entry point for reproducing the experimental results in the paper POLCA: Stochastic Generative Optimization with LLM.
We benchmark 3 algorithms across 4 benchmarks:
| Algorithm | Description | Source Repo |
|---|---|---|
| POLCA | Stochastic generative optimization with LLM (ours) | Trace |
| GEPA | Genetic-Pareto reflective prompt evolution | gepa-repo |
| OpenEvolve | LLM-driven evolutionary optimization | openevolve |

| Benchmark | Domain | Source Repo |
|---|---|---|
| τ-bench | Tool-agent-user interaction (agent optimization) | tau-bench |
| HotpotQA | Multi-hop question answering (prompt optimization) | hotpotqa |
| VeriBench | Formal verification with Lean (code generation) | Trace-Bench/Veribench |
| KernelBench | CUDA kernel optimization (code generation) | Trace-Bench/KernelBench |

| | POLCA | GEPA | OpenEvolve |
|---|---|---|---|
| τ-bench | ✅ | ✅ | ✅ |
| HotpotQA | ✅ | ✅ | ✅ |
| VeriBench | ✅ | ✅ | ✅ |
| KernelBench | ✅ | ✅ | ✅ |
git clone https://github.com/rlx-lab/POLCA.git
cd POLCA
bash setup.sh

Each benchmark has its own install.sh and run scripts. See the Per-Benchmark Setup section below.
# Example: HotpotQA
cd hotpotqa
bash install.sh
bash prompt_opt/run_trace.sh # POLCA
bash prompt_opt/run_gepa.sh # GEPA
bash prompt_opt/run_openevolve.sh # OpenEvolve

POLCA/ ← you are here
├── README.md ← this file
├── setup.sh ← clones all dependency repos
├── benchmarks/ ← per-benchmark setup guides
│ ├── tau-bench.md
│ ├── hotpotqa.md
│ ├── veribench.md
│ └── kernelbench.md
│
├── Trace/ ← POLCA algorithm (Trace framework)
├── gepa-repo/ ← GEPA algorithm
├── openevolve/ ← OpenEvolve algorithm
├── dspy-repo/ ← DSPy framework (dependency)
│
├── hotpotqa/ ← HotpotQA benchmark
│ └── prompt_opt/ ← optimization scripts for all 3 algorithms
├── tau-bench/ ← τ-bench benchmark
│ └── my_processing_agents/ ← optimization scripts for all 3 algorithms
└── Trace-Bench/ ← Trace-Bench (contains VeriBench & KernelBench)
    ├── Veribench/
    └── KernelBench/
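
After running setup.sh, it can help to confirm that the layout above is in place before installing a benchmark. The following is a minimal sketch, assuming every directory shown in the tree is expected to exist once setup has finished; `check_layout` is a hypothetical helper and is not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch only: check_layout is a hypothetical helper, not shipped with POLCA.
# It assumes the directories listed in the tree above should all exist
# under the repository root after setup.sh has run.

check_layout() {
  local root="${1:-.}"   # repository root, defaults to the current directory
  local missing=0
  local dir
  for dir in Trace gepa-repo openevolve dspy-repo \
             hotpotqa/prompt_opt tau-bench/my_processing_agents \
             Trace-Bench/Veribench Trace-Bench/KernelBench; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_layout .` from the POLCA root prints one `missing:` line per absent directory and returns a non-zero status if any are absent.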
Each benchmark has detailed setup instructions covering installation, environment variables, and how to run all 3 algorithms:
- τ-bench — Agent optimization on retail customer service tasks
- HotpotQA — Prompt optimization for multi-hop QA
- VeriBench — Lean formal verification code generation
- KernelBench — CUDA kernel optimization
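
To compare all three algorithms on one benchmark, the run scripts can be launched back to back with one log file each. The sketch below does this for HotpotQA using the three script names from the example earlier in this README; `run_hotpotqa_all` and the `logs` directory are hypothetical, and the sketch assumes the scripts take no required arguments and are meant to be started from inside the benchmark directory.

```shell
#!/usr/bin/env bash
# Sketch only: run_hotpotqa_all and the log directory are hypothetical.
# Assumes install.sh has already been run for the benchmark and that the
# run scripts need no arguments.

run_hotpotqa_all() {
  local bench_dir="${1:-hotpotqa}"   # benchmark directory
  local log_dir="${2:-logs}"         # where per-algorithm logs are written
  local script status=0

  mkdir -p "$log_dir" || return 1
  # Resolve to an absolute path so logs land in the same place after cd.
  log_dir="$(cd "$log_dir" && pwd)"

  for script in run_trace.sh run_gepa.sh run_openevolve.sh; do
    echo "running $script"
    # Run from inside the benchmark directory, as in the manual commands.
    if ! (cd "$bench_dir" && bash "prompt_opt/$script") \
         > "$log_dir/${script%.sh}.log" 2>&1; then
      echo "failed: $script"
      status=1
    fi
  done
  return "$status"
}
```

Running `run_hotpotqa_all hotpotqa logs` from the POLCA root writes `run_trace.log`, `run_gepa.log`, and `run_openevolve.log` under `logs/`, and continues to the next algorithm if one run fails.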