POLCA: Stochastic Generative Optimization with LLM

This repository serves as the central entry point for reproducing the experimental results in the paper POLCA: Stochastic Generative Optimization with LLM.

We benchmark 3 algorithms across 4 benchmarks:

Algorithms

Algorithm	Description	Source Repo
POLCA	Stochastic generative optimization with LLM (ours)	`Trace`
GEPA	Reflective prompt evolution (Genetic-Pareto)	`gepa-repo`
OpenEvolve	LLM-driven evolutionary optimization	`openevolve`

Benchmarks

Benchmark	Domain	Source Repo
τ-bench	Tool-agent-user interaction (agent optimization)	`tau-bench`
HotpotQA	Multi-hop question answering (prompt optimization)	`hotpotqa`
VeriBench	Formal verification with Lean (code generation)	`Trace-Bench/Veribench`
KernelBench	CUDA kernel optimization (code generation)	`Trace-Bench/KernelBench`

Experiment Matrix

	POLCA	GEPA	OpenEvolve
τ-bench	✅	✅	✅
HotpotQA	✅	✅	✅
VeriBench	✅	✅	✅
KernelBench	✅	✅	✅

Quick Start

1. Clone this repo

git clone https://github.com/rlx-lab/POLCA.git
cd POLCA

2. Clone all dependency repos

bash setup.sh
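To confirm that step 2 worked, you can check for the directories listed under Repository Structure. The helper below is a minimal sketch, not part of the POLCA repo: the name `check_deps` is hypothetical, and it assumes `setup.sh` clones each dependency into exactly the paths shown in that tree.

```bash
# Hypothetical helper: check_deps is not part of the POLCA repo.
# Directory names are taken from the "Repository Structure" tree; this
# assumes setup.sh clones each dependency into exactly these paths.
check_deps() {
  local root="${1:-.}"   # POLCA checkout root, defaults to the current directory
  local missing=0
  local dir
  for dir in Trace gepa-repo openevolve dspy-repo hotpotqa tau-bench \
             Trace-Bench/Veribench Trace-Bench/KernelBench; do
    if [ -d "$root/$dir" ]; then
      printf '%-8s %s\n' "ok" "$dir"
    else
      printf '%-8s %s\n' "MISSING" "$dir"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing directories (0 means all present).
  return "$missing"
}
```

Usage, from the POLCA root after `bash setup.sh`: `check_deps .` prints one line per directory and exits non-zero if anything is missing.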

3. Set up and run a benchmark

Each benchmark has its own install.sh and run scripts. See the Per-Benchmark Setup section below.

# Example: HotpotQA
cd hotpotqa
bash install.sh
bash prompt_opt/run_trace.sh      # POLCA
bash prompt_opt/run_gepa.sh       # GEPA
bash prompt_opt/run_openevolve.sh # OpenEvolve
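If you want all three HotpotQA runs in one go, a small wrapper can call the scripts above in sequence and keep going when one of them fails. This is a sketch only: `run_all_hotpotqa` is a hypothetical name, and it assumes it is called from inside `hotpotqa/` after `install.sh` has completed.

```bash
# Hypothetical helper: run_all_hotpotqa is not part of the POLCA repo.
# Runs the three HotpotQA scripts from the Quick Start in sequence and keeps
# going if one fails. Call it from inside hotpotqa/ after bash install.sh.
run_all_hotpotqa() {
  local failures=0
  local algo script status
  # run_trace.sh is POLCA (Trace framework); the others are GEPA and OpenEvolve.
  for algo in trace gepa openevolve; do
    script="prompt_opt/run_${algo}.sh"
    if [ ! -f "$script" ]; then
      echo "skip  $algo (no $script)"
      failures=$((failures + 1))
      continue
    fi
    status=0
    bash "$script" || status=$?
    echo "done  $algo (exit $status)"
    if [ "$status" -ne 0 ]; then
      failures=$((failures + 1))
    fi
  done
  # Exit status is the number of algorithms that were skipped or failed.
  return "$failures"
}
```

Capturing each script's status instead of using `set -e` means a crash in one optimizer does not prevent the other two from running.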

Repository Structure

POLCA/                          ← you are here
├── README.md                   ← this file
├── setup.sh                    ← clones all dependency repos
│
├── benchmarks/                 ← per-benchmark setup guides
│   ├── tau-bench.md
│   ├── hotpotqa.md
│   ├── veribench.md
│   └── kernelbench.md
│
├── Trace/                      ← POLCA algorithm (Trace framework)
├── gepa-repo/                  ← GEPA algorithm
├── openevolve/                 ← OpenEvolve algorithm
├── dspy-repo/                  ← DSPy framework (dependency)
│
├── hotpotqa/                   ← HotpotQA benchmark
│   └── prompt_opt/             ← optimization scripts for all 3 algorithms
├── tau-bench/                  ← τ-bench benchmark
│   └── my_processing_agents/   ← optimization scripts for all 3 algorithms
└── Trace-Bench/                ← Trace-Bench (contains VeriBench & KernelBench)
    ├── Veribench/
    └── KernelBench/

Per-Benchmark Setup

Each benchmark has detailed setup instructions covering installation, environment variables, and how to run all 3 algorithms:

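τ-bench (benchmarks/tau-bench.md): agent optimization on retail customer service tasks
HotpotQA (benchmarks/hotpotqa.md): prompt optimization for multi-hop QA
VeriBench (benchmarks/veribench.md): code generation for Lean formal verification
KernelBench (benchmarks/kernelbench.md): CUDA kernel optimization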
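To print one of these guides from the terminal, a lookup from benchmark name to guide file is enough. This is a hedged sketch: `show_guide` and the accepted short names (such as `tau`) are hypothetical, and the file paths are the ones shown in the `benchmarks/` directory of the Repository Structure tree.

```bash
# Hypothetical helper: show_guide is not part of the POLCA repo.
# Guide paths come from the benchmarks/ directory in the Repository Structure.
show_guide() {
  local name guide
  # Lower-case the argument so "HotpotQA" and "hotpotqa" both work.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    tau-bench|taubench|tau) guide="benchmarks/tau-bench.md" ;;
    hotpotqa)               guide="benchmarks/hotpotqa.md" ;;
    veribench)              guide="benchmarks/veribench.md" ;;
    kernelbench)            guide="benchmarks/kernelbench.md" ;;
    *)
      echo "unknown benchmark: $1 (expected tau-bench, hotpotqa, veribench, kernelbench)" >&2
      return 2 ;;
  esac
  if [ ! -f "$guide" ]; then
    echo "guide not found: $guide (run from the POLCA root)" >&2
    return 1
  fi
  cat "$guide"
}
```

Usage, from the POLCA root: `show_guide HotpotQA`. Distinct exit codes (2 for an unknown name, 1 for a missing file) make it easy to tell a typo from a wrong working directory.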
