(This code and summary authored by Claude Sonnet 4.6, with Corin's supervision)
Push-button multi-parameter lead optimization benchmark, following Chennakesavalu et al. Evaluates frontier LLMs on medicinal chemistry by asking them to iteratively improve a seed molecule against a real docking target while satisfying ADMET constraints.
Each turn, the LLM proposes a new SMILES string. The harness validates it, checks scaffold retention, runs docking and ADMET via the Rowan API, and feeds the scored result back into the next prompt. Outputs include per-proposal JSONL/CSV files, a Markdown summary, and best-molecule SMILES files.
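At a high level, each trajectory is a propose, score, feedback loop. The sketch below is illustrative pseudocode only; the helper names (`build_user_prompt`, `validate_and_gate`, `oracle.dock`, `score`) are hypothetical stand-ins, not the actual `mpo.trajectory` API:

```python
# Illustrative pseudocode; all helper names here are hypothetical.
history = []
for turn in range(max_turns):
    user_prompt = build_user_prompt(seed, history)     # includes prior scored proposals
    smiles = llm.generate(system_prompt, user_prompt)  # one SMILES per turn
    record = validate_and_gate(smiles)                 # RDKit parse + scaffold SMARTS gate
    if record.valid:
        record.docking = oracle.dock(smiles)           # Rowan docking oracle
        record.admet = oracle.admet(smiles)            # Rowan ADMET oracle
        record.reward = score(record, seed)            # MPO reward (see below)
    history.append(record)                             # scored result feeds the next prompt
```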
Requirements:
- Python 3.11 or newer
- uv
- API keys only for live LLM or Rowan runs; the smoke test below does not need keys
```bash
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a virtual environment and install the package plus dev tools
uv sync --extra dev
```

For live runs, export the credentials for the services you plan to use:

```bash
export ROWAN_API_KEY="..."      # required for real oracle runs
export ANTHROPIC_API_KEY="..."  # required for --llm claude
export OPENAI_API_KEY="..."     # required for --llm openai
```

```bash
# Smoke test — no API keys needed (echo LLM + mock oracles)
uv run python benchmark.py --target targets/example_target.yaml --llm echo --mock-oracles

# Equivalent installed command
uv run mpo-benchmark --target targets/example_target.yaml --llm echo --mock-oracles

# Real LLM, mock oracles — test prompting without spending Rowan credits
uv run python benchmark.py --target targets/example_target.yaml --llm claude --mock-oracles --turns 5

# Full run: Claude + real Rowan docking/ADMET
uv run python benchmark.py --target targets/example_target.yaml --llm claude

# GPT-4o
uv run python benchmark.py --target targets/example_target.yaml --llm openai --model gpt-4o

# 3 replicates per seed, up to 4 parallel trajectories
uv run python benchmark.py --target targets/example_target.yaml --llm claude \
    --num-replicates 3 --max-parallel-trajectories 4
```

```
benchmark.py --target YAML --llm {claude,claude-code,openai,echo} [options]

--model MODEL                    model override, e.g. claude-opus-4-7, gpt-4o
--mock-oracles                   use deterministic mock oracles (no Rowan credits)
--turns N                        override max_turns from the config
--num-replicates N               independent replicates per seed (default: 1)
--max-parallel-trajectories N    concurrent trajectories (default: 4)
--output-root DIR                output root directory (default: runs/)
```

`--llm claude-code` uses the local `claude` CLI (no `ANTHROPIC_API_KEY` needed; requires Claude Code).
- `RuntimeError: RDKit is required but not installed.` — run commands through `uv run ...` after `uv sync --extra dev`, or activate the generated `.venv`.
- `KeyError: ROWAN_API_KEY` — add `--mock-oracles` for local dry runs, or export `ROWAN_API_KEY` before live Rowan runs.
- Missing LLM API key — use `--llm echo` for the no-key smoke test, `--llm claude-code` for an authenticated local Claude Code install, or export the provider key for `--llm claude` / `--llm openai`.
- Rowan folder errors — update or remove `docking_config.folder_uuid` in your target YAML if you do not have access to the example folder.
Each trajectory writes to `runs/<target_id>/<seed_slug>/<timestamp>/`:
| File | Contents |
|---|---|
| `trajectory.jsonl` | Full per-turn JSON record (properties, constraints, reward, Rowan job IDs) |
| `trajectory.csv` | Tabular summary, importable into pandas |
| `summary.md` | Human-readable Markdown report |
| `audit.jsonl` | Timestamped event log (prompts, responses, oracle calls) |
| `best_feasible_by_docking.smi` | Best molecule passing all constraints (lowest docking score) |
| `best_feasible_by_reward.smi` | Best molecule passing all constraints (highest reward) |
| `best_overall_by_reward.smi` | Highest reward molecule (may violate constraints) |
A `benchmark_summary_<timestamp>.md` is also written to `runs/<target_id>/`, aggregating all trajectories.
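For downstream analysis, both per-trajectory files load with a few lines of Python. A minimal sketch (the run path is illustrative):

```python
import json
import pandas as pd

run_dir = "runs/my_target/seed-0/20260101-120000"  # illustrative path

# Tabular per-turn summary
df = pd.read_csv(f"{run_dir}/trajectory.csv")

# Full per-turn records, one JSON object per line
with open(f"{run_dir}/trajectory.jsonl") as f:
    records = [json.loads(line) for line in f]
```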
See `targets/example_target.yaml` for a worked example. Key fields:
```yaml
target_id: my_target
protein_path: ../protein.pdb    # relative to this YAML file

scaffold_policy:
  gating_mode: medium           # strict | medium | permissive
  patterns:
    strict: "SMARTS_STRING"
    medium: "SMARTS_STRING"     # the hard gate used at gating_mode: medium
    permissive: "SMARTS_STRING"
  linker_tags:                  # functional-group tags, annotated on every proposal
    urea: "[NX3][CX3](=[OX1])[NX3]"
    sulfonamide: "[NX3]S(=O)(=O)[#6]"

seeds:
  - "SMILES_STRING"

max_turns: 20

constraints:
  mw_max: 600.0
  tpsa_max: 140.0
  logp_max: 5.0
  solubility_min: -4.0          # LogS (log mol/L); >= -4 ≈ 100 µM
  permeability_min: -5.5        # LogPapp (log cm/s); Caco-2 threshold
  max_rotatable_bonds: 10       # optional

docking_config:
  center_x: 14.84
  center_y: 6.83
  center_z: 11.71
  size_x: 17.8
  size_y: 23.67
  size_z: 22.5
  folder_uuid: "28b7dea3-fd84-4438-b011-b53a87c16c47"  # Rowan folder for run organisation
```

The pocket center/size coordinates come from your docking grid setup in Rowan. The `folder_uuid` keeps all oracle workflows organised under one Rowan folder.
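To illustrate what the scaffold gate does, here is a minimal RDKit sketch of a check like the one the harness applies to each proposal. The function name is illustrative, not the actual `mpo.chemistry` API; the SMARTS is the urea tag from `linker_tags` above:

```python
from rdkit import Chem

def passes_scaffold_gate(smiles: str, gate_smarts: str) -> bool:
    """True if the proposal parses and retains the gating substructure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES fails outright
        return False
    return mol.HasSubstructMatch(Chem.MolFromSmarts(gate_smarts))

# N-methyl-N'-phenylurea retains the urea linker tag
print(passes_scaffold_gate("CNC(=O)Nc1ccccc1", "[NX3][CX3](=[OX1])[NX3]"))  # True
```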
Each valid, scaffold-passing proposal is scored relative to the seed:
```
reward = docking_gain                              # seed_docking − proposal_docking (higher = better)
       − soft_penalty(MW, limit, scale=50)
       − soft_penalty(TPSA, limit, scale=15)
       − soft_penalty(cLogP, limit, scale=0.5)
       − soft_penalty(solubility_deficit, scale=1)
       − soft_penalty(permeability_deficit, scale=1)
       − 0.02 × max(0, MW − seed_MW)               # growth penalty
       + 2.0 if all hard constraints pass          # feasibility bonus
```
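In code, the scoring might look roughly like the sketch below. The formula above does not spell out the soft-penalty shape, so the linear hinge `max(0, value - limit) / scale` here is an assumption; see `mpo/reward.py` for the real definition. The hard-coded limits match the example constraints above; in practice they come from the target YAML.

```python
from dataclasses import dataclass

@dataclass
class Scored:
    docking: float   # docking score (lower is better)
    mw: float
    tpsa: float
    clogp: float
    logs: float      # solubility, log mol/L
    logpapp: float   # permeability, log cm/s
    feasible: bool   # all hard constraints pass

def soft_penalty(value: float, limit: float, scale: float) -> float:
    # ASSUMPTION: linear hinge, zero until the limit is crossed.
    return max(0.0, value - limit) / scale

def reward(p: Scored, seed: Scored) -> float:
    r = seed.docking - p.docking                       # docking_gain
    r -= soft_penalty(p.mw, 600.0, scale=50)
    r -= soft_penalty(p.tpsa, 140.0, scale=15)
    r -= soft_penalty(p.clogp, 5.0, scale=0.5)
    r -= soft_penalty(-4.0 - p.logs, 0.0, scale=1)     # solubility deficit below LogS floor
    r -= soft_penalty(-5.5 - p.logpapp, 0.0, scale=1)  # permeability deficit below LogPapp floor
    r -= 0.02 * max(0.0, p.mw - seed.mw)               # growth penalty
    if p.feasible:
        r += 2.0                                       # feasibility bonus
    return r
```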
New LLM backend — subclass `LLMClient` in `mpo/llm_adapter.py`:

```python
class MyLLMClient(LLMClient):
    def generate(self, system_prompt: str, user_prompt: str) -> str:
        # Query your model and return a single SMILES string
        smiles_string = ...  # your provider call goes here
        return smiles_string
```

Then add a branch in `benchmark.py:build_llm()`.
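That branch might look like the following; the `build_llm` signature is not shown in this README, so the argument names here are assumptions:

```python
def build_llm(name: str, model: str | None = None) -> LLMClient:
    # ... existing branches for claude, claude-code, openai, echo ...
    if name == "myllm":  # hypothetical backend name
        return MyLLMClient()
    raise ValueError(f"unknown LLM backend: {name}")
```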
New target — copy `targets/example_target.yaml`, set your `protein_path`, pocket coordinates, scaffold SMARTS, seed molecules, and constraints.
```bash
uv run pytest tests/ -v
```

```
agentic-mpo/
├── mpo/
│   ├── __init__.py           # public API
│   ├── target_schema.py      # dataclasses: TargetConfig, Constraints, ProposalRecord, …
│   ├── chemistry.py          # RDKit: validate, canonicalize, scaffold matching, properties
│   ├── prompt_builder.py     # system prompt + per-turn user prompt
│   ├── rowan_oracles.py      # RowanClient, MockRowanClient, RealRowanClient
│   ├── llm_adapter.py        # LLMClient, EchoLLMClient, ClaudeClient, OpenAIClient
│   ├── reward.py             # constraint evaluation + MPO reward
│   ├── trajectory.py         # run_mpo_trajectory, state management, output writing
│   └── config.py             # YAML/dict config loader
├── targets/
│   └── example_target.yaml   # benchmark config (sulfonamide-urea MPO)
├── tests/                    # pytest suite
├── benchmark.py              # CLI runner
└── protein.pdb               # example receptor structure
```
Corin Wagen, 2026