exp: verifiers v1 smoke configs by mikasenghaas · Pull Request #2637 · PrimeIntellect-ai/prime-rl

mikasenghaas · 2026-05-26T00:07:23Z

Summary

Bump the deps/verifiers submodule to bbe929af (v0.1.15.dev10, current main), which includes "Expose V1 tools to the RLM harness" (#1456). v1 envs can now expose toolsets through vf.RLM with no per-skill plumbing.
Bump deps/research-environments to b2133455b (head of PrimeIntellect-ai/research-environments#360 — wikispeedia ported to v1 with a CLI-configurable harness). Add wikispeedia to the envs extra and uv workspace.
Add three minimal Qwen3-4B-Instruct-2507 smoke configs (max_steps=5, dp=1, single-host) under per-env directories:
- configs/reverse_text/v1.toml — reverse-text env with args = { v1 = true } (in-process default harness, no tools).
- configs/wikispeedia/rl_qwen3_4b.toml — wikispeedia env, in-process harness with click_link / go_back tools.
- configs/wikispeedia/rl_qwen3_4b_rlm.toml — same wikispeedia env id, but with the harness swapped to RLM via:
```
[orchestrator.train.env.args.config.harness]
id = "verifiers.v1.packages.harnesses.rlm"
```

Single env id, single load_environment; harness selection is config-driven (matches the general-agent v1 pattern in PrimeIntellect-ai/research-environments#395).

Verification

uv run pytest tests/unit/test_configs.py -k 'wikispeedia or reverse_text/v1' — all 3 configs pass.
uv run rl @ configs/verifiers_v1/<each>.toml --dry-run — all 3 resolve cleanly; the RLM config's resolved orchestrator.toml correctly carries [train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm".
vf-eval smoke tests on gpt-5-mini (verified in PrimeIntellect-ai/research-environments#360): both harness paths reach reward 1.0 (16s in-process, 1m40s under RLM).

Notes

reverse-text-rlm is intentionally omitted: the upstream v1 reverse_text_v1.load_environment hardcodes vf.Harness (no harness.id dispatch yet), so swapping in vf.RLM would require an upstream patch. Happy to extend the pattern there too if useful.
This PR depends on PrimeIntellect-ai/research-environments#360 ("feat(wikispeedia): port to verifiers v1 with CLI-configurable harness") landing (or the submodule being re-pinned). Until then, the deps/research-environments submodule pointer stays on that PR's head.

Pin verifiers submodule to bbe929af (v0.1.15.dev10 release). Brings in 'Expose V1 tools to the RLM harness' (#1456) so v1 envs can expose toolsets to vf.RLM without per-skill plumbing.

Co-authored-by: Cursor <cursoragent@cursor.com>

Bump `deps/research-environments` to b2133455b (head of PR #360, the wikispeedia v1 port). Register `wikispeedia` in the `envs` extra and uv workspace; uv.lock picks up the editable install.

Add three minimal Qwen3-4B-Instruct-2507 smoke configs under `configs/verifiers_v1/` (max_steps=5, dp=1, single-host):
- `rl_qwen3_4b_reverse_text.toml`: `reverse-text` env with `args = { v1 = true }` (in-process default harness).
- `rl_qwen3_4b_wikispeedia.toml`: `wikispeedia` env, in-process harness with click_link/go_back tools.
- `rl_qwen3_4b_wikispeedia_rlm.toml`: same `wikispeedia` env id, but with the harness swapped to RLM via `[orchestrator.train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm"`.

Single env id for wikispeedia, single load_environment; harness selection is config-driven (matches the general-agent v1 pattern in research-environments#395).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

- configs/verifiers_v1/rl_qwen3_4b_reverse_text.toml -> configs/reverse_text/v1.toml
- configs/verifiers_v1/rl_qwen3_4b_wikispeedia.toml -> configs/wikispeedia/rl_qwen3_4b.toml
- configs/verifiers_v1/rl_qwen3_4b_wikispeedia_rlm.toml -> configs/wikispeedia/rl_qwen3_4b_rlm.toml

Drops the configs/verifiers_v1/ dir; configs now live next to their env.

Co-authored-by: Cursor <cursoragent@cursor.com>

mikasenghaas and others added 2 commits May 25, 2026 23:51

mikasenghaas force-pushed the exp/verifiers-v1 branch from 376c87c to 95b3203, May 26, 2026 00:23

mikasenghaas changed the title ~~exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM)~~ exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM via harness.id) May 26, 2026

mikasenghaas changed the title ~~exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM via harness.id)~~ exp: verifiers v1 smoke configs May 26, 2026

mikasenghaas wants to merge 3 commits into main from exp/verifiers-v1
