exp: verifiers v1 smoke configs #2637

Draft
mikasenghaas wants to merge 3 commits into main from exp/verifiers-v1

mikasenghaas (Member) commented May 26, 2026

Summary

  • Bump deps/verifiers submodule to bbe929af (v0.1.15.dev10, current main), which includes "Expose V1 tools to the RLM harness" (#1456). v1 envs can now expose toolsets through vf.RLM with no per-skill plumbing.
  • Bump deps/research-environments to b2133455b (head of PrimeIntellect-ai/research-environments#360 — wikispeedia ported to v1 with a CLI-configurable harness). Add wikispeedia to the envs extra and uv workspace.
  • Add three minimal Qwen3-4B-Instruct-2507 smoke configs (max_steps=5, dp=1, single-host) under per-env directories:
    • configs/reverse_text/v1.toml: reverse-text env with args = { v1 = true } (in-process default harness, no tools).
    • configs/wikispeedia/rl_qwen3_4b.toml: wikispeedia env, in-process harness with click_link / go_back tools.
    • configs/wikispeedia/rl_qwen3_4b_rlm.toml: same wikispeedia env id, but with the harness swapped to RLM by setting
      [orchestrator.train.env.args.config.harness]
      id = "verifiers.v1.packages.harnesses.rlm"

Single env id, single load_environment; harness selection is config-driven (matches the general-agent v1 pattern in PrimeIntellect-ai/research-environments#395).

Verification

  • uv run pytest tests/unit/test_configs.py -k 'wikispeedia or reverse_text/v1' — all 3 configs pass.
  • uv run rl @ <config>.toml --dry-run for each of the three configs above: all 3 resolve cleanly; the RLM config's resolved orchestrator.toml correctly carries [train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm".
  • vf-eval smoke tests on gpt-5-mini (verified in PrimeIntellect-ai/research-environments#360): both harness paths reach reward 1.0 (16s in-process, 1m40s under RLM).

Notes

mikasenghaas and others added 2 commits May 25, 2026 23:51
Pin verifiers submodule to bbe929af (v0.1.15.dev10 release).

Brings in 'Expose V1 tools to the RLM harness' (#1456) so v1 envs can
expose toolsets to vf.RLM without per-skill plumbing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bump `deps/research-environments` to b2133455b (head of PR #360 —
wikispeedia v1 port). Register `wikispeedia` in the `envs` extra and
uv workspace; uv.lock picks up the editable install.

Add three minimal Qwen3-4B-Instruct-2507 smoke configs under
`configs/verifiers_v1/` (max_steps=5, dp=1, single-host):

- `rl_qwen3_4b_reverse_text.toml` — `reverse-text` env with
  `args = { v1 = true }` (in-process default harness).
- `rl_qwen3_4b_wikispeedia.toml` — `wikispeedia` env, in-process
  harness with click_link/go_back tools.
- `rl_qwen3_4b_wikispeedia_rlm.toml` — same `wikispeedia` env id,
  but with the harness swapped to RLM via
  `[orchestrator.train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm"`.

Single env id for wikispeedia, single load_environment; harness
selection is config-driven (matches the general-agent v1 pattern in
research-environments#395).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
mikasenghaas changed the title from "exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM)" to "exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM via harness.id)" on May 26, 2026
- configs/verifiers_v1/rl_qwen3_4b_reverse_text.toml -> configs/reverse_text/v1.toml
- configs/verifiers_v1/rl_qwen3_4b_wikispeedia.toml -> configs/wikispeedia/rl_qwen3_4b.toml
- configs/verifiers_v1/rl_qwen3_4b_wikispeedia_rlm.toml -> configs/wikispeedia/rl_qwen3_4b_rlm.toml

Drops the configs/verifiers_v1/ dir; configs now live next to their env.

Co-authored-by: Cursor <cursoragent@cursor.com>
mikasenghaas changed the title from "exp: verifiers v1 smoke configs (reverse-text + wikispeedia, default + RLM via harness.id)" to "exp: verifiers v1 smoke configs" on May 26, 2026