exp: verifiers v1 smoke configs #2637
Draft
mikasenghaas wants to merge 3 commits into
Pin verifiers submodule to `bbe929af` (v0.1.15.dev10 release). Brings in "Expose V1 tools to the RLM harness" (#1456) so v1 envs can expose toolsets to `vf.RLM` without per-skill plumbing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bump `deps/research-environments` to `b2133455b` (head of PR #360, the wikispeedia v1 port). Register `wikispeedia` in the `envs` extra and uv workspace; `uv.lock` picks up the editable install.

Add three minimal Qwen3-4B-Instruct-2507 smoke configs under `configs/verifiers_v1/` (`max_steps=5`, `dp=1`, single-host):

- `rl_qwen3_4b_reverse_text.toml`: `reverse-text` env with `args = { v1 = true }` (in-process default harness).
- `rl_qwen3_4b_wikispeedia.toml`: `wikispeedia` env, in-process harness with `click_link`/`go_back` tools.
- `rl_qwen3_4b_wikispeedia_rlm.toml`: same `wikispeedia` env id, but with the harness swapped to RLM via `[orchestrator.train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm"`.

Single env id for wikispeedia, single `load_environment`; harness selection is config-driven (matches the general-agent v1 pattern in research-environments#395).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
`376c87c` to `95b3203`
- `configs/verifiers_v1/rl_qwen3_4b_reverse_text.toml` -> `configs/reverse_text/v1.toml`
- `configs/verifiers_v1/rl_qwen3_4b_wikispeedia.toml` -> `configs/wikispeedia/rl_qwen3_4b.toml`
- `configs/verifiers_v1/rl_qwen3_4b_wikispeedia_rlm.toml` -> `configs/wikispeedia/rl_qwen3_4b_rlm.toml`

Drops the `configs/verifiers_v1/` dir; configs now live next to their env.

Co-authored-by: Cursor <cursoragent@cursor.com>
## Summary
- Pin the `deps/verifiers` submodule to `bbe929af` (v0.1.15.dev10, current `main`), which includes "Expose V1 tools to the RLM harness". v1 envs can now expose toolsets through `vf.RLM` with no per-skill plumbing.
- Bump `deps/research-environments` to `b2133455b` (head of PrimeIntellect-ai/research-environments#360: wikispeedia ported to v1 with a CLI-configurable harness). Add `wikispeedia` to the `envs` extra and uv workspace.
- Add three minimal Qwen3-4B-Instruct-2507 smoke configs (`max_steps=5`, `dp=1`, single-host):
  - `configs/reverse_text/v1.toml`: `reverse-text` env with `args = { v1 = true }` (in-process default harness, no tools).
  - `configs/wikispeedia/rl_qwen3_4b.toml`: `wikispeedia` env, in-process harness with `click_link`/`go_back` tools.
  - `configs/wikispeedia/rl_qwen3_4b_rlm.toml`: same `wikispeedia` env id, but with the harness swapped to RLM via `[orchestrator.train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm"`.

Single env id, single `load_environment`; harness selection is config-driven (matches the general-agent v1 pattern in PrimeIntellect-ai/research-environments#395).

## Verification
- `uv run pytest tests/unit/test_configs.py -k 'wikispeedia or reverse_text/v1'`: all 3 configs pass.
- `uv run rl @ configs/verifiers_v1/<each>.toml --dry-run` (paths as they were before the configs moved next to their env): all 3 resolve cleanly; the RLM config's resolved `orchestrator.toml` correctly carries `[train.env.args.config.harness] id = "verifiers.v1.packages.harnesses.rlm"`.
- With `gpt-5-mini` (verified in PrimeIntellect-ai/research-environments#360): both harness paths reach reward 1.0 (16s in-process, 1m40s under RLM).

## Notes
`reverse-text-rlm` is intentionally omitted: the upstream v1 `reverse_text_v1.load_environment` hardcodes `vf.Harness` (no `harness.id` dispatch yet), so swapping in `vf.RLM` would require an upstream patch. Happy to extend the pattern there too if useful.