
Use sampling chat template kwargs for renderer RL #2605

Draft
eligotts wants to merge 1 commit into main from feat/chat-template-kwargs-renderers

@eligotts eligotts commented May 23, 2026

Summary

  • Use orchestrator.train.sampling.extra_body.chat_template_kwargs as the hosted RL config path for renderer template controls.
  • Pass that shared sampling value into the local renderer used for token reconstruction.
  • Reject per-env renderer chat-template kwargs overrides, since renderer-backed RL uses one local renderer for reconstruction.
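
A minimal sketch of the behavior in the bullets above, written against plain dictionaries rather than the project's actual config classes. The function name `resolve_renderer_template_kwargs`, the shape of the per-env entries, and the `enable_thinking` key are hypothetical and only illustrate the idea; they are not taken from this PR's diff.

```python
from typing import Any


def resolve_renderer_template_kwargs(
    sampling: dict[str, Any], envs: list[dict[str, Any]]
) -> dict[str, Any]:
    """Return the shared chat_template_kwargs to hand to the local renderer.

    `sampling` stands in for orchestrator.train.sampling; `envs` stands in for
    per-env config entries (assumed shape, for illustration only).
    """
    # One local renderer reconstructs tokens for every env, so a per-env
    # override could not be honored consistently. Reject it up front.
    for env in envs:
        env_extra = (env.get("sampling") or {}).get("extra_body") or {}
        if "chat_template_kwargs" in env_extra:
            raise ValueError(
                f"env {env.get('id', '<unknown>')!r} overrides chat_template_kwargs; "
                "set it once under train.sampling.extra_body instead"
            )

    # Read the shared value; missing keys fall back to an empty mapping.
    extra_body = sampling.get("extra_body") or {}
    return dict(extra_body.get("chat_template_kwargs") or {})


shared_sampling = {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}
renderer_kwargs = resolve_renderer_template_kwargs(shared_sampling, [{"id": "math"}])
```

Here `renderer_kwargs` is a copy of the shared mapping, so later mutation by the renderer would not leak back into the sampling config.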

We are intentionally extracting this info from sampling chat_template_kwargs instead of adding a renderer config field. Longer term, this should move toward typed pydantic config models with explicit renderer-specific template args, so users cannot dump arbitrary keys into config. Until then, renderers validate accepted kwargs and alert on unsupported keys.
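
As a rough illustration of the interim validation described above, the sketch below checks incoming kwargs against an allow-list and warns on anything else. The allow-list contents, the function name `validate_template_kwargs`, and the choice to warn and drop (rather than raise) are assumptions; the PR description only says that renderers validate accepted kwargs and alert on unsupported keys.

```python
import warnings
from typing import Any

# Hypothetical allow-list; each real renderer would define its own accepted keys.
ACCEPTED_TEMPLATE_KWARGS = frozenset({"enable_thinking"})


def validate_template_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Keep accepted keys and warn about any unsupported ones."""
    unsupported = sorted(set(kwargs) - ACCEPTED_TEMPLATE_KWARGS)
    if unsupported:
        # Assumption: "alert" is modeled as a warning here, not an exception.
        warnings.warn(
            f"Unsupported chat_template_kwargs ignored: {unsupported}",
            stacklevel=2,
        )
    return {k: v for k, v in kwargs.items() if k in ACCEPTED_TEMPLATE_KWARGS}
```

A typed config model would move this check to config-load time, which is the longer-term direction the description mentions.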

Companion PRs

Validation

  • uvx ruff check packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py skills/configs/SKILL.md src/prime_rl/orchestrator/orchestrator.py src/prime_rl/trainer/sft/data.py src/prime_rl/utils/client.py tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/test_configs.py
  • PYTHONPATH=src:packages/prime-rl-configs/src:deps/pydantic-config/src:deps/verifiers:deps/renderers uv run --no-sync --with psutil --with setproctitle --with openai-harmony --with tiktoken --with fastokens --with prime --with "wandb>=0.26.1" python -m pytest tests/unit/test_configs.py -k "chat_template_kwargs"

Note: tests/unit/orchestrator/test_orchestrator_setup.py -k setup_student_inference_pool_uses_renderer_when_enabled cannot collect in this local Mac env because torchtitan is missing at import time.

