
feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct#9

Open
hallerite wants to merge 2 commits into main from feat/llama-3-renderer

@hallerite (Member) commented May 7, 2026

Summary

Hand-coded Llama3Renderer for Meta's Llama-3.x chat template, plus a matching parse_llama_3 parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via MODEL_RENDERER_MAP). No version bump.

How tests work without a Meta-license HF token

MODEL_RENDERER_MAP registers the canonical meta-llama/... paths so production callers auto-route. Tests load the tokenizer via the unrestricted unsloth/Llama-3.2-{1B,3B}-Instruct mirror — the chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken-BPE files are identical. CI doesn't need an HF_TOKEN with Meta license access.
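A minimal sketch of the routing described above, assuming a plain dict for MODEL_RENDERER_MAP and a registry keyed by the hyphenated "llama-3" name. The resolve_renderer_key helper is hypothetical, and raising on unmapped models is an assumption of this sketch (the real create_renderer may fall back differently). It illustrates that the canonical meta-llama paths auto-route, while code pointed at the unsloth mirror can name the registry key explicitly.

```python
# Hypothetical sketch of auto-routing; not the repo's actual create_renderer.
MODEL_RENDERER_MAP: dict[str, str] = {
    "meta-llama/Llama-3.2-1B-Instruct": "llama-3",
    "meta-llama/Llama-3.2-3B-Instruct": "llama-3",
}

# Assumption: the registry maps hyphenated keys to renderer classes; a string
# stands in for Llama3Renderer so the sketch stays self-contained.
RENDERER_REGISTRY: dict[str, str] = {"llama-3": "Llama3Renderer"}


def resolve_renderer_key(model_name: str, renderer: str = "auto") -> str:
    """Return the registry key to use for model_name (hypothetical helper)."""
    if renderer != "auto":
        # Explicit selection bypasses the model map entirely.
        if renderer not in RENDERER_REGISTRY:
            raise KeyError(f"unknown renderer {renderer!r}")
        return renderer
    try:
        return MODEL_RENDERER_MAP[model_name]
    except KeyError:
        # Assumption: this sketch fails loudly instead of picking a fallback.
        raise ValueError(f"no renderer registered for {model_name!r}") from None
```

For example, resolve_renderer_key("unsloth/Llama-3.2-1B-Instruct", renderer="llama-3") selects the Llama-3 renderer for the mirror without adding the mirror path to the map.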

Implementation notes

  • No <think> / reasoning channel — Llama-3 doesn't ship one. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
  • <|begin_of_text|> (BOS) is emitted at the start of every render, and the system block is always emitted with the fixed Cutting Knowledge Date / Today Date preamble, even when no system message is supplied.
  • date_string is a constructor kwarg, defaulting to "26 Jul 2024" (the chat template's strftime fallback) so output stays deterministic. Override per-instance for production runs that want today's date.
  • tools_in_user_message defaults to True (matches the chat template): tools and their JSON signatures are injected into the first user message. Pass False to switch to system-block mode. Both modes are parity-tested.
  • Single tool call per assistant message (the chat template raises otherwise). A tool call renders as a JSON blob {"name": "...", "parameters": ...} inside the assistant body. Tool responses render under role ipython regardless of the source role, mirroring the chat template's content | tojson branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
  • parse_llama_3 detects the JSON tool-call body with a strict check (the body must start with { and parse as a dict containing name); malformed JSON falls through to content rather than being dropped silently.

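The BOS, system-preamble and per-turn layout described above can be sketched at the string level. This is an illustration under stated assumptions, not the renderer: it assumes the header/eot layout and the "Cutting Knowledge Date: December 2023" line of the public Llama-3.2 chat template, handles plain text turns only (no tools), and returns a string where Llama3Renderer emits token IDs. render_llama3_text is a hypothetical name.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Role header as laid out in the Llama-3.x chat template."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def render_llama3_text(
    messages: list[dict],
    date_string: str = "26 Jul 2024",  # pinned default keeps output deterministic
    add_generation_prompt: bool = False,
) -> str:
    """Hypothetical string-level sketch of a plain (tool-free) Llama-3.2 render."""
    out = [BOS]
    system_message = ""
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"].strip()
        messages = messages[1:]
    # The system block is emitted even when no system message was supplied.
    out.append(header("system"))
    out.append("Cutting Knowledge Date: December 2023\n")
    out.append(f"Today Date: {date_string}\n\n")
    out.append(system_message + EOT)
    for message in messages:
        # Content is whitespace-trimmed, like Jinja's `| trim` filter.
        out.append(header(message["role"]) + message["content"].strip() + EOT)
    if add_generation_prompt:
        out.append(header("assistant"))
    return "".join(out)
```

With a single user turn "  Hi  " and add_generation_prompt=True, the output ends with the trimmed user turn followed by an open assistant header, and the system block still appears with only the date preamble.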
Tests

47 dedicated tests in tests/test_llama_3.py:

  • MODEL_RENDERER_MAP shape + factory routing
  • Constructor contract (default date, preserve_*_thinking rejection, tools_in_user_message toggle)
  • Byte parity vs apply_chat_template across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
  • parse_response (plain, tool call, malformed JSON fallthrough)
  • Bridge contract (extends prev verbatim, matches fresh render, rejects assistant in extension, synthesises close on truncation)
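The tool-call body shape and the strict detection rule that parse_llama_3 is described as using can be sketched as a round trip. Everything here is hypothetical: ParsedResponseSketch stands in for the repo's ParsedResponse (whose fields are not shown in this PR), the input is assumed to have its stop token already stripped, and json.dumps(..., ensure_ascii=False) is assumed to match the template's tojson output.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedResponseSketch:
    """Hypothetical stand-in for the repo's ParsedResponse."""
    content: str
    tool_calls: list[dict] = field(default_factory=list)


def render_tool_call_body(tool_calls: list[dict]) -> str:
    """Serialize exactly one tool call as {"name": ..., "parameters": ...}."""
    if len(tool_calls) != 1:
        # The chat template raises on anything other than a single call.
        raise ValueError("only one tool call per assistant message is supported")
    fn = tool_calls[0]["function"]
    # Assumption: json.dumps(..., ensure_ascii=False) matches the template's tojson.
    args = json.dumps(fn["arguments"], ensure_ascii=False)
    return '{"name": "' + fn["name"] + '", "parameters": ' + args + "}"


def parse_llama_3_sketch(text: str) -> ParsedResponseSketch:
    """Strict detection: starts with '{' and parses as a dict with a name."""
    stripped = text.strip()
    if stripped.startswith("{"):
        try:
            obj = json.loads(stripped)
        except json.JSONDecodeError:
            obj = None  # malformed JSON: fall through to plain content
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            call = {"name": obj["name"], "arguments": obj.get("parameters", {})}
            return ParsedResponseSketch(content="", tool_calls=[call])
    return ParsedResponseSketch(content=text)
```

The fall-through branch is what keeps a truncated or malformed body such as '{"name": broken' visible as content instead of silently discarding it.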

Test plan

  • pytest tests/test_llama_3.py — 47 cases pass on both 1B and 3B mirrors
  • Full suite (pytest tests/ --ignore=tests/test_client.py) — 947 pass, 48 skipped, 1 xfailed (no regressions)
  • Pre-commit hooks (ruff check + format) clean
  • Maintainer with Meta-license HF_TOKEN can verify meta-llama/Llama-3.2-1B-Instruct parity directly (the unsloth mirror has been bit-verified, but a once-off canonical run is good defense in depth)

🤖 Generated with Claude Code


Note

Medium Risk
Adds new model-specific rendering/parsing and auto-routing for meta-llama/Llama-3.2-* which can change prompt/token generation and tool-call handling for those models. Risk is mitigated by extensive parity tests, but any template mismatch would affect downstream training/inference correctness.

Overview
Adds a new hand-coded Llama3Renderer implementing Meta Llama-3.2 Instruct chat-template rendering (including deterministic date_string, tool injection mode, tool-call/response formatting, stop tokens, and a bridge_to_next_turn fast-path).

Wires the renderer into exports and auto-detection by extending MODEL_RENDERER_MAP/RENDERER_REGISTRY with a new llama-3 entry for meta-llama/Llama-3.2-1B/3B-Instruct, and adds parse_llama_3 to interpret Llama-3 JSON tool-call completions.

Introduces a dedicated tests/test_llama_3.py suite that byte-compares renderer output against tokenizer.apply_chat_template (using unsloth/... mirrors) and covers tools, parsing, and bridge behavior.

Reviewed by Cursor Bugbot for commit c5d2aa5. Bugbot is set up for automated code reviews on this repo.

Note

Add Llama3Renderer for Llama-3.2-1B/3B-Instruct chat template rendering

  • Adds renderers/llama_3.py with Llama3Renderer, a deterministic renderer for Llama-3.x Instruct models implementing token/ID rendering, tool-call emission, ipython tool responses, response parsing, stop tokens, and turn bridging.
  • Maps meta-llama/Llama-3.2-1B-Instruct and meta-llama/Llama-3.2-3B-Instruct in MODEL_RENDERER_MAP so create_renderer(..., renderer='auto') routes to the new renderer; renderer='llama-3' also resolves via the registry.
  • Tool calls are emitted as a single JSON object {"name": ..., "parameters": ...}; multiple tool calls per assistant message raise an error. parse_llama_3 in renderers/parsing.py parses this back into ParsedResponse.tool_calls, falling back to raw content for non-JSON output.
  • preserve_*_thinking flags raise NotImplementedError; the default date_string is hardcoded to '26 Jul 2024'.

Macroscope summarized c5d2aa5.

hallerite and others added 2 commits May 7, 2026 17:38
Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template.
Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the
unrestricted unsloth/... mirrors with byte-identical chat templates).
MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load
via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta
license access.

Implementation notes:

* No <think> / reasoning channel — preserve_*_thinking constructor
  flags raise NotImplementedError if set (matches DefaultRenderer's
  contract for the same case).

* <|begin_of_text|> (BOS) is emitted at the start of every render. The
  system block is emitted UNCONDITIONALLY with a fixed
  "Cutting Knowledge Date / Today Date" preamble even when no system
  message is supplied. date_string is a constructor kwarg pinned at
  "26 Jul 2024" by default (matches the chat template's strftime
  fallback); override per instance for production runs that want
  today's date.

* tools_in_user_message defaults to True. Tools + JSON signatures
  inject into the first user message; pass False at construction to
  flip to system-block mode. Both modes parity-tested.

* Single tool call per assistant message (chat template raises
  otherwise). Tool calls render as a JSON blob inside the assistant
  body. Tool responses render under role ipython regardless of source
  role; mirrors the chat template's content|tojson branch including
  the Jinja quirk that strings are iterable so plain-string tool
  content gets JSON-quoted.

* parse_llama_3 detects the JSON tool-call body shape with a strict
  check; malformed JSON falls through to content.
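The ipython tool-response branch, including the string-is-iterable quirk, can be sketched as below. This assumes Jinja's `is iterable` test agrees with Python's collections.abc.Iterable check for these inputs and that tojson behaves like json.dumps with ensure_ascii=False; render_tool_response is a hypothetical helper, not the renderer's API.

```python
import json
from collections.abc import Iterable, Mapping


def render_tool_response(content) -> str:
    """Render a tool result under role ipython, whatever the source role was."""
    if isinstance(content, (Mapping, Iterable)):
        # Strings are iterable too, so plain text like sunny comes out JSON-quoted.
        body = json.dumps(content, ensure_ascii=False)
    else:
        # Non-iterable scalars (e.g. numbers) are emitted as-is.
        body = str(content)
    return "<|start_header_id|>ipython<|end_header_id|>\n\n" + body + "<|eot_id|>"
```

This is why a parity test against apply_chat_template needs the quoted form for string tool content: dropping the quotes would be the "intuitive" choice but would diverge from the template.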

47 dedicated tests covering map shape, constructor contract, byte
parity across 11 conversation shapes (including tool calls, multi-turn,
custom date, tools-in-system mode), parse_response, and bridge
contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolve conflicts in renderers/__init__.py and renderers/base.py:
- Add LagunaXS2Renderer (origin/main) alongside Llama3Renderer (PR).
- Rename Llama-3 registry key from "llama_3" to "llama-3" to match
  origin/main's hyphenated convention (also applied to deepseek-v3,
  kimi-k2, kimi-k2.5, nemotron-3, gpt-oss). Update the matching
  MODEL_RENDERER_MAP entries and tests/test_llama_3.py assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite marked this pull request as draft May 20, 2026 13:33
@hallerite hallerite marked this pull request as ready for review May 20, 2026 13:33
@cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.


Comment thread on renderers/llama_3.py:

```python
emit_special(self._end_header, -1)
emit_text("\n\n", -1)

return previous_ids + ext
```
Bridge returns list not RenderedTokens

High Severity

Llama3Renderer.bridge_to_next_turn returns a bare list[int], while the Renderer protocol and every other hand-coded renderer return RenderedTokens | None with tokens in .token_ids. Callers such as RendererPool and tests/test_bridge.py use bridged.token_ids, which raises AttributeError on a list.
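One way to satisfy the protocol is to wrap the concatenated IDs before returning. The sketch below is hypothetical: RenderedTokens is a local stand-in exposing only the token_ids field named in the comment, toy_encode replaces the real tokenizer, only plain text roles are handled, and returning None for an assistant message in the extension is an assumption about what "rejects" means in the bridge contract. The special-token IDs are those of the public Llama-3 tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass

# Llama-3 special-token IDs from the public tokenizer config.
START_HEADER, END_HEADER, EOT = 128006, 128007, 128009


@dataclass
class RenderedTokens:
    """Hypothetical stand-in exposing only the token_ids field."""
    token_ids: list[int]


def toy_encode(text: str) -> list[int]:
    """Hypothetical tokenizer: one ID per UTF-8 byte, for illustration only."""
    return list(text.encode("utf-8"))


def emit_header(ext: list[int], role: str) -> None:
    ext.extend([START_HEADER, *toy_encode(role), END_HEADER, *toy_encode("\n\n")])


def bridge_to_next_turn(previous_ids: list[int], new_messages: list[dict]) -> RenderedTokens | None:
    if any(m["role"] == "assistant" for m in new_messages):
        return None  # assumption: the caller falls back to a fresh render
    ext: list[int] = []
    if not previous_ids or previous_ids[-1] != EOT:
        ext.append(EOT)  # synthesize the close for a truncated completion
    for m in new_messages:
        emit_header(ext, m["role"])
        ext.extend([*toy_encode(m["content"].strip()), EOT])
    emit_header(ext, "assistant")  # open the next assistant turn
    # The fix: wrap the IDs instead of returning a bare list[int].
    return RenderedTokens(token_ids=previous_ids + ext)
```

Callers that read bridged.token_ids then work unchanged, and the previous IDs are still extended verbatim.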


@macroscopeapp (Bot) commented May 20, 2026

Approvability

Verdict: Needs human review

This PR introduces a new Llama-3 renderer (~400 lines of new code), which constitutes a new feature/capability requiring human review. Additionally, an unresolved high-severity review comment identifies a protocol violation in bridge_to_next_turn that would cause runtime errors.

