UPSTREAM PR #21242: fix: tool call parsing for LFM2 and LFM2.5 models #1325

Open

loci-dev wants to merge 2 commits into main from loci/pr-21242-fix-lfm2-lfm2-5-tool-calling
Conversation


@loci-dev loci-dev commented Apr 1, 2026

Note

Source pull request: ggml-org/llama.cpp#21242

Overview

Currently, LFM2 and LFM2.5 tool calling is broken in llama.cpp (issue ggml-org/llama.cpp#20245). Commit ggml-org/llama.cpp#20251 introduced a dedicated parser for LFM2; however, LFM2 and LFM2.5 use different tool-calling Jinja templates.

This PR fixes the tool call parser to handle the expected format for each case:

  • LFM2: tool list as List of tools: <|tool_list_start|>[...]<|tool_list_end|>, tool calls as
    <|tool_call_start|>[name(arg="val")]<|tool_call_end|>
  • LFM2.5: tool list as List of tools: [...], tool calls as bare [name(arg="val")] with no wrapper tokens

Added common_chat_params_init_lfm2_5 for the LFM2.5 template.

Testing

  • Added a unit test for LFM2.5 in test-chat.cpp
  • Tested tool calling use case locally with both LFM2.5-1.2B-Instruct-BF16.gguf and LFM2-8B-A1B-Q4_0.gguf

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: used Claude Code for assistance in tracing code, locating where to make the fix, and generating local test scripts


loci-review Bot commented Apr 1, 2026

Overview

Analysis of 124,195 functions across 15 binaries reveals negligible performance impact from the LFM2/LFM2.5 parsing refactoring: 115 functions modified (0.09%), 192 added, 0 removed. All changes are compiler-generated STL code artifacts, with no modifications to performance-critical inference paths.

Power Consumption Changes:

  • build.bin.llama-cvector-generator: +0.26% (+949 nJ)
  • build.bin.llama-tts: +0.14% (+520 nJ)
  • build.bin.libllama.so: 0.00%
  • build.bin.libmtmd.so: 0.00%
  • build.bin.llama-bench: 0.00%
  • build.bin.libggml.so: 0.00%
  • build.bin.libggml-cpu.so: 0.00%
  • build.bin.libggml-base.so: 0.00%
  • build.bin.llama-tokenize: 0.00%
  • build.bin.llama-gemma3-cli: 0.00%
  • build.bin.llama-gguf-split: 0.00%
  • build.bin.llama-llava-cli: 0.00%
  • build.bin.llama-minicpmv-cli: 0.00%
  • build.bin.llama-quantize: 0.00%
  • build.bin.llama-qwen2vl-cli: 0.00%

Function Analysis

All modified functions are STL template instantiations (std::vector, std::map, std::_Rb_tree) with no source code changes. Performance variations result from compiler code generation differences between builds.

Most Significant Changes:

  • std::_Rb_tree::_S_key() (llama-tts): Response time +165% (+186ns), throughput time +311% (+186ns). Red-black tree key extraction for JSON maps. Used during initialization only.

  • std::_Rb_tree::begin() (llama-cvector-generator): Response time +220% (+182ns), throughput time +289% (+182ns). Map iterator initialization with extra unconditional branch in target version.

  • std::vector::end() (llama-tts): Response time -69% (-183ns), throughput time -75% (-183ns). Compiler eliminated indirect jump, improved code layout.

  • std::vector::back() (llama-tts): Response time -42% (-190ns), throughput time -73% (-190ns). Entry block optimization removed unnecessary jumps.

  • jinja::parser::parse_any() (llama-tts): Response time +0.9% (+133ns), throughput time +68% (+137ns). Template parser dispatcher with extra entry block.

Other analyzed functions show similar compiler-induced variations in non-critical initialization code. No changes detected in inference hot paths: llama_decode(), matrix operations, attention mechanisms, KV cache, sampling, or GPU backends remain unaffected.

Additional Findings

Source code changes limited to common/chat.cpp for LFM2/LFM2.5 tool call parsing refactoring—completely isolated from inference pipeline. GPU/ML operations unaffected: all CUDA, Metal, Vulkan, and other backend implementations unchanged. Critical inference components (GEMM operations, Flash Attention, quantization kernels) show zero modifications. The refactoring successfully improves code organization without impacting inference performance.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

loci-dev force-pushed the main branch 9 times, most recently from 126cd1f to a8215be on April 8, 2026 02:18
loci-dev force-pushed the main branch 7 times, most recently from e800934 to a024d9c on April 15, 2026 02:19
loci-dev force-pushed the main branch 6 times, most recently from 7638ab4 to f1b46d5 on April 20, 2026 02:19