feat: add streaming tool use by lsorber · Pull Request #1884 · abetlen/llama-cpp-python

lsorber · 2024-12-25T23:17:58Z

This PR upgrades the chatml-function-calling chat handler with support for streaming tool use and fixes #1883, #1869, and #1756, among other improvements.

Changes:

General:
a. ✨ If no system message is supplied, add an empty system message to hold the tool metadata.
b. ✨ Add function descriptions to the system message so that tool use is better informed (fixes chatml-function-callling not adding tool description to the prompt. #1869).
c. ✨ Replace print statements relating to JSON grammars with RuntimeWarning warnings.
d. ✅ Add tests with fairly broad coverage of the different scenarios.
Case "Tool choice by user":
a. ✨ Add support for more than one function call by making this a special case of "Automatic tool choice" with a single tool (subsumes Support parallel function calls with tool_choice #1503).
Case "Automatic tool choice -> respond with a message":
a. ✨ Use user-defined stop and max_tokens.
b. 🐛 Replace incorrect use of follow-up grammar with user-defined grammar.
Case "Automatic tool choice -> one or more function calls":
a. ✨ Add support for streaming the function calls (fixes Feature request: add support for streaming tool use #1883).
b. ✨ Make tool calling more robust by giving the LLM an explicit way to terminate the tool calls by wrapping them in a <function_calls></function_calls> block.
c. 🐛 Add missing ":" stop token to determine whether to continue with another tool call, which prevented parallel function calling (fixes chatml-function-calling chat format fails to generate multi calls to the same tool #1756).
d. ✨ Set temperature=0 to determine whether to continue with another tool call, similar to the initial decision on whether to call a tool.

lsorber · 2024-12-26T20:34:42Z

@abetlen The tests all pass, but the macOS ones were terminated after a timeout. I think this is because of a lack of CPU and or memory resources because the tests run fine on my macOS machine.

SubatomicPlanets · 2025-01-04T00:28:30Z

I would love to see this merged! Actually there are quite a lot of good pull requests here that i would like to see merged... But this one is top priority!

lsorber · 2025-01-05T14:53:51Z

Update: I rebased on the latest main and included a few tiny improvements to further improve tool calling robustness.

lsorber · 2025-01-12T15:07:04Z

Update: I rebased on the latest main and conditionally skipped the added tests on macOS when not enough resources are available to run them.

LenBanana · 2025-01-31T13:32:50Z

Worked well for me, would you mind rebasing to the latest commit to allow for tool streaming with Qwen models?
Thanks for your work!

conornash · 2025-03-04T11:55:35Z

Would love to see this merged - is there anything holding it up?

lsorber · 2025-03-14T13:38:57Z

@abetlen I rebased the PR on the latest upstream main and added a small commit to fix the returned logprobs format.

tsharp · 2026-01-26T03:15:47Z

I would also like to see this merge. 🧙‍♂️

XyLearningProgramming · 2026-02-23T05:18:36Z

I rebased this PR onto the latest upstream main (v0.3.16) and resolved the merge conflicts. Opened a new PR at #2129 in case it helps get this merged. All credit to @lsorber for the implementation.

lsorber mentioned this pull request Dec 26, 2024

Support parallel function calls with tool_choice #1503

Open

lsorber mentioned this pull request Dec 26, 2024

feat: add streaming tool use to llama-cpp-python superlinear-ai/raglite#71

Merged

lsorber force-pushed the main branch from d50770b to c9d6092 Compare January 5, 2025 14:51

lsorber force-pushed the main branch 5 times, most recently from b4f8fde to 17301de Compare January 12, 2025 14:48

lsorber added 4 commits March 14, 2025 14:18

feat: add streaming tool use

a08a754

fix: remove strict=True to support Python 3.9

f1da6e9

feat: improve tool use robustness

e9fa51e

test: skip if insufficient resources on macOS

a41d866

lsorber force-pushed the main branch from 17301de to a41d866 Compare March 14, 2025 13:24

fix: apply missing _convert_text_completion_logprobs_to_chat

72b0b51

XyLearningProgramming mentioned this pull request Feb 23, 2026

feat: add streaming tool use (rebased #1884 on latest main) #2129

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: add streaming tool use#1884

feat: add streaming tool use#1884
lsorber wants to merge 5 commits intoabetlen:mainfrom
lsorber:main

lsorber commented Dec 25, 2024 •

edited

Loading

Uh oh!

lsorber commented Dec 26, 2024

Uh oh!

SubatomicPlanets commented Jan 4, 2025

Uh oh!

lsorber commented Jan 5, 2025

Uh oh!

lsorber commented Jan 12, 2025

Uh oh!

LenBanana commented Jan 31, 2025

Uh oh!

conornash commented Mar 4, 2025

Uh oh!

lsorber commented Mar 14, 2025 •

edited

Loading

Uh oh!

tsharp commented Jan 26, 2026

Uh oh!

XyLearningProgramming commented Feb 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Comments

Conversation

lsorber commented Dec 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lsorber commented Dec 26, 2024

Uh oh!

SubatomicPlanets commented Jan 4, 2025

Uh oh!

lsorber commented Jan 5, 2025

Uh oh!

lsorber commented Jan 12, 2025

Uh oh!

LenBanana commented Jan 31, 2025

Uh oh!

conornash commented Mar 4, 2025

Uh oh!

lsorber commented Mar 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tsharp commented Jan 26, 2026

Uh oh!

XyLearningProgramming commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

lsorber commented Dec 25, 2024 •

edited

Loading

lsorber commented Mar 14, 2025 •

edited

Loading

XyLearningProgramming commented Feb 23, 2026 •

edited

Loading