
ggml: add graph_reused #21764

Open

am17an wants to merge 2 commits into ggml-org:master from am17an:ggml_graph_reuse

Conversation

@am17an
Contributor

@am17an am17an commented Apr 11, 2026

Overview

Add a reused member variable to ggml_cgraph so that backends can take advantage of the graph reuse functionality. Currently, when graph_reuse is invoked, the CUDA backend still runs the node-properties comparison to determine whether the graph has changed, even though graph_reuse (to my understanding) guarantees that it has not. This lets the backend bypass a mildly expensive O(n) check.
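The idea described above could be sketched roughly as follows. Note that the types and names here are simplified stand-ins for illustration, not the actual ggml or CUDA-backend API:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for ggml types.
struct ggml_cgraph_sim {
    int  n_nodes;
    bool reused;  // proposed flag: true when the scheduler reuses the previous graph
};

struct node_props_sim { int dummy; };

struct cuda_graph_sim {
    std::vector<node_props_sim> node_props;
    int props_checks = 0;  // counts how many times the O(n) props check ran
};

// With the proposed flag, the backend can skip the per-node property
// comparison entirely when the scheduler guarantees the graph is unchanged.
bool cuda_graph_properties_changed(cuda_graph_sim & g, const ggml_cgraph_sim & cgraph) {
    if (cgraph.reused) {
        return false;  // scheduler guarantees topology and properties match
    }
    g.props_checks++;  // stands in for the O(n_nodes) property comparison
    return (int) g.node_props.size() != cgraph.n_nodes;
}
```

The point of the sketch is that the expensive branch is never reached on the reuse path, which is where the token-generation speedups in the table below come from.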

Additional information

Testing: I tested various combinations such as --n-cpu-moe, -nkvo, and -ngl and verified that it works. Additional testing would be welcome.

Results on a 5090:

| Model | Test | t/s cuda_mul_fused | t/s ggml_graph_reuse | Speedup |
| --- | --- | --- | --- | --- |
| gemma4 ?B Q4_0 | tg128 | 219.43 | 231.50 | 1.06 |
| gemma4 ?B Q4_0 | tg128@d16384 | 183.23 | 191.91 | 1.05 |
| gemma4 ?B Q4_0 | tg128@d32768 | 175.15 | 182.58 | 1.04 |
| gpt-oss 20B MXFP4 MoE | tg128 | 313.29 | 323.01 | 1.03 |
| gpt-oss 20B MXFP4 MoE | tg128@d16384 | 279.59 | 288.05 | 1.03 |
| gpt-oss 20B MXFP4 MoE | tg128@d32768 | 258.80 | 266.12 | 1.03 |
| qwen35 27B Q4_K_M | tg128 | 65.04 | 66.61 | 1.02 |
| qwen35 27B Q4_K_M | tg128@d16384 | 62.45 | 63.61 | 1.02 |
| qwen35 27B Q4_K_M | tg128@d32768 | 59.65 | 60.70 | 1.02 |
| qwen35moe 35B.A3B Q4_K_S | tg128 | 194.52 | 206.56 | 1.06 |
| qwen35moe 35B.A3B Q4_K_S | tg128@d16384 | 184.11 | 196.36 | 1.07 |
| qwen35moe 35B.A3B Q4_K_S | tg128@d32768 | 175.95 | 187.60 | 1.07 |

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, for general understanding of the scheduler

@am17an am17an requested review from a team and ggerganov as code owners April 11, 2026 08:37
@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Apr 11, 2026
Contributor

@JohannesGaessler JohannesGaessler left a comment


This would work from a llama.cpp perspective but I'm not convinced it's the right way to handle this from a ggml perspective. In my opinion we should attach a ggml_guid and write the code around that. With a GUID the contract for "user code" would be "if you change the graph you must change the GUID", it's not clear to me what the correct way to use a reused flag would be.
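The GUID contract described in this comment could be sketched as below. This is a hedged illustration of the suggested design, not existing ggml code; all names here are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the suggested GUID contract: user code must refresh
// the GUID whenever it mutates the graph, and the backend compares GUIDs
// instead of re-checking node properties.
struct graph_with_guid {
    uint64_t guid    = 0;
    int      n_nodes = 0;
};

uint64_t next_guid() {
    static uint64_t counter = 0;  // a real implementation would make this atomic
    return ++counter;
}

// Contract: user code calls this after any mutation of the graph.
void graph_mutated(graph_with_guid & g) {
    g.guid = next_guid();
}

// Backend side: reuse cached state only if the GUID is unchanged.
bool can_reuse_cached(uint64_t last_guid, const graph_with_guid & g) {
    return last_guid != 0 && last_guid == g.guid;
}
```

The design choice here is that the invariant ("changed graph implies changed GUID") is owned by the code that mutates the graph, which makes the backend's reuse check a single integer comparison.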


```cpp
if (cgraph->version != 0 &&
    cgraph->version == graph->last_graph_version &&
    (int)graph->node_props.size() == cgraph->n_nodes) {
```
Contributor


Will there ever be a case where versions match, but node count doesn't?
If not, maybe we can make this an assert?

Contributor Author


Yes, I will add an assert; it should not happen.
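The reviewer's suggestion could look roughly like this. The structure and names below are simplified stand-ins for the quoted check, used only to illustrate turning the node-count comparison into an assert:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the quoted reuse check: if the version matches,
// the node count must also match, so the count comparison becomes an assert
// rather than part of the reuse condition.
struct props_check_sim {
    uint64_t version;
    int      n_nodes;
};

bool graph_unchanged(uint64_t last_version, const std::vector<int> & node_props,
                     const props_check_sim & cgraph) {
    if (cgraph.version != 0 && cgraph.version == last_version) {
        // Matching versions imply an identical graph, so a node-count
        // mismatch would indicate a bug, not a normal graph change.
        assert((int) node_props.size() == cgraph.n_nodes);
        return true;
    }
    return false;
}
```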

@am17an
Contributor Author

am17an commented Apr 11, 2026

I added a version number; currently we don't have a GUID generator in master. If the current approach is acceptable I can add a GUID generator, but I think a simple static version counter is also fine. The idea of versioning is that anything that mutates the cgraph should increment the version number.

```cpp
}

for (int i = 0; i < sched->n_splits; i++) {
    sched->splits[i].graph.version = graph->version;
```
Contributor


Should all splits really be the same uuid?

Contributor Author


I don't see any additional benefit to a per-split uuid as of now.

Contributor Author


I think you're right, though; it makes sense to have a per-split uuid.
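The per-split idea could be sketched as follows. This is a hypothetical illustration (split_sim and next_version are made-up names, not the ggml scheduler API): each split draws its own id instead of inheriting the parent graph's version, so a backend caching state per split can never confuse one split's graph with another's.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical global counter handing out distinct version ids.
static uint64_t g_version_counter = 0;

uint64_t next_version() {
    return ++g_version_counter;  // a real implementation would make this atomic
}

struct split_sim {
    uint64_t version = 0;
};

// Give every split its own version id rather than copying the parent's.
void assign_split_versions(std::vector<split_sim> & splits) {
    for (auto & s : splits) {
        s.version = next_version();
    }
}
```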

```c
void ggml_print_backtrace(void);

static inline uint64_t ggml_graph_next_version(void) {
    static uint64_t counter = 0;
```
Contributor


This should be atomic
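A minimal sketch of the suggested fix, assuming the counter only needs to be unique rather than ordered. The actual header is C (where one would use C11 atomics or the ggml threading helpers); this C++ version is for illustration only:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic variant of the version counter: concurrent graph builds can no
// longer hand out the same version twice. Relaxed ordering suffices because
// only uniqueness of the returned value matters, not ordering of other
// memory operations around it.
static inline uint64_t ggml_graph_next_version_atomic(void) {
    static std::atomic<uint64_t> counter{0};
    return counter.fetch_add(1, std::memory_order_relaxed) + 1;
}
```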

