Conversation
JohannesGaessler
left a comment
There was a problem hiding this comment.
This would work from a llama.cpp perspective but I'm not convinced it's the right way to handle this from a ggml perspective. In my opinion we should attach a ggml_guid and write the code around that. With a GUID the contract for "user code" would be "if you change the graph you must change the GUID", it's not clear to me what the correct way to use a reused flag would be.
|
|
||
| if (cgraph->version != 0 && | ||
| cgraph->version == graph->last_graph_version && | ||
| (int)graph->node_props.size() == cgraph->n_nodes) { |
There was a problem hiding this comment.
Will there ever be a case where versions match, but node count doesn't?
If not, maybe we can make this an assert?
There was a problem hiding this comment.
Yes will add an assert, it should not happen
|
I added a version number, currently we don't have a guid generator in master. If the current version is acceptable, I can add a guid generator however I think a simple static version counter is also fine imo. The idea of versioning that anything that mutates the cgraph should have increment the version number. |
| } | ||
|
|
||
| for (int i = 0; i < sched->n_splits; i++) { | ||
| sched->splits[i].graph.version = graph->version; |
There was a problem hiding this comment.
Should all splits really be the same uuid?
There was a problem hiding this comment.
I don't see any additional benefit of having a per split uuid as of now
There was a problem hiding this comment.
I think you're right though, it makes sense to have a per split uuid
| void ggml_print_backtrace(void); | ||
|
|
||
| static inline uint64_t ggml_graph_next_version(void) { | ||
| static uint64_t counter = 0; |
Overview
Add
reusedmember variable toggml_cgraphso backends can take advantage of the graph reuse functionality. Currently when graph_reuse in invoked, the CUDA backend still does the props change check to figure out if the graph has changed or not, where in factgraph_reuse(to my understanding) guarantees this to be true. This helps bypass a mildly expensive O(n) check.Additional information
Testing: I tested various combinations like
--n-cpu-moe,-nkvoand-ngland verified it works. Additional testing would be welcome.Results on a 5090:
Requirements