Name and Version
b8738 and later
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
libllama (core library), llama-server
Command line
./llama-server \
--alias local-ai \
-m /models/google_gemma-4-31B-it-Q4_K_S.gguf \
--host 0.0.0.0 \
--port 8080 \
-np 1 \
-ngl 99 \
-c 131072 \
-n 16000 \
--mmproj /models/mmproj-google_gemma-4-31B-it-bf16.gguf \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--cache-ram 2048 \
-ctxcp 2 \
--flash-attn on \
--jinja \
--chat-template-file /templates/google-gemma-4-31B-it-interleaved.jinja \
--no-prefill-assistant \
--chat-template-kwargs '{"enable_thinking":true}' \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--min-p 0.0 \
--presence-penalty 1.5 \
--repeat-penalty 1.0 \
--metrics
Problem description & steps to reproduce
On a single RTX 3090, builds before b8738 use less VRAM, both at idle and under load, than b8738 and later. Idle VRAM before b8738 is 22726 MiB; on b8738+ it rises to 23172 MiB. A stress test with a tuned full context passes before b8738 and fails on b8738+.
According to an AI-assisted investigation, the suspected cause is that ggml_cuda_init() initializes NCCL communicators even on single-GPU runs, which consumes unnecessary VRAM. The suggested fix is to initialize NCCL only when more than one GPU is present.
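If that analysis is accurate, the fix would amount to a device-count guard before any communicator setup. The sketch below is hypothetical and not taken from llama.cpp/ggml source: the cuda_init_comms() name, the g_nccl_comms state, and the premise that NCCL communicators are created unconditionally are all assumptions carried over from the (unverified) AI analysis.

```cpp
// Hypothetical sketch only -- not actual llama.cpp/ggml source. It assumes,
// per the unverified AI analysis, that init code unconditionally creates
// NCCL communicators. The proposed fix: skip NCCL on single-GPU runs.
#include <cuda_runtime.h>
#include <nccl.h>
#include <vector>
#include <cstdio>

static std::vector<ncclComm_t> g_nccl_comms; // assumed global state

static void cuda_init_comms() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess) {
        return;
    }

    // Guard: NCCL communicators are only useful for multi-GPU runs.
    // Creating them on a single GPU allocates communicator buffers in
    // VRAM for no benefit.
    if (device_count < 2) {
        return;
    }

    std::vector<int> devices(device_count);
    for (int i = 0; i < device_count; ++i) {
        devices[i] = i;
    }

    g_nccl_comms.resize(device_count);
    // ncclCommInitAll creates one communicator per listed device.
    if (ncclCommInitAll(g_nccl_comms.data(), device_count, devices.data()) != ncclSuccess) {
        g_nccl_comms.clear();
        fprintf(stderr, "NCCL init failed\n");
    }
}
```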
To reproduce: run a build from before b8738 and measure idle VRAM, then repeat on b8738 or later. Verified here at least with a single RTX 3090 on this setup.
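Any device-wide reading (e.g. nvidia-smi) gives the numbers compared above. Purely as an illustration, and not part of llama.cpp, a minimal CUDA probe run while llama-server sits idle reports the same figure, since cudaMemGetInfo() is device-wide and therefore includes the server's allocations:

```cpp
// Standalone VRAM probe (illustrative only). Build with:
//   nvcc -o vram_probe vram_probe.cu
// Run while llama-server is idle; used = total - free covers all processes
// on the device, including the server.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    const double mib = 1024.0 * 1024.0;
    printf("VRAM used: %.0f MiB / %.0f MiB total\n",
           (total_bytes - free_bytes) / mib, total_bytes / mib);
    return 0;
}
```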
First Bad Commit
d6f3030047f85a98b009189e76f441fe818ea44d (b8738)
Relevant log output
Idle VRAM before b8738: 22726 MiB. Idle VRAM on b8738+: 23172 MiB.