Name and Version
- llama.cpp commit: 2b2cd57
- Version: master branch (2026-04-11)
Operating systems
Modules affected
- libllama (core library)
- llama-server
Command line
llama-server -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
Problem description & steps to reproduce
- Run llama-server with NVIDIA Nemotron-3 Nano 4B (SSM-based model)
- Send a chat completion request — works fine
- Send a second chat completion request
- Server crashes during prompt cache serialization
Workaround: llama-server -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF --cache-ram 0 (disabling the prompt cache avoids the crash)
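The steps above can be sketched as a minimal repro script. This assumes the server's default port 8080 and the OpenAI-compatible /v1/chat/completions endpoint; the request bodies are illustrative:

```shell
# Start the server with the prompt cache enabled (the default).
llama-server -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF &
sleep 30  # wait for the model to download/load; adjust as needed

# First request: completes successfully.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'

# Second request: the server crashes while serializing the prompt cache.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello again"}]}'
```

Passing --cache-ram 0 on the llama-server line above makes both requests succeed.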
Environment
- GPU: AMD Radeon RX 7900 XTX
- Vulkan: 1.4.341
- CPU: Ryzen 16-core
Relevant log output
The crash occurs in ggml-backend.cpp:348 during llama_kv_cache::state_write_data:

GGML_ASSERT(tensor->data != NULL && "tensor not allocated") failed

The backtrace shows the failure happens during the prompt cache save, after the first response has completed successfully.