forked from ggml-org/llama.cpp
Pull requests: ngxson/llama.cpp
Pull requests list
[Mirror] anthropic: fix prefix caching
examples
ggml
Nvidia GPU
server
#98
opened Apr 12, 2026 by
ngxson
Owner
[Mirror] model : refactor QKV into common build_qkv and create_tensor_qkv helpers
model
#94
opened Apr 1, 2026 by
ngxson
Owner
[Mirror] server : refactor oai_parser_opt, move it to server_chat_params
examples
server
#83
opened Jan 19, 2026 by
ngxson
Owner
[Mirror] server: fix memory reservations in populate_token_probs
examples
server
#81
opened Jan 19, 2026 by
ngxson
Owner
[Mirror] server : fix router child env in containerized environments
examples
server
#75
opened Jan 5, 2026 by
ngxson
Owner
Xsn/jinja vm
documentation
examples
python
script
server
testing
[Mirror] feat: Add model pinning feature to protect critical models from LRU eviction
examples
server
#70
opened Dec 25, 2025 by
ngxson
Owner
[Mirror] server: (preset) add unsafe-allow-api-override
examples
server
#68
opened Dec 23, 2025 by
ngxson
Owner
[Mirror] mtmd: Add DeepSeekOCR Support
documentation
examples
ggml
model
Nvidia GPU
python
testing
#66
opened Dec 23, 2025 by
ngxson
Owner
[Mirror] New quantization type: Q3_HIFI
Apple Metal
documentation
examples
ggml
Nvidia GPU
python
SYCL
testing
Vulkan
#65
opened Dec 22, 2025 by
ngxson
Owner