LLM Serving Infrastructure / Rust + CUDA / KV Cache Systems
I build the systems path between model weights and production tokens.
Inference runtime, decode fast paths, KV cache movement, GPU offload, SSD tiering, RDMA transport, NUMA-aware memory, and vLLM/SGLang/Mooncake integration.
I work on the serving substrate for large language models: the layer where CUDA kernels, Rust runtimes, KV cache systems, RDMA transport, and production schedulers meet.
My public work points in one direction: making LLM serving faster, more observable, and more predictable once the bottleneck shifts from the model itself to memory movement, cache layout, GPU/CPU coordination, and distributed serving behavior.
| System | What I push on |
|---|---|
| pegainfer | Pure Rust + CUDA inference runtime, Kimi/DeepSeek/Qwen decode paths, PPLX EP, CuTeDSL/cuBLAS prefill kernels, benchmark gates, nsys profiling |
| PegaFlow | KV cache storage for vLLM/SGLang, GPU offloading, SSD caching, RDMA QPs, pinned memory, NUMA placement, cache metrics, vLLM E2E gates |
| Mooncake | Store/transfer engine work, client metrics, RDMA device setup, NUMA binding, SGLang HiCache documentation and integration paths |
| LMCache | Mooncake connector performance, zero-copy get/put, NUMA-aware operations, vLLM scheduler/cache behavior |
| SGLang | HiCache/Mooncake integration, NUMA detection, cache prefetch fixes, serving-path reliability |
| vLLM ecosystem | Scheduler/cache issues, router fixes, connector behavior, large-scale serving ergonomics |
- Rust inference runtimes and CUDA-backed model execution
- Decode hot paths for Kimi, DeepSeek, and Qwen-style serving workloads
- KV cache transport across GPU memory, CPU pinned memory, SSD, and RDMA
- NUMA-aware allocation, pinned-memory pool startup, CUDA IPC, and long-tail latency control
- vLLM/SGLang connector behavior under real cache pressure
- Benchmarking, profiling, CI gates, and release paths for serving infrastructure
Rust / CUDA / C++ / Python / RDMA / vLLM / SGLang / Mooncake / LMCache / PegaFlow
- LinkedIn: JinYan Su
- Blog: susun-blog.com
- Zhihu: yixie-gu-zhou-6-9




