JinYan Su xiaguan

JinYan Su

LLM Serving Infrastructure / Rust + CUDA / KV Cache Systems

I build the systems path between model weights and production tokens.

Inference runtime, decode fast paths, KV cache movement, GPU offload, SSD tiering, RDMA transport, NUMA-aware memory, and vLLM/SGLang/Mooncake integration.

What I am building

I work on the serving substrate for large language models: the layer where CUDA kernels, Rust runtimes, KV cache systems, RDMA transport, and production schedulers meet.

My public work is concentrated in one direction: making LLM serving faster, more observable, and more predictable when the bottleneck is no longer the model alone but memory movement, cache layout, GPU/CPU coordination, and distributed serving behavior.

Public signal

System	What I push on
pegainfer	Pure Rust + CUDA inference runtime, Kimi/DeepSeek/Qwen decode paths, PPLX EP, CuTeDSL/cuBLAS prefill kernels, benchmark gates, nsys profiling
PegaFlow	KV cache storage for vLLM/SGLang, GPU offloading, SSD caching, RDMA QPs, pinned memory, NUMA placement, cache metrics, vLLM E2E gates
Mooncake	Store/transfer engine work, client metrics, RDMA device setup, NUMA binding, SGLang HiCache documentation and integration paths
LMCache	Mooncake connector performance, zero-copy get/put, NUMA-aware operations, vLLM scheduler/cache behavior
SGLang	HiCache/Mooncake integration, NUMA detection, cache prefetch fixes, serving-path reliability
vLLM ecosystem	Scheduler/cache issues, router fixes, connector behavior, large-scale serving ergonomics

Where I go deep

Rust inference runtimes and CUDA-backed model execution
Decode hot paths for Kimi, DeepSeek, and Qwen-style serving workloads
KV cache transport across GPU memory, CPU pinned memory, SSD, and RDMA
NUMA-aware allocation, pinned pool startup, CUDA IPC, and long-tail latency control
vLLM/SGLang connector behavior under real cache pressure
Benchmarking, profiling, CI gates, and release paths for serving infrastructure

Current stack

Rust / CUDA / C++ / Python / RDMA / vLLM / SGLang / Mooncake / LMCache / PegaFlow

Contact

LinkedIn: JinYan Su
Blog: susun-blog.com
Zhihu: yixie-gu-zhou-6-9
