  • Tongji University
  • Shanghai

Zlatanwic/README.md

Kuo Li

Undergraduate @ Tongji University · MLSys Fullstack · LLM Inference · Operating Systems

Email · GitHub · personal site


About

I'm an undergraduate at Tongji University interested in the intersection of systems programming and AI infrastructure.

My recent work focuses on LLM inference optimization, KV cache / paged attention, operating systems, and memory-efficient systems design.


Selected Work

  • SieveKV — semantics-aware KV cache eviction for long-context LLM inference
  • Paged KV Cache CUDA Kernels — fused CUDA kernels for efficient LLM decoding
  • NovaOS — a Rust-based POSIX-compatible kernel for RISC-V64
  • Distributed Semantic Retrieval System — Chord-based distributed dense retrieval and RAG pipeline

Tech Stack


GitHub Stats


Honors

  • National First Prize — Global Campus AI Algorithm Challenge
  • International Silver Medal — iGEM

Building systems software for efficient AI.

Pinned

  1. Kong-Debugger Public

    Kong ("空") Debugger: a lightweight project that rewrites gdb (GNU Debugger) in Rust, with a focus on memory safety, concurrency safety, and performance, plus AI-powered features.

    Rust 1

  2. Fin-RAG Public

    A RAG-based financial question-answering system that builds its local knowledge base on a hybrid dual index.

    Python 2

  3. Wechat-Read-MCP-in-Rust Public

    A WeChat Official Account scraping MCP written in Rust, which implements a mechanism for bypassing anti-scraping measures during browser-based fetching.

    Rust 12 3

  4. yzfly/Awesome-MCP-ZH Public

    Curated MCP resources: MCP guides, Claude MCP, MCP Servers, MCP Clients

    7k 538

  5. Fused-Kernel-for-Paged-attention Public

    Targets long-context decoding for large models: implements and analyzes a block-gather CUDA kernel for the paged KV cache, and evaluates how much fused attention reduces intermediate GPU memory traffic and improves decode throughput.

    Python

  6. SJTU-IPADS/SkVM Public

    The Language Virtual Machine for Agent Skills

    TypeScript 430 39