
Commit ab254ff

unamedkr and claude committed
README: add 60-second quick start at top, enable Discussions
- "Get Started in 60 Seconds" section right after The Result (build → download → run → compress, copy-paste ready) - Links to docs, WASM demo, custom quantization guide - Lower Quick Start → Advanced Usage (delta, PPL, profile) - GitHub Discussions enabled for community Q&A - models/ added to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bf4db9e · commit ab254ff

2 files changed

Lines changed: 52 additions & 30 deletions
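The .gitignore item in the bullet list above is not one of the two files shown in this diff. Assuming it is the usual ignore entry for the `models/` directory created in the quick start, it would presumably amount to a single line, e.g.:

```bash
# Hypothetical reconstruction -- the .gitignore hunk itself is not shown in this commit view.
echo "models/" >> .gitignore   # keep downloaded GGUF models out of version control
```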


README.ko.md

Lines changed: 26 additions & 15 deletions
@@ -49,11 +49,26 @@ The bottleneck in LLM memory is the **KV cache**, not the model weights.
 | 8GB Laptop | Llama 8B (Q4) | 16K tokens | **61K tokens** | **3.8x** |
 | 24GB RTX 3090 | Llama 8B (Q4) | 147K tokens | **559K tokens** | **3.8x** |
 
+## Get Started in 60 Seconds
+
 ```bash
-# One command. That's it.
-./quant model.gguf -p "hello" -k uniform_4b -v q4
+# 1. Build
+git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
+cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
+
+# 2. Download a model (135MB starter)
+pip install huggingface_hub
+hf download bartowski/SmolLM2-135M-Instruct-GGUF SmolLM2-135M-Instruct-Q8_0.gguf --local-dir models/
+
+# 3. Run
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "안녕!" -j 4
+
+# 4. KV compression (7x longer context)
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "안녕!" -k uniform_4b -v q4
 ```
 
+> **[API reference](docs/api.md)** · **[WASM demo](https://quantumaikr.github.io/quant.cpp/)** · **[Custom quantization guide](docs/custom-quantization.md)**
+
 ---
 
 ## Comparison
@@ -170,24 +185,20 @@ Models with QK-norm applied normalize key vectors to the unit sphere,
 
 ---
 
-## Quick Start
+## Advanced Usage
 
 ```bash
-git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
-cmake -B build -DCMAKE_BUILD_TYPE=Release
-cmake --build build -j$(nproc)
-
-# Basic inference
-./build/quant model.gguf -p "hello"
-
-# KV compression (recommended)
-./build/quant model.gguf -p "hello" -k uniform_4b -v q4
-
-# Delta compression (maximum context)
-./build/quant model.gguf -p "hello" -k uniform_3b -v q4 --delta
+# Delta compression (maximum context, 8.5x)
+./build/quant model.gguf --chat -p "hello" -k uniform_3b -v q4 --delta
 
 # PPL benchmark
 ./build/quant model.gguf --ppl input.txt -k uniform_4b -v q4
+
+# Model info
+./build/quant model.gguf --info
+
+# Performance profiling
+./build/quant model.gguf --chat -p "hello" -n 50 --profile
 ```
 
 ---

README.md

Lines changed: 26 additions & 15 deletions
@@ -49,11 +49,26 @@ LLM memory is dominated by the **KV cache**, not model weights. At 32K context,
 | 8GB Laptop | Llama 8B (Q4) | 16K tokens | **61K tokens** | **3.8x** |
 | 24GB RTX 3090 | Llama 8B (Q4) | 147K tokens | **559K tokens** | **3.8x** |
 
+## Get Started in 60 Seconds
+
 ```bash
-# One command. That's it.
-./quant model.gguf -p "hello" -k uniform_4b -v q4
+# 1. Build
+git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
+cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
+
+# 2. Download a model (135MB starter)
+pip install huggingface_hub
+hf download bartowski/SmolLM2-135M-Instruct-GGUF SmolLM2-135M-Instruct-Q8_0.gguf --local-dir models/
+
+# 3. Run
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "Hello!" -j 4
+
+# 4. With KV compression (7x longer context)
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "Hello!" -k uniform_4b -v q4
 ```
 
+> **[Full API docs](docs/api.md)** · **[WASM demo](https://quantumaikr.github.io/quant.cpp/)** · **[Add your own KV type](docs/custom-quantization.md)**
+
 ---
 
 ## How It Compares
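As a sanity check on the compression ratios quoted in both READMEs, here is a back-of-envelope KV-cache estimate. The model shape (32 layers, 8 KV heads, head_dim 128) is the commonly published Llama-3-8B configuration, assumed here rather than read from this repo, and ~4.5 bits/element for 4-bit quantization with block scales is likewise an estimate:

```bash
# Rough KV bytes/token, assuming Llama-3-8B shapes (32 layers, 8 KV heads, head_dim 128).
ELEMS=$((2 * 32 * 8 * 128))                 # K + V elements per token = 65536
echo "FP16 : $((ELEMS * 2)) bytes/token"    # 16 bits/elem -> 131072 (128 KiB)
echo "~4bit: $(echo "$ELEMS * 4.5 / 8" | bc) bytes/token"   # -> 36864 (36 KiB)
```

That works out to roughly 3.6x less KV memory per token, in the same ballpark as the 3.8x context gain in the table above.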
@@ -170,24 +185,20 @@ Models with QK-norm normalize keys to the unit sphere, creating extremely sparse
 
 ---
 
-## Quick Start
+## Advanced Usage
 
 ```bash
-git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
-cmake -B build -DCMAKE_BUILD_TYPE=Release
-cmake --build build -j$(nproc)
-
-# Basic inference
-./build/quant model.gguf -p "hello"
-
-# With KV compression (recommended)
-./build/quant model.gguf -p "hello" -k uniform_4b -v q4
-
-# Delta compression (maximum context)
-./build/quant model.gguf -p "hello" -k uniform_3b -v q4 --delta
+# Delta compression (maximum context, 8.5x)
+./build/quant model.gguf --chat -p "hello" -k uniform_3b -v q4 --delta
 
 # Perplexity benchmark
 ./build/quant model.gguf --ppl input.txt -k uniform_4b -v q4
+
+# Model info
+./build/quant model.gguf --info
+
+# Performance profiling
+./build/quant model.gguf --chat -p "hello" -n 50 --profile
 ```
 
 ---
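To decide between the compression levels in the new Advanced Usage section, one option is to sweep them against the same text with the perplexity benchmark. A minimal sketch, using only flags that appear in this diff (the output format is whatever `--ppl` prints):

```bash
# Hypothetical A/B sweep: same input, increasing KV compression.
for k in uniform_4b uniform_3b; do
  echo "== keys: $k, values: q4 =="
  ./build/quant model.gguf --ppl input.txt -k "$k" -v q4
done
# The maximum-context setting stacks delta compression on 3-bit keys:
./build/quant model.gguf --ppl input.txt -k uniform_3b -v q4 --delta
```

If perplexity stays flat under the heavier settings, the 8.5x delta configuration is effectively free for that workload.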

Comments (0)