
Commit ab254ff

unamedkr and claude committed
README: add 60-second quick start at top, enable Discussions
- "Get Started in 60 Seconds" section right after The Result (build → download → run → compress, copy-paste ready) - Links to docs, WASM demo, custom quantization guide - Lower Quick Start → Advanced Usage (delta, PPL, profile) - GitHub Discussions enabled for community Q&A - models/ added to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bf4db9e · commit ab254ff

2 files changed

Lines changed: 52 additions & 30 deletions
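The .gitignore item in the bullet list above is not one of the two files shown in this diff. Assuming it is the usual ignore entry for the `models/` directory created in the quick start, it would presumably amount to a single line, e.g.:

```bash
# Hypothetical reconstruction -- the .gitignore hunk itself is not shown in this commit view.
echo "models/" >> .gitignore   # keep downloaded GGUF models out of version control
```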


README.ko.md

Lines changed: 26 additions & 15 deletions
@@ -49,11 +49,26 @@ The bottleneck in LLM memory is the **KV cache**, not the model weights.
 | 8GB Laptop | Llama 8B (Q4) | 16K tokens | **61K tokens** | **3.8x** |
 | 24GB RTX 3090 | Llama 8B (Q4) | 147K tokens | **559K tokens** | **3.8x** |
 
+## Get Started in 60 Seconds
+
 ```bash
-# One command. That's it.
-./quant model.gguf -p "hello" -k uniform_4b -v q4
+# 1. Build
+git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
+cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
+
+# 2. Download a model (135MB starter)
+pip install huggingface_hub
+hf download bartowski/SmolLM2-135M-Instruct-GGUF SmolLM2-135M-Instruct-Q8_0.gguf --local-dir models/
+
+# 3. Run
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "안녕!" -j 4
+
+# 4. KV compression (7x longer context)
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "안녕!" -k uniform_4b -v q4
 ```
 
+> **[API reference](docs/api.md)** · **[WASM demo](https://quantumaikr.github.io/quant.cpp/)** · **[Custom quantization guide](docs/custom-quantization.md)**
+
 ---
 
 ## Comparison
@@ -170,24 +185,20 @@ Models with QK-norm applied normalize key vectors to the unit sphere,
 
 ---
 
-## Quick Start
+## Advanced Usage
 
 ```bash
-git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
-cmake -B build -DCMAKE_BUILD_TYPE=Release
-cmake --build build -j$(nproc)
-
-# Basic inference
-./build/quant model.gguf -p "hello"
-
-# KV compression (recommended)
-./build/quant model.gguf -p "hello" -k uniform_4b -v q4
-
-# Delta compression (maximum context)
-./build/quant model.gguf -p "hello" -k uniform_3b -v q4 --delta
+# Delta compression (maximum context, 8.5x)
+./build/quant model.gguf --chat -p "hello" -k uniform_3b -v q4 --delta
 
 # PPL benchmark
 ./build/quant model.gguf --ppl input.txt -k uniform_4b -v q4
+
+# Model info
+./build/quant model.gguf --info
+
+# Performance profiling
+./build/quant model.gguf --chat -p "hello" -n 50 --profile
 ```
 
 ---

README.md

Lines changed: 26 additions & 15 deletions
@@ -49,11 +49,26 @@ LLM memory is dominated by the **KV cache**, not model weights. At 32K context,
 | 8GB Laptop | Llama 8B (Q4) | 16K tokens | **61K tokens** | **3.8x** |
 | 24GB RTX 3090 | Llama 8B (Q4) | 147K tokens | **559K tokens** | **3.8x** |
 
+## Get Started in 60 Seconds
+
 ```bash
-# One command. That's it.
-./quant model.gguf -p "hello" -k uniform_4b -v q4
+# 1. Build
+git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
+cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
+
+# 2. Download a model (135MB starter)
+pip install huggingface_hub
+hf download bartowski/SmolLM2-135M-Instruct-GGUF SmolLM2-135M-Instruct-Q8_0.gguf --local-dir models/
+
+# 3. Run
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "Hello!" -j 4
+
+# 4. With KV compression (7x longer context)
+./build/quant models/SmolLM2-135M-Instruct-Q8_0.gguf --chat -p "Hello!" -k uniform_4b -v q4
 ```
 
+> **[Full API docs](docs/api.md)** · **[WASM demo](https://quantumaikr.github.io/quant.cpp/)** · **[Add your own KV type](docs/custom-quantization.md)**
+
 ---
 
 ## How It Compares
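As a sanity check on the compression ratios quoted in both READMEs, here is a back-of-envelope KV-cache estimate. The model shape (32 layers, 8 KV heads, head_dim 128) is the commonly published Llama-3-8B configuration, assumed here rather than read from this repo, and ~4.5 bits/element for 4-bit quantization with block scales is likewise an estimate:

```bash
# Rough KV bytes/token, assuming Llama-3-8B shapes (32 layers, 8 KV heads, head_dim 128).
ELEMS=$((2 * 32 * 8 * 128))                 # K + V elements per token = 65536
echo "FP16 : $((ELEMS * 2)) bytes/token"    # 16 bits/elem -> 131072 (128 KiB)
echo "~4bit: $(echo "$ELEMS * 4.5 / 8" | bc) bytes/token"   # -> 36864 (36 KiB)
```

That works out to roughly 3.6x less KV memory per token, in the same ballpark as the 3.8x context gain in the table above.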
@@ -170,24 +185,20 @@ Models with QK-norm normalize keys to the unit sphere, creating extremely sparse
 
 ---
 
-## Quick Start
+## Advanced Usage
 
 ```bash
-git clone https://github.com/quantumaikr/quant.cpp && cd quant.cpp
-cmake -B build -DCMAKE_BUILD_TYPE=Release
-cmake --build build -j$(nproc)
-
-# Basic inference
-./build/quant model.gguf -p "hello"
-
-# With KV compression (recommended)
-./build/quant model.gguf -p "hello" -k uniform_4b -v q4
-
-# Delta compression (maximum context)
-./build/quant model.gguf -p "hello" -k uniform_3b -v q4 --delta
+# Delta compression (maximum context, 8.5x)
+./build/quant model.gguf --chat -p "hello" -k uniform_3b -v q4 --delta
 
 # Perplexity benchmark
 ./build/quant model.gguf --ppl input.txt -k uniform_4b -v q4
+
+# Model info
+./build/quant model.gguf --info
+
+# Performance profiling
+./build/quant model.gguf --chat -p "hello" -n 50 --profile
 ```
 
 ---
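To decide between the compression levels in the new Advanced Usage section, one option is to sweep them against the same text with the perplexity benchmark. A minimal sketch, using only flags that appear in this diff (the output format is whatever `--ppl` prints):

```bash
# Hypothetical A/B sweep: same input, increasing KV compression.
for k in uniform_4b uniform_3b; do
  echo "== keys: $k, values: q4 =="
  ./build/quant model.gguf --ppl input.txt -k "$k" -v q4
done
# The maximum-context setting stacks delta compression on 3-bit keys:
./build/quant model.gguf --ppl input.txt -k uniform_3b -v q4 --delta
```

If perplexity stays flat under the heavier settings, the 8.5x delta configuration is effectively free for that workload.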

Comments (0)