Commit bf4db9e

unamedkr and claude committed

Improve first-run experience + community standards

User experience:
- Fix --help: "safetensors" → "model.gguf" (P0 first impression)
- Add --chat / -c flag: auto-wrap prompt with model chat template (Gemma: <start_of_turn>, Llama 3: <|start_header_id|>)
- Users can now just: ./quant model.gguf -c -p "Hello" -n 50

Community standards (GitHub checklist):
- CODE_OF_CONDUCT.md (Contributor Covenant 2.1)
- SECURITY.md (vulnerability reporting, scope)
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/feature_request.md
- .github/PULL_REQUEST_TEMPLATE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent cf37b4d commit bf4db9e

6 files changed

Lines changed: 148 additions & 2 deletions

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 34 additions & 0 deletions

---
name: Bug Report
about: Something isn't working as expected
title: ''
labels: bug
assignees: ''
---

**Model & Config**
- Model: (e.g., Llama-3.2-3B-Instruct-Q8_0.gguf)
- KV compression: (e.g., `-k uniform_4b -v q4`)
- Platform: (e.g., macOS M1 Pro 16GB)

**What happened?**
A clear description of the bug.

**Expected behavior**
What you expected to happen.

**Steps to reproduce**
```bash
./build/quant model.gguf -p "..." -n 50
```

**Output**
```
(paste output here)
```

**Build info**
```bash
git log --oneline -1
cmake --build build 2>&1 | grep -c "warning:"
```
.github/ISSUE_TEMPLATE/feature_request.md

Lines changed: 19 additions & 0 deletions

---
name: Feature Request
about: Suggest an idea for quant.cpp
title: ''
labels: enhancement
assignees: ''
---

**What problem does this solve?**
A clear description of the problem or use case.

**Proposed solution**
How you think it should work.

**Alternatives considered**
Any alternative approaches you've thought about.

**Additional context**
Links, references, or examples from other projects.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 18 additions & 0 deletions
## What does this PR do?

(Brief description)

## Checklist

- [ ] `cmake --build build` — zero warnings
- [ ] `ctest --test-dir build` — 34/34 pass
- [ ] No files modified in `refs/`
- [ ] README updated (if user-facing change)

## Test plan

How did you verify this works?

```bash
./build/quant model.gguf -p "test" -n 10
```

CODE_OF_CONDUCT.md

Lines changed: 26 additions & 0 deletions
# Code of Conduct

## Our Pledge

We are committed to making participation in this project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

**Positive behavior:**
- Using welcoming and inclusive language
- Being respectful of differing viewpoints
- Gracefully accepting constructive criticism
- Focusing on what is best for the community

**Unacceptable behavior:**
- Trolling, insulting comments, and personal attacks
- Harassment in any form
- Publishing others' private information without permission

## Enforcement

Instances of abusive behavior may be reported to hi@quantumai.kr. All complaints will be reviewed and investigated.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.

SECURITY.md

Lines changed: 26 additions & 0 deletions
# Security Policy

## Supported Versions

| Version | Supported |
|---------|-----------|
| 0.5.x   | ✅        |
| < 0.5   | ❌        |

## Reporting a Vulnerability

If you discover a security vulnerability, please report it responsibly:

1. **Do NOT** open a public GitHub issue
2. Email **hi@quantumai.kr** with details
3. Include steps to reproduce if possible
4. We will respond within 48 hours

## Scope

quant.cpp processes untrusted model files (GGUF). Known attack surfaces:
- GGUF parser (src/engine/tq_gguf.c) — malformed headers, oversized tensors
- Tokenizer (src/engine/tq_tokenizer.c) — malformed vocab data
- mmap handling — file size validation

We take buffer overflows and memory corruption seriously.
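The header checks the scope section alludes to can be sketched in isolation. This is a minimal illustration, not the actual code in src/engine/tq_gguf.c: the `gguf_header_t` layout follows the public GGUF spec (magic, version, tensor count, metadata KV count), and `gguf_header_ok` is a hypothetical helper name. The key idea is rejecting counts that imply more data than the mapped file can hold, before any allocation is sized from them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GGUF_MAGIC 0x46554747u /* "GGUF" read as a little-endian uint32 */

/* GGUF file header per the format spec: magic, version, then entry counts. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint64_t n_tensors;
    uint64_t n_kv;
} gguf_header_t;

/* Sanity-check a mapped GGUF header before trusting any count in it.
   Returns 1 if plausible, 0 if truncated, mislabeled, or inflated. */
static int gguf_header_ok(const uint8_t* buf, size_t file_size) {
    gguf_header_t h;
    if (file_size < sizeof(h)) return 0;      /* truncated file */
    memcpy(&h, buf, sizeof(h));               /* avoid unaligned reads on mmap */
    if (h.magic != GGUF_MAGIC) return 0;      /* wrong magic */
    if (h.version < 2 || h.version > 3) return 0; /* unknown format version */
    /* Every tensor/KV entry occupies at least one byte of the file, so a
       count exceeding the file size is corruption or an attack, not a model. */
    if (h.n_tensors > file_size || h.n_kv > file_size) return 0;
    return 1;
}
```

Real parsers layer further checks on top (per-tensor offset and size bounds against the mapped region), but this is the shape of the "malformed headers, oversized tensors" defense.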

tools/quant.c

Lines changed: 25 additions & 2 deletions
@@ -2,7 +2,7 @@
  * quant — Minimal C inference engine. Zero dependencies.
  *
  * Usage:
- *   quant <model.safetensors> [options]
+ *   quant <model.gguf> [options]
  *
  * Options:
  *   -t <tokenizer>   Path to tokenizer binary file
@@ -95,7 +95,7 @@ static void print_version(void) {

 static void print_usage(const char* prog) {
     fprintf(stderr, "quant — Minimal C inference engine. Zero dependencies.\n");
-    fprintf(stderr, "Usage: %s <model.safetensors> [options]\n\n", prog);
+    fprintf(stderr, "Usage: %s <model.gguf> [options]\n\n", prog);
     fprintf(stderr, "Options:\n");
     fprintf(stderr, "  -t <tokenizer>   Tokenizer binary file\n");
     fprintf(stderr, "  -p <prompt>      Input prompt (default: \"Hello\")\n");
@@ -109,6 +109,7 @@ static void print_usage(const char* prog) {
     fprintf(stderr, "  -q <type>        Quantize weights: q2 (2-bit Lloyd-Max, ~12x reduction),\n");
     fprintf(stderr, "                   q4 (4-bit, ~6x reduction, default),\n");
     fprintf(stderr, "                   q8 (int8, ~3.5x reduction), or none (FP32)\n");
+    fprintf(stderr, "  -c, --chat       Auto-wrap prompt with model chat template\n");
     fprintf(stderr, "  --info           Print model info and exit\n");
     fprintf(stderr, "  -M, --memory     Print KV cache memory stats after generation\n");
     fprintf(stderr, "  --profile        Profile forward pass timing (matmul/recurrent/moe/conv/attn)\n");
@@ -159,6 +160,7 @@ int main(int argc, char** argv) {
     int delta_iframe_int = 0;  /* I-frame interval for delta KV (0 = auto = 64) */
     int k_highres_window = 0;  /* age-based: recent N keys at FP32, rest at 2-bit */
     int json_output = 0;       /* 1 = JSON output for --ppl */
+    int chat_mode = 0;         /* 1 = auto-wrap prompt with chat template */

     for (int i = 1; i < argc; i++) {
         if (argv[i][0] != '-') {
@@ -248,6 +250,8 @@ int main(int argc, char** argv) {
             return 0;
         } else if (strcmp(argv[i], "--json") == 0) {
             json_output = 1;
+        } else if (strcmp(argv[i], "--chat") == 0 || strcmp(argv[i], "-c") == 0) {
+            chat_mode = 1;
         } else if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
             print_usage(argv[0]);
             return 0;
@@ -1074,6 +1078,25 @@ int main(int argc, char** argv) {
         return 0;
     }

+    /* Auto-wrap prompt with chat template when --chat is used */
+    char chat_prompt[8192];
+    if (chat_mode) {
+        tq_model_config_t* mc = &model->config;
+        if (mc->model_type == 1) {
+            /* Gemma 3/4: <start_of_turn>user\n...\n<end_of_turn>\n<start_of_turn>model\n */
+            snprintf(chat_prompt, sizeof(chat_prompt),
+                     "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n", prompt);
+        } else if (strstr(prompt, "<|start_header_id|>") == NULL) {
+            /* Llama 3 / generic: wrap if not already wrapped */
+            snprintf(chat_prompt, sizeof(chat_prompt),
+                     "<|start_header_id|>user<|end_header_id|>\n\n%s<|eot_id|>"
+                     "<|start_header_id|>assistant<|end_header_id|>\n\n", prompt);
+        } else {
+            snprintf(chat_prompt, sizeof(chat_prompt), "%s", prompt);
+        }
+        prompt = chat_prompt;
+    }
+
     /* Configure generation */
     tq_gen_config_t config = tq_default_gen_config();
     config.temperature = temperature;
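The template-selection logic in the diff can be exercised standalone. The sketch below factors it into a helper so the three branches are testable; `wrap_chat_prompt` and the bare `model_type` parameter are illustrative stand-ins, not the actual quant.cpp API (which reads the type from the model config). Note that `snprintf` silently truncates prompts longer than the caller's buffer, which matches the fixed 8192-byte `chat_prompt` in the diff.

```c
#include <stdio.h>
#include <string.h>

/* Mirror of the --chat branching: pick a turn template by model family and
   wrap the user prompt; pass already-wrapped Llama 3 prompts through as-is. */
static void wrap_chat_prompt(char* out, size_t out_size,
                             const char* prompt, int model_type) {
    if (model_type == 1) {
        /* Gemma-style turn markers */
        snprintf(out, out_size,
                 "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n",
                 prompt);
    } else if (strstr(prompt, "<|start_header_id|>") == NULL) {
        /* Llama 3-style header tokens; wrap only if not already wrapped */
        snprintf(out, out_size,
                 "<|start_header_id|>user<|end_header_id|>\n\n%s<|eot_id|>"
                 "<|start_header_id|>assistant<|end_header_id|>\n\n",
                 prompt);
    } else {
        /* Prompt already carries template tokens: copy through unchanged */
        snprintf(out, out_size, "%s", prompt);
    }
}
```

The `strstr` guard is what makes `-c` safe to combine with pre-templated prompts on the Llama path; the Gemma branch has no such guard, so passing an already-wrapped Gemma prompt with `-c` would double-wrap it.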
