Commit bf4db9e

unamedkr and claude committed

Improve first-run experience + community standards

User experience:
- Fix --help: "safetensors" → "model.gguf" (P0 first impression)
- Add --chat / -c flag: auto-wrap prompt with model chat template (Gemma: <start_of_turn>, Llama 3: <|start_header_id|>)
- Users can now just: ./quant model.gguf -c -p "Hello" -n 50

Community standards (GitHub checklist):
- CODE_OF_CONDUCT.md (Contributor Covenant 2.1)
- SECURITY.md (vulnerability reporting, scope)
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/ISSUE_TEMPLATE/feature_request.md
- .github/PULL_REQUEST_TEMPLATE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent cf37b4d commit bf4db9e

6 files changed

Lines changed: 148 additions & 2 deletions

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 34 additions & 0 deletions

---
name: Bug Report
about: Something isn't working as expected
title: ''
labels: bug
assignees: ''
---

**Model & Config**
- Model: (e.g., Llama-3.2-3B-Instruct-Q8_0.gguf)
- KV compression: (e.g., `-k uniform_4b -v q4`)
- Platform: (e.g., macOS M1 Pro 16GB)

**What happened?**
A clear description of the bug.

**Expected behavior**
What you expected to happen.

**Steps to reproduce**
```bash
./build/quant model.gguf -p "..." -n 50
```

**Output**
```
(paste output here)
```

**Build info**
```bash
git log --oneline -1
cmake --build build 2>&1 | grep -c "warning:"
```
.github/ISSUE_TEMPLATE/feature_request.md

Lines changed: 19 additions & 0 deletions

---
name: Feature Request
about: Suggest an idea for quant.cpp
title: ''
labels: enhancement
assignees: ''
---

**What problem does this solve?**
A clear description of the problem or use case.

**Proposed solution**
How you think it should work.

**Alternatives considered**
Any alternative approaches you've thought about.

**Additional context**
Links, references, or examples from other projects.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 18 additions & 0 deletions
## What does this PR do?

(Brief description)

## Checklist

- [ ] `cmake --build build` — zero warnings
- [ ] `ctest --test-dir build` — 34/34 pass
- [ ] No files modified in `refs/`
- [ ] README updated (if user-facing change)

## Test plan

How did you verify this works?

```bash
./build/quant model.gguf -p "test" -n 10
```

CODE_OF_CONDUCT.md

Lines changed: 26 additions & 0 deletions
# Code of Conduct

## Our Pledge

We are committed to making participation in this project a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

**Positive behavior:**
- Using welcoming and inclusive language
- Being respectful of differing viewpoints
- Gracefully accepting constructive criticism
- Focusing on what is best for the community

**Unacceptable behavior:**
- Trolling, insulting comments, and personal attacks
- Harassment in any form
- Publishing others' private information without permission

## Enforcement

Instances of abusive behavior may be reported to hi@quantumai.kr. All complaints will be reviewed and investigated.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.

SECURITY.md

Lines changed: 26 additions & 0 deletions
# Security Policy

## Supported Versions

| Version | Supported |
|---------|-----------|
| 0.5.x   | ✅        |
| < 0.5   | ❌        |

## Reporting a Vulnerability

If you discover a security vulnerability, please report it responsibly:

1. **Do NOT** open a public GitHub issue
2. Email **hi@quantumai.kr** with details
3. Include steps to reproduce if possible
4. We will respond within 48 hours

## Scope

quant.cpp processes untrusted model files (GGUF). Known attack surfaces:
- GGUF parser (src/engine/tq_gguf.c) — malformed headers, oversized tensors
- Tokenizer (src/engine/tq_tokenizer.c) — malformed vocab data
- mmap handling — file size validation

We take buffer overflows and memory corruption seriously.
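The header checks the scope section alludes to can be sketched in isolation. This is a minimal illustration, not the actual code in src/engine/tq_gguf.c: the `gguf_header_t` layout follows the public GGUF spec (magic, version, tensor count, metadata KV count), and `gguf_header_ok` is a hypothetical helper name. The key idea is rejecting counts that imply more data than the mapped file can hold, before any allocation is sized from them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GGUF_MAGIC 0x46554747u /* "GGUF" read as a little-endian uint32 */

/* GGUF file header per the format spec: magic, version, then entry counts. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint64_t n_tensors;
    uint64_t n_kv;
} gguf_header_t;

/* Sanity-check a mapped GGUF header before trusting any count in it.
   Returns 1 if plausible, 0 if truncated, mislabeled, or inflated. */
static int gguf_header_ok(const uint8_t* buf, size_t file_size) {
    gguf_header_t h;
    if (file_size < sizeof(h)) return 0;      /* truncated file */
    memcpy(&h, buf, sizeof(h));               /* avoid unaligned reads on mmap */
    if (h.magic != GGUF_MAGIC) return 0;      /* wrong magic */
    if (h.version < 2 || h.version > 3) return 0; /* unknown format version */
    /* Every tensor/KV entry occupies at least one byte of the file, so a
       count exceeding the file size is corruption or an attack, not a model. */
    if (h.n_tensors > file_size || h.n_kv > file_size) return 0;
    return 1;
}
```

Real parsers layer further checks on top (per-tensor offset and size bounds against the mapped region), but this is the shape of the "malformed headers, oversized tensors" defense.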

tools/quant.c

Lines changed: 25 additions & 2 deletions
@@ -2,7 +2,7 @@
  * quant — Minimal C inference engine. Zero dependencies.
  *
  * Usage:
- *   quant <model.safetensors> [options]
+ *   quant <model.gguf> [options]
  *
  * Options:
  *   -t <tokenizer>   Path to tokenizer binary file
@@ -95,7 +95,7 @@ static void print_version(void) {

 static void print_usage(const char* prog) {
     fprintf(stderr, "quant — Minimal C inference engine. Zero dependencies.\n");
-    fprintf(stderr, "Usage: %s <model.safetensors> [options]\n\n", prog);
+    fprintf(stderr, "Usage: %s <model.gguf> [options]\n\n", prog);
     fprintf(stderr, "Options:\n");
     fprintf(stderr, "  -t <tokenizer>   Tokenizer binary file\n");
     fprintf(stderr, "  -p <prompt>      Input prompt (default: \"Hello\")\n");
@@ -109,6 +109,7 @@ static void print_usage(const char* prog) {
     fprintf(stderr, "  -q <type>        Quantize weights: q2 (2-bit Lloyd-Max, ~12x reduction),\n");
     fprintf(stderr, "                   q4 (4-bit, ~6x reduction, default),\n");
     fprintf(stderr, "                   q8 (int8, ~3.5x reduction), or none (FP32)\n");
+    fprintf(stderr, "  -c, --chat       Auto-wrap prompt with model chat template\n");
     fprintf(stderr, "  --info           Print model info and exit\n");
     fprintf(stderr, "  -M, --memory     Print KV cache memory stats after generation\n");
     fprintf(stderr, "  --profile        Profile forward pass timing (matmul/recurrent/moe/conv/attn)\n");
@@ -159,6 +160,7 @@ int main(int argc, char** argv) {
     int delta_iframe_int = 0;  /* I-frame interval for delta KV (0 = auto = 64) */
     int k_highres_window = 0;  /* age-based: recent N keys at FP32, rest at 2-bit */
     int json_output = 0;       /* 1 = JSON output for --ppl */
+    int chat_mode = 0;         /* 1 = auto-wrap prompt with chat template */

     for (int i = 1; i < argc; i++) {
         if (argv[i][0] != '-') {
@@ -248,6 +250,8 @@ int main(int argc, char** argv) {
             return 0;
         } else if (strcmp(argv[i], "--json") == 0) {
             json_output = 1;
+        } else if (strcmp(argv[i], "--chat") == 0 || strcmp(argv[i], "-c") == 0) {
+            chat_mode = 1;
         } else if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
             print_usage(argv[0]);
             return 0;
@@ -1074,6 +1078,25 @@ int main(int argc, char** argv) {
         return 0;
     }

+    /* Auto-wrap prompt with chat template when --chat is used */
+    char chat_prompt[8192];
+    if (chat_mode) {
+        tq_model_config_t* mc = &model->config;
+        if (mc->model_type == 1) {
+            /* Gemma 3/4: <start_of_turn>user\n...\n<end_of_turn>\n<start_of_turn>model\n */
+            snprintf(chat_prompt, sizeof(chat_prompt),
+                     "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n", prompt);
+        } else if (strstr(prompt, "<|start_header_id|>") == NULL) {
+            /* Llama 3 / generic: wrap if not already wrapped */
+            snprintf(chat_prompt, sizeof(chat_prompt),
+                     "<|start_header_id|>user<|end_header_id|>\n\n%s<|eot_id|>"
+                     "<|start_header_id|>assistant<|end_header_id|>\n\n", prompt);
+        } else {
+            snprintf(chat_prompt, sizeof(chat_prompt), "%s", prompt);
+        }
+        prompt = chat_prompt;
+    }
+
     /* Configure generation */
     tq_gen_config_t config = tq_default_gen_config();
     config.temperature = temperature;
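The template-selection logic in the diff can be exercised standalone. The sketch below factors it into a helper so the three branches are testable; `wrap_chat_prompt` and the bare `model_type` parameter are illustrative stand-ins, not the actual quant.cpp API (which reads the type from the model config). Note that `snprintf` silently truncates prompts longer than the caller's buffer, which matches the fixed 8192-byte `chat_prompt` in the diff.

```c
#include <stdio.h>
#include <string.h>

/* Mirror of the --chat branching: pick a turn template by model family and
   wrap the user prompt; pass already-wrapped Llama 3 prompts through as-is. */
static void wrap_chat_prompt(char* out, size_t out_size,
                             const char* prompt, int model_type) {
    if (model_type == 1) {
        /* Gemma-style turn markers */
        snprintf(out, out_size,
                 "<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n",
                 prompt);
    } else if (strstr(prompt, "<|start_header_id|>") == NULL) {
        /* Llama 3-style header tokens; wrap only if not already wrapped */
        snprintf(out, out_size,
                 "<|start_header_id|>user<|end_header_id|>\n\n%s<|eot_id|>"
                 "<|start_header_id|>assistant<|end_header_id|>\n\n",
                 prompt);
    } else {
        /* Prompt already carries template tokens: copy through unchanged */
        snprintf(out, out_size, "%s", prompt);
    }
}
```

The `strstr` guard is what makes `-c` safe to combine with pre-templated prompts on the Llama path; the Gemma branch has no such guard, so passing an already-wrapped Gemma prompt with `-c` would double-wrap it.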
