Qwen owns everything local. Llama 8B scored 50/102 vs Qwen 9B's 72/103, and Hermes 70B couldn't even complete a run. All fine-tuning should target the Qwen architecture.
27B MTP is 2 points behind frontier. At 86/103 with 70 tok/s local, it's within striking distance of GPT-5.4 (88/102). The remaining gap is in creative writing (21 vs 23) and summarization (19 vs 21).
## Current Model Inventory (/mnt/models — 415GB free)

### Production LLMs

| Model | Size | tok/s | Quick Score | Config |
|---|---|---|---|---|
| Qwen 27B INT4 | 29GB | 53 / 70 (MTP) | 84-86/103 | `qwen-27b-int4` / `-mtp` |
| Qwen 35B MoE | 67GB | 171 | 76/103 | `qwen-35b` |
| Qwen 9B | 19GB | 92 / 112 (MTP) | 72/103 | `qwen-9b` / `-mtp` |
| Qwen 4B INT4 | 3.8GB | 297 | 56/103 | `qwen-4b-int4` |
| Qwen 122B INT4 | 74GB | ~30 (TP=2) | — | `qwen-122b-int4` |
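
The per-model tok/s numbers are easy to spot-check by timing a single completion against the serving endpoint. A minimal sketch, assuming the configs above are exposed through a local OpenAI-compatible server; the base URL, port, prompt, and the combined `qwen-27b-int4-mtp` alias are placeholders, not taken from this inventory:

```python
import time
from openai import OpenAI

# Local OpenAI-compatible endpoint (e.g. vLLM); URL and port are assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def rough_tok_s(model: str, prompt: str, max_tokens: int = 256) -> float:
    """Crude tokens/second estimate from one non-streaming completion.

    Includes prompt-processing time, so it understates steady-state decode speed.
    """
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    return resp.usage.completion_tokens / elapsed

print(rough_tok_s("qwen-27b-int4-mtp", "Explain MTP decoding in two sentences."))
```

A streaming request with per-token timestamps gives a cleaner decode-only figure, but this is enough to catch a misconfigured quant or a missing MTP flag.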
### Training / Fine-tune Bases

| Model | Size | Purpose |
|---|---|---|
| Qwen 4B BF16 | 8.8GB | LoRA base |
| Qwen 4B Base | 8.8GB | Pretraining |
| Qwen 2B + Base | 8.6GB | Training experiments (5/5 reasoning at 2B!) |
| Qwen 0.8B + Base | 3.4GB | Training experiments |
| Llama 3.1-8B AWQ | 5GB | Eval baseline (poor: 50/102) |
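
The LoRA base is meant to take adapters directly. A minimal attachment sketch with PEFT, where the local path and the `target_modules` list (the usual Qwen attention projections) are assumptions rather than details from this inventory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "/mnt/models/qwen-4b-bf16"  # hypothetical path under /mnt/models

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# r/alpha/dropout are generic starting values, not tuned settings.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check: only adapter weights train
```

Training against the BF16 base keeps the adapters clean; the INT4 serving quants stay untouched.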
### Creative / Experimental

| Model | Size | Notes |
|---|---|---|
| Cydonia 24B | 44GB | Uncensored Mistral; tool calling broken; holding for manual creative testing |
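
The "tool calling broken" note is cheap to re-verify after any loader or chat-template change: send one tool-enabled request and check whether a structured tool call comes back. A minimal probe, assuming the model is served under a `cydonia-24b` alias on the same local endpoint (the alias and the `get_weather` tool are hypothetical):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

# Single hypothetical tool used only as a probe.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="cydonia-24b",  # assumed alias; no config is listed above
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
msg = resp.choices[0].message
# A working setup returns structured tool_calls; a broken chat template
# typically returns plain text (or malformed JSON in content) instead.
print("tool call emitted:", bool(msg.tool_calls))
```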