Uncensored, abliterated & reasoning-distilled GGUFs — forged on 8×H200 SXM5 | 1.1TB VRAM
We take frontier open-weight models and make them actually useful:
- 🔓 Abliteration — Remove refusal training without destroying capability
- 🧠 Reasoning Distillation — Inject Claude Opus-level reasoning into open models via SFT
- 📦 Full GGUF Quant Ladder — Q2_K through BF16, every size for every setup
- 🔥 Real Benchmarks — Not vibes, not "it feels smarter." Numbers.
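Abliteration, as used here, follows the directional-ablation idea: estimate a "refusal direction" from the difference between mean activations on refused vs. accepted prompts, then project that direction out of the weights that write into the residual stream. A minimal NumPy sketch of just that projection step (the activation matrices, shapes, and layer choice are illustrative, not our actual pipeline config):

```python
import numpy as np

def refusal_direction(h_refused: np.ndarray, h_accepted: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two sets of hidden states.

    h_refused / h_accepted: (n_prompts, d_model) activations at one layer.
    Returns a unit vector in activation space.
    """
    d = h_refused.mean(axis=0) - h_accepted.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the refusal direction out of a weight matrix.

    W: (d_model, d_in) matrix writing into the residual stream.
    Returns (I - d d^T) W, so W's outputs carry no component along d.
    """
    return W - np.outer(d, d @ W)
```

After ablation, `d @ W` is (numerically) zero: the model can no longer write along the refusal direction through that matrix, while all orthogonal components are untouched.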
All models are processed on bare metal:
| Component | Spec |
|---|---|
| GPUs | 8× NVIDIA H200 SXM5 (141GB each) |
| Total VRAM | 1.1TB |
| Storage | 35TB NVMe RAID0 |
| Inference | Full BF16, no compromises |
We don't rent A100s for 4 hours and pray. We run the full pipeline on dedicated iron.
Released:

| Model | Size | Type | Status | Link |
|---|---|---|---|---|
| Mistral-Small-4-119B-Uncensored-GGUF | 119B | Abliterated, 7 quants | 🆕 New | HuggingFace |
In the pipeline:

| Model | Type | ETA |
|---|---|---|
| Qwen3.5-122B-A10B-Claude-Opus-Reasoning-Uncensored-GGUF | Reasoning distilled + abliterated | Training now (~18h) |
| Nemotron-3-Super-120B-Uncensored-GGUF | Abliterated | Next |
We have 1.1TB VRAM. We don't use optimization hacks designed for VRAM-poor setups. Raw transformers + peft + trl. Native SDPA. Direct llama.cpp. Every abstraction layer is a failure point we don't need.
Broken dependency? Diagnose once, route around it, ship. We don't spend 5 attempts patching someone else's bug when the lower-level tool works fine.
Corrupt output = fix or rebuild. Never ship broken artifacts. Every GGUF is magic-byte verified before upload.
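The magic-byte check is a cheap header read: every valid GGUF file starts with the ASCII magic `GGUF`, followed by a little-endian version and the tensor/metadata counts. A minimal sketch of that verification (field layout per the GGUF spec for v2/v3; the function name and return shape are ours, not llama.cpp API):

```python
import struct

def check_gguf_header(path: str) -> dict:
    """Validate the GGUF magic and read the header counts.

    GGUF v2/v3 header layout (little-endian):
      4-byte magic "GGUF", uint32 version,
      uint64 tensor_count, uint64 metadata_kv_count.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path}: not a GGUF file (bad magic)")
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}
```

The tensor count read here is what gets cross-checked against the expected count for the architecture before anything is uploaded.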
Scout trending model on HF
→ Download full weights to /data (35TB NVMe)
→ Abliterate on 8×H200 (full BF16, extract+remove refusal direction)
→ Optional: SFT with reasoning datasets (Claude Opus distillation)
→ Convert to GGUF (llama.cpp convert_hf_to_gguf.py)
→ Quantize full ladder: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, BF16
→ Verify all artifacts (magic bytes, tensor counts)
→ Benchmark (llama-bench, perplexity where arch supports it)
→ Upload to HuggingFace + GitHub release
→ Post to r/LocalLLaMA
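The quantize step above fans one BF16 GGUF out into the whole ladder. A sketch of a driver for it, assuming llama.cpp's `llama-quantize` binary is on PATH (file naming and quant list mirror the ladder above; this is illustrative, not our exact script):

```python
import subprocess
from pathlib import Path

QUANTS = ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]
LLAMA_QUANTIZE = "llama-quantize"  # assumed llama.cpp binary name

def quant_commands(bf16_gguf: str, quants=QUANTS) -> list[list[str]]:
    """Build one llama-quantize invocation per quant type in the ladder."""
    stem = Path(bf16_gguf).stem.removesuffix("-BF16")
    return [
        [LLAMA_QUANTIZE, bf16_gguf, f"{stem}-{q}.gguf", q]
        for q in quants
    ]

def run_ladder(bf16_gguf: str) -> None:
    for cmd in quant_commands(bf16_gguf):
        # check=True: fail fast so a corrupt quant never reaches upload
        subprocess.run(cmd, check=True)
```

`check=True` enforces the "never ship broken artifacts" rule at the process level; the header verification step then re-checks each output file independently.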
If our models are useful to you:
☕ Buy Me A Coffee — every coffee funds more uncensored models
Each model inherits its base model's license (Apache 2.0, Llama Community, etc.). The pipeline code in this repo is MIT.
⚡ Forged on 8×H200 SXM5 | 1.1TB VRAM