llamdrop 🦙

Run AI on any device. No PC. No subscription. No struggle.

What is llamdrop?

llamdrop is a free, open-source tool that lets anyone run a local AI model on whatever device they own — an Android phone, an old laptop, a Raspberry Pi, a budget PC, even a gaming console running Linux.

It reads your hardware automatically, detects your exact chip, RAM, GPU, and platform, then finds AI models that will actually work on your specs, downloads the right quantization, and runs it. You don't need to know what quantization means. You don't need to read any documentation. You just run it.

llamdrop will always be completely free. It cannot be sold. Ever. That's not a promise — it's written into the license (GPL v3).

Who is this for?

This project was born from a real experience — spending hours trying to run local AI on a phone with no PC, no budget, and no guidance. Dozens of crashes, incompatible models, RAM errors with no explanation.

llamdrop is for anyone on low-end or budget hardware who keeps getting left out:

📱 Phone users — Android via Termux, no PC needed
💻 Old laptop owners — that 2012 laptop collecting dust can run AI
🍓 Raspberry Pi / SBC users — Pi 4, Pi 5, Orange Pi, etc.
🎮 Console / embedded Linux users — if it runs Linux, llamdrop runs on it
🪟 Windows users — native PowerShell installer, no WSL required
🍎 macOS users — Apple Silicon detected, Ollama backend auto-configured
🌍 Users in regions where $20/month is not a small amount
🧑‍🎓 Students and self-learners wanting to experiment with AI for free
🔧 Developers and tinkerers who want to test local AI on constrained hardware

If you've ever given up trying to run local AI because it was too complicated, crashed too many times, or cost too much — this is for you.

Quick Install

Android (Termux) / Linux / Raspberry Pi / macOS:

curl -sL https://raw.githubusercontent.com/DeVenLucaz/llamdrop/main/install.sh | bash

Windows (PowerShell, run as Administrator):

irm https://raw.githubusercontent.com/DeVenLucaz/llamdrop/main/install.ps1 | iex

Then run:

llamdrop

That's it. Two commands. No compilation. No configuration. No account needed.

Features

Device Intelligence

🔍 Full device profiling — reads RAM, CPU model, core layout (big.LITTLE aware), CPU flags (AVX2/AVX512/NEON), GPU vendor, storage, Android SoC/API level
🖥️ 7-tier classification — Micro / Low / Low-Mid / Mid / High / Desktop / Workstation — auto-configures everything per tier
🧠 Backend auto-selection — picks the correct backend for every platform×GPU combination: Termux pkg, CUDA, ROCm, Vulkan, Metal/Ollama, IPEX-LLM, or CPU
⚡ GPU acceleration — Vulkan for Adreno/Mali/AMD desktop, CUDA for NVIDIA, Metal via Ollama on Apple Silicon — with clear explanations for why GPU is or isn't active
🚫 Android GPU safety — never forces GPU on Android (Mali Vulkan is slower than CPU; Adreno crashes). CPU-only, no guessing.
👋 First-launch Device Profile — shows detected specs card with tier, backend decision, runtime flags, and model recommendations. Runs once.

Model Browser & Download

📋 Smart model browser — two modes:
- ✅ Verified catalog — 41 curated models confirmed working across all device tiers, filtered automatically to show only what fits your hardware
- 🔎 Live HuggingFace search — search any GGUF model with live RAM estimates
⬇️ Resilient downloader — auto-resumes on connection drops, retries automatically, verifies via SHA-256 checksum
🎯 Smart quantization — picks the best Q4/Q2/Q5/IQ variant based on your live RAM at download time
🧩 IQ quant support — IQ3_M and IQ2_M variants for more models — better quality than Q2_K at same RAM. Vulkan auto-disabled for IQ quants (incompatible).
📊 Benchmark scores — tokens/second recorded per model (rolling average, last 5 runs), shown in browser as ⚡ X t/s
🗂️ Cancelled download cleanup — partial files deleted immediately on cancel, never show as valid models

Chat & Inference

🤖 Ollama backend — auto-detected on Linux/desktop and macOS. Routes inference through Ollama HTTP API when running.
💬 Stable chat — automatic context trimming prevents out-of-memory crashes. Always preserves your first exchange — only the middle gets trimmed.
🦙 Live thinking indicator — animated spinner with non-blocking stdout while the model generates
🎯 Prompt format auto-detect — correct template per model family (ChatML, Llama3, Gemma, Phi3)
📂 File context — attach a file to your conversation before chatting
💾 Session save/load/delete — resume conversations where you left off, with auto-save every 5 exchanges (10 messages)
📤 Chat export — /export saves conversation to Downloads as markdown
🗂️ Conditional mmap — 15–30% lower peak RAM on internal storage models; external/sdcard keeps --no-mmap
🧹 Clean output pipeline — llama.cpp banner, duplicate responses, timing stats, and format tags (ChatML boundaries) are all stripped. What you see is only the model's response.

System & UX

⚠️ Live RAM monitor — colour-coded bar in UI (green/yellow/red), warns if memory gets critical during chat
🔋 Battery monitoring — shows charge %, per-inference battery drop, warns at configurable low threshold. Distinct icons per charge range.
📂 Phone-wide GGUF scanner — finds models you already have in Downloads, Documents, etc. Runs in background — UI stays responsive with a live counter.
🆙 Self-update — llamdrop update pulls latest version from GitHub (resolves correct install root)
🩺 Doctor — llamdrop doctor checks binary, libraries, RAM, storage, network, Python version, Termux permissions, and Ollama status
⚙️ Config file — override threads, context, temperature, system prompt, auto-save, battery warning threshold at ~/.llamdrop/config.json. Hot-reloads on external edits.
🌐 Multi-language UI — English, Hindi, Spanish, Portuguese, Arabic
🖥️ Curses TUI — keyboard-navigable menu with live RAM bar, battery line, llama.cpp + GPU status, and update notices
⚡ Fast startup — hardware detection runs exactly once at launch. Startup is noticeably faster on low-end devices.

Model Catalog

llamdrop uses a two-layer model system:

Layer 1 — Verified Catalog (`models.json`)

A community-maintained list of models confirmed to work on low-RAM devices. Every entry has been tested, has known RAM requirements, and is safe to download. No login or account required.

Layer 2 — Live HuggingFace Search

Search any model on HuggingFace directly from llamdrop. The tool estimates RAM requirements from file size and quantization type. Clearly marked as unverified — for experienced users who want to explore beyond the catalog.

Current verified catalog (41 models across 6 tiers):

Tier	Available RAM	Example Models
Micro	< 1 GB	SmolLM2 135M / 360M / 1.7B, Qwen2.5 0.5B, TinyLlama, Gemma 3 1B, Qwen3 1.7B
Low	1 – 3 GB	Qwen2.5 1.5B, Llama 3.2 1B, DeepSeek R1 1.5B, Gemma 2 2B, Phi-4 Mini, Qwen3 4B, SmolLM3 3B
Low-Mid	3 – 6 GB	Mistral 7B, Llama 3.1 8B, DeepSeek R1 7B, Qwen2.5 7B, Phi-3.5 Mini, Llama 3.2 3B
Mid	6 – 12 GB	Gemma 3 12B, Qwen3 8B, Phi-4 14B, DeepSeek R1 14B, Mistral NeMo 12B
High	12 – 24 GB	Gemma 3 27B, Qwen3 32B, DeepSeek R1 32B, Qwen2.5 Coder 32B
Desktop	24 GB+	Llama 3.3 70B, Qwen2.5 72B, DeepSeek R1 70B

All verified models are free, open-source, and downloadable without login or account. The browser automatically hides models outside your device's tier — you only see what can actually run.

Usage

llamdrop              # Launch UI
llamdrop update       # Update to latest version
llamdrop doctor       # Check install health
llamdrop version      # Show version

Chat commands:

/help     — show commands
/export   — save conversation as markdown
/clear    — clear history
/ram      — show current RAM usage
/quit     — exit chat

Supported Platforms

Platform	Status	Notes
Android via Termux	🎯 Primary test platform	Built and tested here first
Linux laptop / desktop	✅ Fully supported	Any distro, x86_64 or ARM64
Raspberry Pi 4 / 5	✅ Fully supported	ARM64
macOS (Apple Silicon)	✅ Fully supported	Ollama backend, GPU_LAYERS=999
macOS (Intel)	✅ Fully supported	CPU backend
Windows (native)	✅ Fully supported	PowerShell installer, CUDA/Vulkan auto-detected
Old Windows PC (WSL2)	✅ Supported	Via Windows Subsystem for Linux
Chromebook (Linux mode)	🔄 Should work	ARM64 or x86_64
Orange Pi / SBC	🔄 Should work	ARM64 Linux
iOS	❌ Not supported	No proper terminal environment

Project Structure

llamdrop/
├── llamdrop.py          # Main entry point + CLI (update, doctor, version)
├── install.sh           # One-line installer (Linux/Android/macOS/WSL)
├── install.ps1          # Native Windows PowerShell installer
├── models.json          # Verified model catalog (41 models, 6 tiers)
├── CHANGELOG.md         # Full version history
├── modules/
│   ├── specs.py         # Full device profiling — DeviceProfile dataclass, tier, backend, flags
│   ├── device.py        # Hardware detection bridge + legacy compat
│   ├── browser.py       # Model browser — verified catalog + HF live search
│   ├── downloader.py    # Resilient downloader + GGUF phone scanner
│   ├── launcher.py      # llama.cpp wrapper + Vulkan + mmap + DeviceProfile-aware
│   ├── chat.py          # Chat loop + inference extraction + backend dispatch
│   ├── ram_monitor.py   # Live RAM tracking and display
│   ├── hf_search.py     # Live HuggingFace search
│   ├── i18n.py          # Multi-language UI strings (EN/HI/ES/PT/AR)
│   ├── updater.py       # Self-update + background catalog updater
│   ├── benchmarks.py    # Tokens/sec benchmark storage (rolling average, 5 runs)
│   ├── doctor.py        # Install health checker + Ollama check
│   ├── config.py        # User config file with mtime-aware hot-reload
│   ├── battery.py       # Battery monitoring during inference
│   ├── filecontext.py   # File attachment for chat context
│   └── backends/
│       ├── __init__.py  # Backends package
│       └── ollama.py    # Ollama HTTP backend (auto-detected)
└── docs/
    ├── CONTRIBUTING.md  # How to contribute
    └── DEVICES.md       # Community device compatibility list

Roadmap

v0.7 — Done

Chip-aware threads — 30+ chips mapped to actual big core count
Fixed context thresholds — 2048–8192 tokens based on device class
Device class detection — ultra_low / low / mid / high / desktop
First-launch welcome screen — detected specs + model recommendations
Ollama backend — auto-detected on Linux/desktop, HTTP API routing
IQ quant support — IQ3_M/IQ2_M, Vulkan auto-disabled
Conditional mmap — 15–30% RAM saving on internal storage models
Clean inference extraction — _run_inference() / _dispatch_inference()
25 models in catalog

v0.8.5 — Done

v0.9.0 — Current

v1.0 — Planned

Web-based model catalog (GitHub Pages)
Community device profile submissions
/doc command — document chat with chunking (no vector DB needed)
llamdrop server mode — run on phone, access from browser on WiFi
Streaming tokens via Ollama backend

Contributing

You don't need to be a developer to contribute:

📲 Test a model on your device → open a PR to update models.json
🌐 Translate the UI into your language
📝 Write a setup guide for your specific device
🐛 Report a crash via GitHub Issues
⭐ Star this repo — it helps others find it when they need it most

See CONTRIBUTING.md for full details.

License

GNU General Public License v3.0 — see LICENSE

In plain language:

✅ Free to use forever
✅ Free to modify and share
❌ Cannot be sold
❌ Cannot be made closed-source
❌ Cannot be put behind a paywall

llamdrop will always be free. That is non-negotiable.

The Story

This project started because one vibe-coder spent hours trying to run local AI on an Oppo F19 Pro+ with no PC and no budget. Dozens of crashes. Models that were incompatible. RAM errors with no explanation. When it finally worked — with a tiny 1.5B model running in Termux — the thought was: nobody should have to go through all of that just to get started.

llamdrop is the tool that should have existed already.

Built by @DeVenLucaz and contributors. If llamdrop helped you, star the repo and share it with someone who needs it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

llamdrop 🦙

What is llamdrop?

Who is this for?

Quick Install

Features

Device Intelligence

Model Browser & Download

Chat & Inference

System & UX

Model Catalog

Layer 1 — Verified Catalog (`models.json`)

Layer 2 — Live HuggingFace Search

Usage

Supported Platforms

Project Structure

Roadmap

v0.7 — Done

v0.8.5 — Done

v0.9.0 — Current

v1.0 — Planned

Contributing

License

The Story

About

Uh oh!

Releases 7

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.github		.github
docs		docs
modules		modules
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
bug_report.yml		bug_report.yml
install.ps1		install.ps1
install.sh		install.sh
llamdrop.py		llamdrop.py
models.json		models.json

Folders and files

Latest commit

History

Repository files navigation

llamdrop 🦙

What is llamdrop?

Who is this for?

Quick Install

Features

Device Intelligence

Model Browser & Download

Chat & Inference

System & UX

Model Catalog

Layer 1 — Verified Catalog (models.json)

Layer 2 — Live HuggingFace Search

Usage

Supported Platforms

Project Structure

Roadmap

v0.7 — Done

v0.8.5 — Done

v0.9.0 — Current

v1.0 — Planned

Contributing

License

The Story

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Layer 1 — Verified Catalog (`models.json`)

Packages