Bumblebee is a fully client-side chat app: React, Vite, and @huggingface/transformers run ONNX language models in the browser. There is no app backend, account system, or server-side inference; Hugging Face serves only as the CDN for model weights. Chat history stays in memory for the current tab.
Use it as a small reference for local-first browser AI, Web Workers, streaming UI, and lightweight on-device generation.
Live demo: bumblebee.joshxfi.com
Source: Open source on GitHub · Author: @joshxfi
- Runs text generation in a Web Worker so inference does not block the UI thread
- Loads ONNX checkpoints from Hugging Face at runtime; the browser cache speeds repeat visits
- Model picker with models grouped by provider family
- Picks a device profile (`standard` vs `constrained`): constrained devices get a lighter default model, and models marked desktop-only are disabled in the UI to avoid unstable loads
- Streams assistant output as markdown (Streamdown)
- Keeps the transcript ephemeral (in-memory only for the session)
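The worker-plus-streaming pattern above can be sketched as a small message protocol between the UI thread and the worker. The message shapes and function names below are illustrative, not the app's actual protocol:

```typescript
// Illustrative message protocol between the UI thread and the chat worker.
type WorkerMessage =
  | { type: "token"; text: string } // one streamed chunk of assistant output
  | { type: "done" };               // generation finished

// Pure reducer: fold streamed worker messages into the assistant's text.
function applyWorkerMessage(current: string, msg: WorkerMessage): string {
  switch (msg.type) {
    case "token":
      return current + msg.text;
    case "done":
      return current;
  }
}

// On the main thread this would be wired to worker.onmessage, e.g.:
//   const worker = new Worker(new URL("./chat.worker.ts", import.meta.url), { type: "module" });
//   worker.onmessage = (e) => { text = applyWorkerMessage(text, e.data); };

const demo: WorkerMessage[] = [
  { type: "token", text: "Hel" },
  { type: "token", text: "lo!" },
  { type: "done" },
];
console.log(demo.reduce(applyWorkerMessage, "")); // "Hello!"
```

Because the reducer is pure, the streaming UI only ever re-renders with the accumulated text; the heavy inference stays in the worker.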
- React 19 and Vite power the shell and chat UI
- Zustand holds chat and runtime state; Tailwind CSS styles the UI
`@huggingface/transformers` loads the selected repo via `pipeline("text-generation", …)` inside the worker.
Weights and tokenizers are not bundled; they download on demand from Hugging Face, then reuse the browser cache when possible.
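Inside the worker, loading and streaming with transformers.js looks roughly like this. The model id, dtype, and message shapes are illustrative (the app's actual worker is `src/workers/chat.worker.ts`), and the first run needs network access to pull weights from the Hub:

```typescript
// chat.worker.ts (sketch) — runs in a Web Worker, off the main thread.
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Load a q4 ONNX checkpoint; the browser caches the download for repeat visits.
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2.5-350M-ONNX", // the standard-profile default
  { dtype: "q4" },
);

// Stream tokens back to the UI thread as they are produced.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text: string) => self.postMessage({ type: "token", text }),
});

const messages = [{ role: "user", content: "Hello!" }];
await generator(messages, { max_new_tokens: 256, streamer });
self.postMessage({ type: "done" });
```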
Bumblebee picks the starting model from getRecommendedModelId and getDeviceProfile. Constrained mode is used on typical mobile user agents, on touch-capable Macs (treated as touch-first devices), or when navigator.deviceMemory is available and ≤ 4 GB.
- Standard (desktop-class) default: LFM2.5 350M → `onnx-community/LFM2.5-350M-ONNX`
- Constrained default: Falcon H1 Tiny 90M Instruct → `onnx-community/Falcon-H1-Tiny-90M-Instruct-ONNX`
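The profile heuristic can be sketched as a pure function. This is a re-implementation of the behavior the README describes, not the actual `getDeviceProfile` from `src/lib/chat-config.ts`; the parameter names and the user-agent regex are assumptions:

```typescript
type DeviceProfile = "standard" | "constrained";

// Illustrative sketch of the device-profile heuristic described above.
function getDeviceProfile(
  userAgent: string,
  maxTouchPoints: number,
  deviceMemoryGb?: number, // navigator.deviceMemory, when the browser exposes it
): DeviceProfile {
  const isMobileUa = /Android|iPhone|iPad|iPod|Mobile/i.test(userAgent);
  // A touch-capable Mac UA is often an iPad, so treat it as touch-first.
  const isTouchMac = /Macintosh/.test(userAgent) && maxTouchPoints > 1;
  const isLowMemory = deviceMemoryGb !== undefined && deviceMemoryGb <= 4;
  return isMobileUa || isTouchMac || isLowMemory ? "constrained" : "standard";
}

console.log(getDeviceProfile("Mozilla/5.0 (iPhone ...)", 5));          // "constrained"
console.log(getDeviceProfile("Mozilla/5.0 (Windows NT 10.0)", 0, 16)); // "standard"
```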
All checkpoints below are q4 ONNX builds from the onnx-community org. "Desktop only" means `supportsMobile: false` in config; those entries are disabled when the device profile is constrained.
- SmolLM2 135M → `onnx-community/SmolLM2-135M-Instruct-ONNX-MHA` (mobile + desktop)
- SmolLM2 360M → `onnx-community/SmolLM2-360M-ONNX` (mobile + desktop)
- Gemma 3 270M → `onnx-community/gemma-3-270m-it-ONNX` (mobile + desktop)
- Gemma 3 1B → `onnx-community/gemma-3-1b-it-ONNX` (desktop only)
- Qwen2.5 0.5B → `onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA` (mobile + desktop)
- Qwen3 0.6B → `onnx-community/Qwen3-0.6B-ONNX` (mobile + desktop)
- Falcon H1 Tiny 90M → `onnx-community/Falcon-H1-Tiny-90M-Instruct-ONNX` (mobile + desktop)
- Falcon H1 Tiny Multilingual 100M → `onnx-community/Falcon-H1-Tiny-Multilingual-100M-Instruct-ONNX` (mobile + desktop)
- LFM2.5 350M → `onnx-community/LFM2.5-350M-ONNX` (mobile + desktop)
- LFM2 350M → `onnx-community/LFM2-350M-ONNX` (mobile + desktop)
- LFM2 700M → `onnx-community/LFM2-700M-ONNX` (desktop only)
- LFM2 1.2B → `onnx-community/LFM2-1.2B-ONNX` (desktop only)
- Llama 3.2 1B → `onnx-community/Llama-3.2-1B-Instruct-ONNX` (desktop only)
- TinySwallow 1.5B → `onnx-community/TinySwallow-1.5B-Instruct-ONNX` (desktop only)
- Bonsai 1.7B → `onnx-community/Bonsai-1.7B-ONNX` (desktop only)
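The `supportsMobile` flag described above drives the picker. The entry shape and helper below are a sketch; the actual catalog lives in `src/lib/chat-config.ts` and may differ:

```typescript
// Illustrative catalog entries with the supportsMobile flag.
interface ModelEntry {
  id: string;              // Hugging Face repo id
  label: string;
  supportsMobile: boolean; // false => "desktop only"
}

const MODELS: ModelEntry[] = [
  { id: "onnx-community/Falcon-H1-Tiny-90M-Instruct-ONNX", label: "Falcon H1 Tiny 90M", supportsMobile: true },
  { id: "onnx-community/LFM2.5-350M-ONNX", label: "LFM2.5 350M", supportsMobile: true },
  { id: "onnx-community/Llama-3.2-1B-Instruct-ONNX", label: "Llama 3.2 1B", supportsMobile: false },
];

// On a constrained profile, desktop-only entries are disabled in the picker.
function isDisabled(model: ModelEntry, profile: "standard" | "constrained"): boolean {
  return profile === "constrained" && !model.supportsMobile;
}

console.log(MODELS.filter((m) => !isDisabled(m, "constrained")).map((m) => m.label));
// ["Falcon H1 Tiny 90M", "LFM2.5 350M"]
```

Disabling (rather than hiding) desktop-only models keeps the full catalog visible to constrained users while avoiding loads that would likely fail.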
- Educational and experimental; not a production AI platform
- First run downloads tokenizer and weights; later visits depend on browser cache behavior
- Very low-memory hardware can still struggle even with small models
- Browser-based inference is not the same as a fully offline native desktop runtime
- Quality and coherence are limited by model size and quantization
```shell
bun install
bun run dev
bun run build
bun run test
bun run lint
bun run typecheck
bun run preview # local preview of production build
```

- `src/chat-app.tsx`: main chat UI
- `src/workers/chat.worker.ts`: model load and generation in a worker
- `src/lib/chat-store.ts`: Zustand store and message/runtime orchestration
- `src/lib/chat-config.ts`: model catalog, defaults, device profile, generation presets
This project is licensed under the MIT License.
