Audio similarity search that runs entirely in your browser. Drop in a sound file or type a description like "dark punchy kick", and SonicFind finds the most similar sounds from your sample library. No server, no uploads — all processing happens locally on your machine.
SonicFind uses a 3-step pipeline: Extract → Embed → Compare.
- Extract — Audio files are decoded via the Web Audio API, resampled to 16 kHz mono, and normalized into a single Float32Array.
- Embed — The CLAP neural network (Contrastive Language-Audio Pretraining) converts audio into a 512-dimensional vector that captures timbral properties — what the sound sounds like, not what it's called.
- Compare — Cosine similarity ranks every indexed sample against the query. Results are sorted by similarity score.
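As a rough illustration of the Extract step, the sketch below downmixes decoded channels to mono and normalizes the result. The function names are hypothetical, and peak normalization is an assumption: the exact normalization SonicFind applies is not specified here. Resampling is left out; in a browser it is commonly done by rendering the decoded buffer through an `OfflineAudioContext` created at the target sample rate.

```typescript
// Hypothetical helpers illustrating the Extract step; not SonicFind's actual code.

// Average all channels into one mono buffer (channels are assumed equal length).
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const channel = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Assumed normalization: scale so the loudest sample has magnitude 1.
// A silent buffer is returned as an unchanged copy to avoid dividing by zero.
function peakNormalize(samples: Float32Array): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  if (peak === 0) return samples.slice();
  return samples.map((s) => s / peak);
}
```

The channel arrays would typically come from `AudioBuffer.getChannelData()` after decoding.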
CLAP also supports text-to-audio search: type "dark punchy kick" and find matching sounds without uploading anything.
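Because CLAP maps text and audio into the same embedding space, the Compare step looks the same for both query types. The following is a minimal sketch of brute-force cosine ranking; the `IndexedSample` shape, the function names, and the `topK` parameter are illustrative assumptions rather than SonicFind's actual API.

```typescript
// Hypothetical types for illustration only.
interface IndexedSample {
  name: string;
  embedding: Float32Array; // e.g. a 512-dimensional CLAP vector
}

interface SearchHit {
  name: string;
  score: number; // cosine similarity in [-1, 1]
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as dissimilar
}

// Score every indexed sample against the query, then sort by descending score.
function rankBySimilarity(
  query: Float32Array,
  library: IndexedSample[],
  topK: number = 10
): SearchHit[] {
  return library
    .map((sample) => ({
      name: sample.name,
      score: cosineSimilarity(query, sample.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The query vector can come from either an audio file or a text description; the ranking code does not need to know which.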
| Component | Technology |
|---|---|
| Framework | React 19 + Vite 7 |
| ML Runtime | Transformers.js (ONNX Runtime WebAssembly) |
| Audio Model | CLAP (Xenova/clap-htsat-unfused, 161 MB) |
| Audio Decode | Web Audio API |
| Folder Access | File System Access API |
| Styling | Tailwind CSS v4 |
| Threading | Web Workers |
| Metric | Value |
|---|---|
| Single file embed (cold) | ~537 ms |
| Single file embed (warm) | ~133 ms |
| Batch indexing rate | ~450 samples/min |
| Audio decode | ~11 ms |
| Cosine search (2000+ vectors) | < 10 ms |
| Model init (cached) | ~4.4 s |
| Model init (first download) | ~31 s |
Benchmarked with 2188 WAV samples on a standard desktop. Indexing speed varies with file length and hardware.
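The batch rate can be turned into a rough time estimate for a library of a given size. The sketch below is plain arithmetic on the published ~450 samples/min figure; the function names are hypothetical, and the results are estimates rather than measurements.

```typescript
// Batch indexing rate taken from the benchmark table (samples per minute).
const BENCHMARK_SAMPLES_PER_MINUTE = 450;

// Average time per sample implied by a batch rate, in milliseconds.
function msPerSample(samplesPerMinute: number = BENCHMARK_SAMPLES_PER_MINUTE): number {
  if (samplesPerMinute <= 0) throw new RangeError("Rate must be positive");
  return 60000 / samplesPerMinute;
}

// Estimated wall-clock minutes to index a library at a given batch rate.
function estimateIndexingMinutes(
  sampleCount: number,
  samplesPerMinute: number = BENCHMARK_SAMPLES_PER_MINUTE
): number {
  if (samplesPerMinute <= 0) throw new RangeError("Rate must be positive");
  return sampleCount / samplesPerMinute;
}
```

At 450 samples/min the implied cost is about 133 ms per file, which lines up with the warm single-file embed time. For the 2188-sample benchmark library, the estimate comes to roughly 4.9 minutes.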
| Feature | Chrome 86+ | Edge 86+ | Firefox | Safari |
|---|---|---|---|---|
| Audio search (drop file) | Yes | Yes | Yes | Yes |
| Text search | Yes | Yes | Yes | Yes |
| Demo library | Yes | Yes | Yes | Yes |
| Folder indexing | Yes | Yes | No | No |
Folder indexing requires the File System Access API (Chrome/Edge only). All other features work in any modern browser.
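A common way to handle this split is to feature-detect the folder picker instead of sniffing the browser name. The sketch below checks for `showDirectoryPicker`, the File System Access API entry point for choosing a directory. The helper name and the `scope` parameter (used here so the check can run outside a browser) are assumptions for illustration.

```typescript
// Hypothetical helper: returns true when the directory picker is available.
// `scope` defaults to the global object; in a browser that is `window`.
function supportsFolderIndexing(scope: object = globalThis): boolean {
  const candidate = (scope as Record<string, unknown>)["showDirectoryPicker"];
  return typeof candidate === "function";
}
```

An app could use the result to hide or disable the folder indexing control while leaving audio search, text search, and the demo library available.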
- Text search accuracy — Abstract adjectives ("soft", "warm") produce correct rankings but low confidence scores. Concrete terms ("kick", "hihat") work well.
- First-load model download — 161 MB on first visit. Cached after that.
- Folder indexing — Chrome/Edge only (File System Access API).
- No persistent storage — Indexed library is lost on page refresh. IndexedDB persistence planned for Phase 2.
- VectorStore scaling — Brute-force cosine similarity, fine up to ~5000 samples. HNSW indexing planned for Phase 2.
- Hybrid DSP+CLAP search — Re-rank CLAP results using spectral centroid, attack time, and zero crossing rate for better adjective-based queries
- HNSW indexing — Approximate nearest neighbor for 10,000+ sample libraries
- Persistent storage — Save/load indexed libraries via IndexedDB
- Binary serialization — Replace JSON-based VectorStore with ArrayBuffer format
- Rust/WASM — Move VectorStore and audio preprocessing to WebAssembly for SIMD acceleration
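To illustrate the binary serialization item, here is one possible layout: an 8-byte header holding the vector count and dimension, followed by the raw float32 values. This format and the function names are assumptions for the sketch, not a description of the planned implementation. Sample names and other metadata would need to be stored separately.

```typescript
// Assumed layout: [count: uint32][dim: uint32][count * dim float32 values].
// Typed-array views use the platform's byte order, so this sketch is only
// portable between machines with the same endianness.
const HEADER_BYTES = 8;

function packVectors(vectors: Float32Array[]): ArrayBuffer {
  const count = vectors.length;
  const dim = count > 0 ? vectors[0].length : 0;
  const buffer = new ArrayBuffer(HEADER_BYTES + count * dim * 4);
  const header = new Uint32Array(buffer, 0, 2);
  header[0] = count;
  header[1] = dim;
  const body = new Float32Array(buffer, HEADER_BYTES, count * dim);
  vectors.forEach((vector, i) => {
    if (vector.length !== dim) throw new Error("All vectors must share one dimension");
    body.set(vector, i * dim); // copy each vector into its slot
  });
  return buffer;
}

function unpackVectors(buffer: ArrayBuffer): Float32Array[] {
  const header = new Uint32Array(buffer, 0, 2);
  const count = header[0];
  const dim = header[1];
  const body = new Float32Array(buffer, HEADER_BYTES, count * dim);
  const vectors: Float32Array[] = [];
  for (let i = 0; i < count; i++) {
    vectors.push(body.slice(i * dim, (i + 1) * dim)); // slice() copies the data
  }
  return vectors;
}
```

In this layout a 512-dimensional vector takes 2048 bytes, and the resulting `ArrayBuffer` can be stored or transferred between threads without going through JSON.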
Source code is not publicly available. The live demo is free to use.