3D interactive visualization of AI embedding spaces. Fly through the conceptual geography of how language models represent ideas — right in your browser.
Noosphere takes a vocabulary of 10,000+ English words, embeds them using an AI model (MiniLM or Qwen3), reduces the high-dimensional vectors to 3D with PaCMAP, clusters them with HDBSCAN, and renders the result as an interactive point cloud you can explore.
Each glowing point is a word. Nearby points are semantically similar. Colors represent clusters of related concepts. You navigate an AI's mind.
- Semantic teleport — type any word or sentence, the model embeds it in real-time and flies you to where it lives in the space
- Bias probe — pick two concepts as poles (e.g. "male" / "female") and watch the entire space recolor on a gradient showing every concept's relative association. Export results as CSV.
- Neighborhood view — select any point, see its nearest neighbors highlighted with connecting constellation lines
- Analogy explorer — input "A is to B as C is to ___" and watch vector arithmetic play out in 3D
- Comparison mode — embed two sentences and see where they land, how far apart they are, and what surrounds them
- Fly mode — switch from orbit to WASD + mouse look for full free-flight immersion
- Beginner / Advanced toggle — progressive disclosure of analytical tools
- Python 3.11 — python.org (3.12 may work but is untested)
- uv — fast Python package manager:
pip install uvor see docs.astral.sh/uv - Node.js 18+ — nodejs.org
- GPU (optional) — CUDA or MPS (Apple Silicon) for faster pipeline runs. CPU works fine.
git clone https://github.com/davidkny22/Noosphere.git
cd Noosphere
npm run setup # installs Python + Node dependencies (~2 min first time)
npm start # launches server + frontend togetherOpen http://localhost:5173 and explore.
A pre-built 10K-word MiniLM space ships with the repo. No pipeline run needed.
If you prefer to run components separately, or need more control:
cd server
uv sync
uv run serveStarts at http://localhost:8000. The server loads all spaces found in web/public/spaces/ and provides embedding, neighbor search, bias probing, analogy, and comparison endpoints.
The server is required for advanced features (embed, bias probe, analogy, comparison). The visualization itself works without it — you can still browse and search the pre-built space.
cd web
npm install
npm run dev # starts both frontend + server via concurrently (default)
npm run dev:web # starts frontend only (if you're running the server separately)Opens at http://localhost:5173.
The pre-built MiniLM 10K space is included, but you can generate custom spaces with the pipeline:
cd pipeline
uv sync
uv run build_space.py --model minilm --vocab-size 10000
uv run build_space.py --model qwen3 --vocab-size 10000 # requires more VRAMOutput goes to web/public/spaces/. The frontend auto-discovers all available spaces via index.json.
--model {minilm,qwen3} Embedding model to use
--vocab-size N Number of vocabulary terms (default: 10000)
--device {auto,cuda,mps,cpu} Compute device
--batch-size N Embedding batch size
--compress Gzip the output JSON
uv run filter_space.py— downsize an existing space to fewer termsuv run rebuild_faiss.py— rebuild FAISS index for a spaceuv run export_embeddings.py— export HD embeddings to binary format
GPU (CUDA or Apple Silicon MPS) is recommended for larger vocabularies. CPU works fine for 10K.
| Input | Action |
|---|---|
| Drag | Orbit / rotate (orbit mode) or look around (fly mode) |
| Scroll | Zoom in / out |
| Right-drag | Pan |
| Click | Select a point — opens info panel |
| Hover | Tooltip with term + cluster |
/ |
Focus search bar |
| Escape | Clear search, restore colors |
` |
Toggle FPS stats |
| Input | Action |
|---|---|
| WASD | Move forward / left / back / right |
| Space | Fly up |
| Ctrl | Fly down |
| Shift | 2x speed |
pipeline/ Python CLI — vocab → embed → PaCMAP 3D → HDBSCAN → space JSON
server/ FastAPI backend — embedding, neighbors, bias, analogy, compare
web/ React Three Fiber frontend
src/
components/ SpaceCanvas, PointCloud, SearchBar, BiasProbePanel, ...
systems/ Color system (cluster palette, bias gradient, search highlight)
store/ Zustand state management
hooks/ Space loader, fuzzy search (Fuse.js), GPU picking
services/ Embedding service abstraction (remote API)
- Pipeline generates a space: embeds vocabulary → PaCMAP 3D reduction → HDBSCAN clustering → trains a ParamPaCMAP projection network → builds FAISS index → packages everything as compressed JSON + binary artifacts.
- Server loads the embedding model + FAISS index + projection network at startup. Provides real-time embedding of novel text, nearest-neighbor search, bias scoring (SemAxis), analogy computation, and text comparison — all in high-dimensional space for maximum accuracy.
- Frontend renders the space as an InstancedMesh point cloud with custom GLSL shaders (single draw call for 10K+ points), handles navigation, search, and all interactive features. Communicates with the server for embedding operations.
All endpoints expect JSON. The space field identifies which space to query (e.g., minilm-10k).
| Method | Path | Description |
|---|---|---|
GET |
/health |
List available spaces and their metadata |
POST |
/embed |
Embed text → 3D coords + K nearest neighbors |
POST |
/neighbors |
Find K nearest neighbors for a point by index |
POST |
/bias |
Bias scores between two poles (SemAxis) for all terms |
POST |
/analogy |
Solve "A is to B as C is to ?" via vector arithmetic |
POST |
/compare |
Compare two texts: cosine similarity + 3D positions |
Interactive API docs at http://localhost:8000/docs when the server is running.
| Variable | Default | Description |
|---|---|---|
PORT |
8000 |
Server port |
HOST |
127.0.0.1 |
Server bind address |
CORS_ORIGINS |
localhost Vite ports | Comma-separated allowed origins |
NOOSPHERE_SPACE_DIR |
web/public/spaces |
Directory containing space artifacts |
RELOAD |
false |
Enable uvicorn auto-reload (dev only) |
OPENAI_API_KEY |
— | Optional: GPT-powered cluster labels in pipeline |
VITE_API_URL |
http://localhost:8000 |
Frontend: embedding server URL |
See .env.example for a template.
| Layer | Technology |
|---|---|
| Embedding models | sentence-transformers (MiniLM 384d, Qwen3 1024d) |
| Dimensionality reduction | PaCMAP (subprocess-isolated for macOS ARM64 compatibility) |
| Parametric projection | ParamPaCMAP (trained network for projecting novel inputs to 3D) |
| Clustering | HDBSCAN on 3D positions |
| Neighbor search | FAISS (IndexFlatIP, cosine similarity) |
| Rendering | React Three Fiber v9, InstancedMesh, custom GLSL shaders (single draw call) |
| Search | Fuse.js fuzzy matching |
| State | Zustand |
| Build | Vite, TypeScript |
| API | FastAPI (Python, async) |
Noosphere builds on these foundational works:
| Component | Paper | Authors | Year |
|---|---|---|---|
| Embedding model | MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers | Wang, Wei, Dong, Bao, Yang, Zhou | 2020 |
| Dimensionality reduction | Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP for Data Visualization | Wang, Huang, Rudin, Shaposhnik | 2021 |
| Parametric projection | Navigating the Effect of Parametrization for Dimensionality Reduction | Huang, Wang, Rudin | 2024 |
| Neighbor search | The Faiss Library | Douze, Guzhva, Deng, Johnson et al. | 2024 |
| Clustering | Density-Based Clustering Based on Hierarchical Density Estimates | Campello, Moulavi, Sander | 2013 |
This project is licensed under the GNU Affero General Public License v3.0.
If you'd like to use Noosphere in a proprietary product or service without the AGPL v3 obligations, commercial licenses are available. Contact @davidkny22 on GitHub to discuss.