Stop typing. Start thinking visually.
An infinite canvas where Google Gemini AI directly navigates, organizes, and transforms your thoughts in real-time.
- Language: TypeScript
- Frontend: Next.js, React, Zustand
- Canvas: TLDraw, Excalidraw, React Flow
- AI: Google Gemini 2.5 Flash (GenAI SDK)
- Backend: Node.js, Express, Firebase Admin
- Database: Firestore
- Cloud: Google Cloud Run, Secret Manager, Artifact Registry
- Infrastructure: Terraform
- Voice: Web Speech API
- 🎯 The Problem
- ✨ What is Stun?
- 🚀 Key Features
- 🛠️ Tech Stack
- 🎓 Why Gemini 2.5 Flash?
- 🚀 Getting Started
- 📖 Usage Guide
- 🏗️ Architecture
- 🧪 Testing & QA
- ☁️ Google Cloud Deployment
- 📊 Performance
- 🔐 Security
- 📚 Documentation
- 🤝 Contributing
- 📈 Impact & Learning
- 🏆 Hackathon Category
- 🎥 Demo & Resources
👉 Open Stun Live App (Google Cloud hosted | Real-time deployment)
```bash
# Terminal 1: Firestore Emulator
cd backend && firebase emulators:start --only firestore --project stun-489205

# Terminal 2: Backend API
cd backend && bun install && bun run dev

# Terminal 3: Frontend App
cd web && bun install && bun run dev
```

Windows? Single command:

```powershell
.\scripts\start-dev.ps1
```

Traditional AI is text-in, text-out. You type. It responds. You're stuck in a chat box.
But thinking isn't linear. It's spatial. Visual. Interconnected.
Stun reimagines AI interaction — instead of receiving text responses, AI visually understands your canvas, interprets spatial relationships, and directly navigates your workspace. Every command becomes a visual transformation.
Stun is a UI Navigator that blends three synchronized canvas layers into one intelligent workspace:
🎨 Layer 1: TLDraw → Infinite pan/zoom workspace
📐 Layer 2: Excalidraw → Visual shapes & diagrams
🧠 Layer 3: React Flow → AI-readable knowledge graph
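Keeping three canvas layers consistent comes down to a single source of truth that fans every update out to each layer. A dependency-free sketch of that pattern (the app itself uses a Zustand store for this role; the names here are illustrative):

```typescript
// One store, many layer subscribers: each canvas layer re-renders from the
// same node list, so the layers can never drift apart.
type Node = { id: string; x: number; y: number };
type Listener = (nodes: Node[]) => void;

class BoardStore {
  private nodes: Node[] = [];
  private listeners: Listener[] = []; // one listener per canvas layer

  subscribe(l: Listener): void {
    this.listeners.push(l);
  }

  setNodes(nodes: Node[]): void {
    this.nodes = nodes;
    for (const l of this.listeners) l(this.nodes); // fan out to all layers
  }

  getNodes(): Node[] {
    return this.nodes;
  }
}
```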
1. You speak or type a command (e.g., "Turn this into a roadmap")
2. Gemini sees your canvas (screenshot + structured node data)
3. AI plans actions (move, create, group, connect, zoom)
4. Actions execute live on your canvas
5. Your board transforms in real-time
No chat box. No back-and-forth. Pure visual interaction.
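The flow above ends with a list of actions replayed onto the canvas. A minimal sketch of what that replay could look like (the action types and the `executePlan` helper are illustrative, not the project's actual schema):

```typescript
// Hypothetical shape of the action plan Gemini returns after planning.
type CanvasAction =
  | { type: "move"; nodeId: string; x: number; y: number }
  | { type: "create"; label: string; x: number; y: number }
  | { type: "connect"; source: string; target: string }
  | { type: "zoom"; x: number; y: number; scale: number };

// Executing a plan is just replaying each validated action on the canvas;
// `apply` stands in for whatever updates the layer state.
function executePlan(
  plan: CanvasAction[],
  apply: (a: CanvasAction) => void
): number {
  let applied = 0;
  for (const action of plan) {
    apply(action);
    applied++;
  }
  return applied;
}
```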
- Multimodal Context: Gemini analyzes both canvas screenshots AND structured node data
- Spatial Reasoning: AI understands relationships, distances, and hierarchies
- Action Planning: Generates executable, validated command sequences
- Infinite Workspace: Pan/zoom with TLDraw's operating system layer
- Visual Tools: Draw, shape, annotate with Excalidraw
- Knowledge Graph: React Flow nodes/edges for AI-readable logic
- Voice Commands: Web Speech API integration
- Text Input: Type or speak your intent
- Real-Time Execution: Watch AI transform your canvas live
- Live Presence: See who's editing (active user tracking)
- Shared Boards: Invite collaborators for joint thinking
- Instant Sync: All changes sync across users via Firestore
- Auto-Save: Every action auto-saved to Firestore (debounced 3s)
- Recovery: Resume work instantly, even after browser restart
- Conflict-Free: Last-write-wins strategy with Firestore timestamps
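The debounced auto-save described above is a small pattern worth sketching; `saveBoard` here merely stands in for the real Firestore write, and the 3s window matches the behavior described:

```typescript
// Generic debounce: rapid calls collapse into one trailing invocation,
// which keeps the Firestore write load low during active editing.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Illustrative stand-in for persisting canvas state to Firestore.
const saveBoard = debounce((state: object) => {
  console.log("saving board with", Object.keys(state).length, "top-level keys");
}, 3000);
```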
- OAuth 2.0: Google authentication (no passwords)
- JWT Tokens: Firebase ID tokens with 1-hour expiry, auto-renewal
- Access Control: Firestore rules enforce user-scoped read/write
- Secrets Management: API keys in Google Secret Manager (not in code)
- Framework: Next.js 14 (App Router, TypeScript)
- State: Zustand (lightweight, autosave-friendly)
- Canvas Engines:
- 🎨 TLDraw 2.4.6 (infinite workspace)
- 📐 Excalidraw 0.17.6 (visual editing)
- 🧠 React Flow 11.11.4 (knowledge graph)
- Voice: Web Speech API
- Screenshots: html2canvas
- Styling: SCSS
- Storage: Firebase SDK + localStorage
- Runtime: Node.js 20+
- Framework: Express.js 5 (TypeScript)
- AI Model: Google Gemini 2.5 Flash (via Google GenAI SDK)
- Database: Firestore (NoSQL, real-time listeners)
- Authentication: Firebase Admin SDK
- Validation: Zod (type-safe runtime checks)
- Logging: Winston
- Compute: Cloud Run (auto-scaling containers)
- Database: Firestore (NoSQL, real-time)
- Secrets: Secret Manager (API key storage)
- Registry: Artifact Registry (container images)
- Terraform: Infrastructure as Code
- Container: Docker (separate images for backend & frontend)
- Orchestration: Terraform (6+ modules for GCP resources)
- CI/CD: GitHub Actions → Artifact Registry → Cloud Run
- Region: us-central1 (multi-zone availability)
| Capability | Why It's Perfect |
|---|---|
| Multimodal | Understands screenshots + text context together |
| Spatial Reasoning | Interprets node positions, connections, grouping |
| Speed | 100-500ms inference (real-time response) |
| Cost | Low per-1M-token pricing makes frequent, per-command calls affordable |
| JSON Output | Native structured response (easy validation) |
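Native JSON output is only useful if it is validated before anything touches the canvas. The project uses Zod for this; a dependency-free sketch of the same idea (the `MoveAction` shape is illustrative):

```typescript
// Validate a model-produced JSON string before executing it. Anything
// malformed or mistyped is rejected rather than applied to the board.
interface MoveAction {
  type: "move";
  nodeId: string;
  x: number;
  y: number;
}

function parseMoveAction(raw: string): MoveAction | null {
  try {
    const v = JSON.parse(raw);
    if (
      v?.type === "move" &&
      typeof v.nodeId === "string" &&
      Number.isFinite(v.x) &&
      Number.isFinite(v.y)
    ) {
      return v as MoveAction;
    }
  } catch {
    // malformed JSON falls through to null
  }
  return null;
}
```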
- Node.js 20+ (download)
- Bun (install) or npm/yarn
- Google Cloud Account (free tier eligible)
- Gemini API Key (get free here)
- Firebase Project (create one)
Step 1: Clone Repository

```bash
git clone https://github.com/Invariants0/Stun.git
cd Stun
```

Step 2: Backend Setup

```bash
cd backend
cp .env.example .env.local
# Edit .env.local and add:
# GEMINI_API_KEY=your_key_here
# GCP_PROJECT_ID=stun-489205
# FIREBASE_SERVICE_ACCOUNT_KEY=<JSON from Firebase>
bun install
bun run dev
# Backend runs on http://localhost:8080
```

Step 3: Frontend Setup

```bash
cd ../web
cp .env.example .env.local
# Edit .env.local and add Firebase config:
# NEXT_PUBLIC_FIREBASE_API_KEY=...
# NEXT_PUBLIC_FIREBASE_PROJECT_ID=...
bun install
bun run dev
# Frontend runs on http://localhost:3000 → /board/demo-board
```

Step 4: Firestore Emulator (in separate terminal)

```bash
cd backend
firebase emulators:start --only firestore --project stun-489205
```

✅ You're ready! Open http://localhost:3000/board/demo-board
- Use Excalidraw tools to draw shapes
- Click to create React Flow nodes
- Connect nodes with edges
- Voice: Click the mic 🎤 button, speak your intent
- Text: Type in the floating command bar (Ctrl+K to focus)
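The Ctrl+K focus shortcut amounts to one keydown check; a sketch of that wiring (assumed implementation detail, not the project's exact code):

```typescript
// Returns true when Ctrl+K was pressed, after suppressing the browser's
// default handling; the caller then focuses the command-bar input.
function isCommandBarShortcut(e: {
  ctrlKey: boolean;
  key: string;
  preventDefault: () => void;
}): boolean {
  if (e.ctrlKey && e.key.toLowerCase() === "k") {
    e.preventDefault();
    return true;
  }
  return false;
}
```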
Gemini analyzes your canvas + command, then executes actions live:
- Move nodes
- Create new nodes
- Group related elements
- Connect nodes with edges
- Zoom to focus areas
- Invite collaborators via share button
- See active users in real-time
- All changes sync instantly
USER BROWSER
↓
NEXT.JS FRONTEND (Hybrid Canvas)
↓ HTTP REST + Firebase JWT
CLOUD RUN BACKEND (Express.js)
├─ Intent Parser (command type detection)
├─ Orchestrator (spatial context builder)
├─ Gemini Service (AI coordination)
├─ Board Service (CRUD)
├─ Presence Service (collaboration)
└─ Auth Middleware (JWT validation)
↓
GOOGLE GEMINI 2.5 FLASH (AI Planning)
↓ JSON Action Plan
FIRESTORE DATABASE
├─ boards (canvas state)
└─ board_presence (active users)
Full Architecture Diagram: See ARCHITECTURE.md
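The backend stages above behave like a chain of handlers, each enriching the request context or rejecting it. A toy, dependency-free sketch of that pipeline (not the actual Express code; stage names and the `"user-from-jwt"` placeholder are illustrative):

```typescript
// Each stage takes the request context and returns an enriched copy,
// or throws to reject the request.
type Ctx = { token?: string; userId?: string; intent?: string; command: string };
type Stage = (ctx: Ctx) => Ctx;

// Mirrors the Auth Middleware: real code verifies a Firebase JWT here.
const authMiddleware: Stage = (ctx) => {
  if (!ctx.token) throw new Error("401: missing token");
  return { ...ctx, userId: "user-from-jwt" };
};

// Mirrors the Intent Parser: classify the command before calling Gemini.
const intentParser: Stage = (ctx) => ({
  ...ctx,
  intent: ctx.command.startsWith("zoom") ? "navigate" : "transform",
});

function runPipeline(ctx: Ctx, stages: Stage[]): Ctx {
  return stages.reduce((c, stage) => stage(c), ctx);
}
```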
Gemini integration tests:

```bash
cd backend
bun test tests/gemini/gemini-connectivity.test.ts
bun test tests/gemini/gemini-actions.test.ts
```

Firestore tests:

```bash
cd backend
bun test tests/firestore.test.ts
```

Backend health check:

```bash
curl http://localhost:8080/health
```

AI service tests:

```bash
cd backend
bun test tests/ai.test.ts
```

Deployment prerequisites:

```bash
gcloud auth login
gcloud config set project stun-489205
terraform --version  # >= 1.5.0
```

Provision infrastructure:

```bash
cd infra/environments/dev
terraform init
terraform plan
terraform apply
```

Deploy:

```bash
cd infra
./scripts/deploy.ps1  # Windows
./scripts/deploy.sh   # macOS/Linux
```

Live URL: https://stun-frontend-dev-279596491182.us-central1.run.app

View logs:

```bash
gcloud run logs read stun-backend-dev --limit=100
gcloud run logs read stun-frontend-dev --limit=100
```

Proof of GCP Deployment:
- ✅ Live app: stun-frontend-dev
- ✅ IaC code: infra/modules/
- ✅ Terraform configs: Cloud Run, Firestore, Secret Manager, Artifact Registry
| Metric | Value | Notes |
|---|---|---|
| Canvas Interaction | <16ms | 60fps rendering |
| Screenshot Capture | 100-300ms | html2canvas |
| Gemini API Call | 200-800ms | LLM inference |
| Firestore Write | 50-200ms | Network I/O |
| Full AI Cycle | 500-1500ms | End-to-end command execution |
Optimizations:
- Debounced auto-save (3s) reduces write load
- Optimistic UI updates before persistence
- 3-layer canvas render optimization with requestAnimationFrame
- Firestore real-time listeners for sub-second collaboration sync
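The optimistic-update optimization is worth a sketch: apply the change locally first, and roll back only if the Firestore write fails. A minimal, hedged version (the helper name and signature are illustrative):

```typescript
// Apply `next` to the UI immediately, persist in the background, and
// restore `current` if persistence fails. Returns whether the write stuck.
async function optimisticUpdate<S>(
  current: S,
  next: S,
  setState: (s: S) => void,
  persist: (s: S) => Promise<void>
): Promise<boolean> {
  setState(next); // UI updates instantly, before any network round-trip
  try {
    await persist(next); // e.g. the Firestore write
    return true;
  } catch {
    setState(current); // roll back on failure
    return false;
  }
}
```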
- ✅ Google OAuth 2.0 for login
- ✅ Firebase JWT tokens (1-hour TTL, auto-refresh)
- ✅ Backend validates every request token
- ✅ Firestore rules scoped to user ID
- ✅ Secrets in Google Secret Manager (not in code)
- ✅ HTTPS enforced (Cloud Run default)
- ✅ httpOnly cookies (XSS-resistant)
- ✅ CORS whitelist for frontend domain
- ✅ Zod schemas validate all API requests
- ✅ Zod validates Gemini JSON responses (prevents hallucinations)
- ✅ Position sanitization prevents out-of-bounds node placement
- ✅ Rate limiting (express-rate-limit)
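Position sanitization, mentioned above, is essentially clamping AI-proposed coordinates into the board's bounds before applying them. A sketch under assumed bounds (the ±5000 values are illustrative, not the project's actual limits):

```typescript
// Clamp AI-proposed coordinates into board bounds; non-finite values
// (NaN, Infinity) fall back to the origin rather than escaping the board.
const BOUNDS = { minX: -5000, maxX: 5000, minY: -5000, maxY: 5000 };

function clamp(v: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, v));
}

function sanitizePosition(p: { x: number; y: number }) {
  return {
    x: clamp(Number.isFinite(p.x) ? p.x : 0, BOUNDS.minX, BOUNDS.maxX),
    y: clamp(Number.isFinite(p.y) ? p.y : 0, BOUNDS.minY, BOUNDS.maxY),
  };
}
```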
| 📄 Document | 📝 Purpose | 🔗 Link |
|---|---|---|
| Architecture Overview | Complete system design & data flow | docs/ARCHITECTURE.md |
| Canvas System | 3-layer hybrid canvas synchronization | docs/Canvas-system.md |
| Product Requirements | Feature specifications & roadmap | docs/PRD.md |
| Deployment Runbook | GCP deployment procedures | DEPLOY.md |
| Local Testing Guide | Development setup & troubleshooting | LOCAL_TESTING_GUIDE.md |
We welcome contributions! To contribute:
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit changes: `git commit -m "Add your feature"`
- Push branch: `git push origin feature/your-feature`
- Open a Pull Request
Code Standards:
- TypeScript (strict mode)
- ESLint + Prettier for formatting
- Zod for runtime validation
- Tests for new features (Bun test)
A production-grade spatial AI thinking environment that proves AI can go beyond chat. Instead of responding in text, our AI visually navigates your workspace.
- Gemini's Multimodal Power: Screenshots + structured text data = richer AI understanding
- Real-Time Interaction UX: Users expect <1s response times for AI actions
- Hybrid Architecture Complexity: Syncing 3 canvas layers requires careful state management
- Firestore at Scale: 1MB document limits force creative data structuring
- Spatial Reasoning Challenge: Teaching AI to understand coordinates & layouts is non-trivial
- 📋 Project Management: Visual task boards with AI auto-organization
- 🧠 Brainstorming: Mind maps that AI helps structure
- 🎨 Design Thinking: Collaboration boards with AI layout assistance
- 📊 Data Visualization: Charts that AI reorganizes based on insights
- 🧑‍🎓 Education: Interactive learning spaces with AI mentoring
Category: UI Navigator ☸️
Challenge: Build an agent that visually understands UI and performs actions based on intent
How Stun Qualifies:
- ✅ Visual UI Understanding: Gemini analyzes canvas screenshots
- ✅ Multimodal Input: Images (screenshots) + text (commands) + structured data (nodes)
- ✅ Executable Actions: AI outputs validated, sanitized actions that execute on canvas
- ✅ Real-Time Interaction: Sub-2-second command-to-execution cycle
- ✅ Live Deployment: Production-grade app running on Google Cloud
Need help? Check these resources:
| 💬 Channel | 🔗 Link | 📌 For |
|---|---|---|
| 🐛 Issues | github.com/.../Stun/issues | Bug reports & feature requests |
| 💡 Discussions | github.com/.../Stun/discussions | Questions & ideas |
| 📖 Code | github.com/Invariants0/Stun | Source + PRs |
| 🏆 Hackathon | devpost.com/.../stun-7ct2km | Submission details |
License: MIT — See LICENSE
Built With ❤️ by a passionate team using:
- Google Gemini — Multimodal AI powerhouse
- Google Cloud Platform — Production infrastructure
- Open Source — TLDraw, Excalidraw, React Flow, Next.js, Express, Bun