# Sharpie

Self-hostable AI prompt playground with local LLM support.

Build, test, and share AI prompts with zero API costs. Run entirely on your machine with Docker.
## Features

- One-command setup - `docker-compose up` and you're running
- Fully self-hosted - your prompts never leave your machine
- Zero API costs - uses local Ollama models (qwen2.5:3b by default)
- Share & Fork - generate shareable URLs for any prompt
- Real-time streaming - watch AI responses generate live
- Markdown rendering - beautifully formatted responses with syntax highlighting
- GPU accelerated - leverages your NVIDIA GPU automatically
- No host dependencies - everything runs in Docker containers
## Prerequisites

- Docker Desktop installed
- 10GB free disk space (for the Ollama model and Docker images)
- (Optional) NVIDIA GPU with CUDA support
## Quick Start

```bash
# Clone the repository
git clone https://github.com/heyrtl/sharpie.git
cd sharpie

# Start all services
docker-compose up --build
```

That's it! Open http://localhost:5173 in your browser.

The first run takes 5-10 minutes to download the Qwen2.5-3B model (~2GB).
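To confirm everything came up, two standard commands help: `docker-compose ps` lists the running services, and `ollama list` inside the Ollama container shows whether the model download has finished.

```bash
# All Sharpie services should show as running
docker-compose ps

# qwen2.5:3b should appear here once the download completes
docker exec -it sharpie-ollama ollama list
```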
## Usage

- Write your prompts - system and user prompts in the editor
- Run - press `Cmd/Ctrl + Enter` or click "Run Prompt"
- Share - click "Share" to get a shareable URL
- Fork - click "Fork" to create a copy and modify it
### Sharing prompts

Share URLs like http://localhost:5173?p=abc123 with anyone running Sharpie (see the sketch after this list). They can:

- View your prompt
- Run it with their own local model
- Fork and modify it
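For illustration only, here is a rough sketch of how a client could resolve the `p` share ID against the backend. The `/api/prompts/{id}` route and the response shape are assumptions, not Sharpie's documented API:

```python
# Hypothetical sketch: resolve a share ID such as ?p=abc123 into a stored prompt.
# The route and JSON fields below are illustrative assumptions.
import httpx

def fetch_shared_prompt(share_id: str, backend: str = "http://localhost:8000") -> dict:
    # Assumed route; the backend's real path may differ
    resp = httpx.get(f"{backend}/api/prompts/{share_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()  # assumed shape, e.g. {"system": "...", "user": "..."}

if __name__ == "__main__":
    print(fetch_shared_prompt("abc123"))
```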
### Switching models

- Click the settings icon
- Select from the available Ollama models
- Models are auto-detected from your Ollama instance (see the sketch below)
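Auto-detection maps naturally onto Ollama's REST API: `GET /api/tags` returns the installed models. A minimal sketch of that call (the helper name is ours, not Sharpie's):

```python
# List models installed in the local Ollama instance via its /api/tags endpoint.
import httpx

def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    resp = httpx.get(f"{host}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]

if __name__ == "__main__":
    print(list_local_models())  # e.g. ['qwen2.5:3b']
```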
## Architecture

```
Frontend (React + Vite)
          ↓
  Backend (FastAPI)
          ↓
  Ollama (Local LLM)
          ↓
SQLite (Prompt Storage)
```

- Frontend: React app with a real-time streaming UI
- Backend: FastAPI server handling prompts and streaming (sketched below)
- Ollama: local LLM inference with GPU acceleration
- SQLite: embedded database for saved prompts
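To make the streaming path concrete, here is a minimal sketch of how a FastAPI backend can proxy Ollama's streamed output to the browser. The `/api/run` route and request shape are illustrative assumptions; only the Ollama `/api/generate` call is real Ollama API:

```python
# Minimal sketch of the Backend -> Ollama streaming hop (illustrative, not
# Sharpie's actual code). Ollama's /api/generate streams newline-delimited JSON.
import json

import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
OLLAMA_HOST = "http://ollama:11434"  # matches the compose service name

class RunRequest(BaseModel):  # assumed request shape
    system: str = ""
    prompt: str
    model: str = "qwen2.5:3b"

@app.post("/api/run")  # hypothetical route name
async def run_prompt(req: RunRequest):
    async def token_stream():
        payload = {
            "model": req.model,
            "system": req.system,
            "prompt": req.prompt,
            "stream": True,
        }
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST", f"{OLLAMA_HOST}/api/generate", json=payload
            ) as resp:
                async for line in resp.aiter_lines():
                    if line:  # each non-empty line is one JSON chunk from Ollama
                        yield json.loads(line).get("response", "")

    return StreamingResponse(token_stream(), media_type="text/plain")
```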
## Custom models

Pull any Ollama model:

```bash
docker exec -it sharpie-ollama ollama pull llama3.2:3b
docker exec -it sharpie-ollama ollama pull mistral:7b
```

Then select it in Settings.
## Configuration

Copy `.env.example` to `.env` and customize:

```bash
OLLAMA_HOST=http://ollama:11434
DATABASE_PATH=/app/data/sharpie.db
```

The GPU is auto-detected. To disable GPU use and run CPU-only, remove the `deploy` section from `docker-compose.yml` (see the snippet below).
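For reference, a Compose GPU grant typically looks like the block below; this is the standard Docker Compose device-reservation syntax, though the exact layout in Sharpie's `docker-compose.yml` may differ slightly. Deleting it forces CPU-only inference:

```yaml
services:
  ollama:
    # Remove this deploy block to run Ollama on CPU only
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```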
## Project structure

```
sharpie/
├── backend/             # FastAPI server
│   ├── main.py          # API routes
│   ├── database.py      # SQLite handlers
│   ├── models.py        # Pydantic models
│   └── utils.py         # Helpers
├── frontend/            # React app
│   └── src/
│       ├── App.jsx
│       ├── components/
│       └── utils/
└── docker-compose.yml
```
## Local development

Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```

Make sure Ollama is running separately; a quick way to do that is shown below.
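If you don't have a standalone Ollama instance yet, the stock CLI covers it. When the backend runs outside Docker, `OLLAMA_HOST` should point at this local instance (typically `http://localhost:11434`):

```bash
# Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# In another terminal, pull the default model Sharpie uses
ollama pull qwen2.5:3b
```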
## Troubleshooting

### Port already in use

Change the ports in `docker-compose.yml`:

```yaml
ports:
  - "8001:8000"  # Backend
  - "5174:5173"  # Frontend
```

### Model not downloading

Manually pull the model:
```bash
docker exec -it sharpie-ollama ollama pull qwen2.5:3b
```

### GPU not detected

Check the NVIDIA Docker runtime:

```bash
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

If it fails, you may need to install `nvidia-container-toolkit`.
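On Debian/Ubuntu, assuming NVIDIA's apt repository is already configured, the install usually comes down to:

```bash
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker   # register the runtime with Docker
sudo systemctl restart docker
```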
### Out of disk space

The Ollama model requires ~2GB. Free up space or use a smaller model:

```bash
docker exec -it sharpie-ollama ollama pull qwen2.5:0.5b
```

## Contributing

Contributions are welcome! Here's how:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed setup instructions.
## Roadmap

- Multi-model API support (OpenAI, Claude, Gemini)
- Prompt versioning and history
- Collaborative editing
- Export prompts as JSON
- Prompt analytics
- Browser extension
- Model comparison view
## Security

See SECURITY.md for security considerations and best practices.
## License

MIT License - see LICENSE for details.
## Author

Ratul Rahman (@heyrtl)

- Website: ratul-rahman.com
- GitHub: @heyrtl
- Twitter: @heyrtl
## Acknowledgments

- Ollama for local LLM inference
- FastAPI for the backend framework
- React for the frontend
- Qwen team for the excellent small language models
If you find this useful, consider giving it a star!
Built with care for the prompt engineering community.







