VexAI is a powerful, flexible Discord bot designed to be your server's ultimate AI assistant. It leverages Large Language Models to provide advanced moderation, administration, automation, and conversational capabilities – replacing the need for multiple specialized bots.
Seamlessly switch between OpenAI, Anthropic, OpenRouter, or any OpenAI-compatible API. Per-user model overrides and mode presets allow fine-grained control over which model handles each user's requests.
- Persistent conversation history – two-tier architecture (in-memory hot tier + SQLite cold tier) preserves context across bot restarts
- Conversation summaries – long sessions are summarized and injected into future prompts for continuity
- Semantic memory – FTS5-powered full-text search for storing and recalling facts, preferences, events, and decisions
- Local message database – full mirror of every Discord message with FTS5 search, background backfill, and real-time sync
- Message audit trail – every edit and deletion is preserved in a revision history (original, edit, delete revisions). The `get_message_edits` tool shows the full revision timeline with diffs.
- Local-first search – all message searches query the local database first (up to 5,000 results), falling back to the Discord API (capped at 50) when the local DB is unavailable. The bot is informed when results come from the API fallback. Sort ascending/descending, exact substring or FTS word matching, date range filters.
- Semantic search – AI embedding-powered, meaning-based search that finds related messages even without matching keywords
- SQLite (default) – zero-config, WAL mode, FTS5 full-text search. Perfect for single-server deployments.
- PostgreSQL – production-grade backend with connection pooling, tsvector full-text search, and pgvector for AI embeddings. Use `docker-compose.postgres.yml` for easy deployment.
- Live migration – the `/migrate to-postgres` slash command streams all data from SQLite to PostgreSQL with a live progress bar. `/migrate status` shows current row counts.
- Embedding storage – SQLite stores embeddings in a separate `message_embeddings` table; PostgreSQL uses pgvector's native `vector` column type with IVFFlat indexing.
- Built-in (default) – runs ONNX models directly in Node.js via Transformers.js. Zero API keys, fully offline. Models: `Xenova/all-MiniLM-L6-v2` (384 dims, ~23 MB), `Xenova/bge-small-en-v1.5`, `Xenova/gte-small`.
- OpenAI – `text-embedding-3-small` (1536 dims), `text-embedding-3-large` (3072 dims)
- OpenRouter – access many embedding models via a single API key: `openai/text-embedding-3-small`, `nvidia/llama-nemotron-embed-vl-1b-v2`, `qwen/qwen3-embedding-0.6b`, etc.
- Local API – any OpenAI-compatible `/v1/embeddings` endpoint (Ollama, LocalAI, etc.)
- Channels – create, edit, delete, lock/unlock, set permissions, slowmode
- Threads & Forums – create threads, manage members, create forum posts, list tags
- Scheduled Events – create, list, delete events, view attendees
- Interactive Components – buttons, select menus, edit/remove components
- Reaction Roles – emoji-to-role mappings with live reaction event handlers
- Webhooks – create, list, delete, send messages via webhooks
- Channel Permissions – per-channel permission overrides for users and roles
- Crossposting – publish announcement channel messages to followers
- Embeds, Polls, Reactions, Pins – full messaging toolkit
- Kick, ban, timeout, warn with tracking
- Purge messages with filters
- Auto-roles on join
- Welcome messages
- Permission auditing
- Audit log access
- Custom API caller – register named API endpoints with saved auth headers and default params, and call them by name. SSRF protection blocks private IPs. Rate-limited at 10 calls/min per API.
- RSS feed monitor – subscribe channels to RSS/Atom feeds with configurable polling intervals (minimum 5 minutes). Supports embed, compact, and full post formats.
- Inbound webhook server – a lightweight HTTP server receives payloads from external services (CI/CD, monitoring, etc.) and posts them to Discord. HMAC signature verification for GitHub, GitLab, and generic webhooks.
- GitHub convenience tools – repo info, issues, pull requests, and commits. Works with a registered API config or a personal access token.
- OpenMail inbox approvals – inbound emails are queued in an owner-only approval channel and are never exposed to LLMs until explicitly approved. Owner reactions: approve, archive-to-memory, delete.
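For the HMAC verification, a minimal sketch of a GitHub-style signature check (an illustration assuming GitHub's `X-Hub-Signature-256: sha256=<hex>` header convention, not VexAI's actual implementation):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of GitHub-style webhook signature verification.
// GitHub sends `X-Hub-Signature-256: sha256=<hex hmac of the raw body>`.
export function verifyGitHubSignature(
  secret: string,
  rawBody: string,
  signatureHeader: string
): boolean {
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // Constant-time compare to avoid timing side channels.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison via `timingSafeEqual` matters here: a plain string compare could leak how many leading bytes of the signature matched.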
- Text-to-speech – standalone tool that generates audio files and sends them in any channel. Providers: OpenAI, Google Cloud, ElevenLabs, or self-hosted (Kokoro, Piper, AllTalk – any OpenAI-compatible `/v1/audio/speech` endpoint). All providers support a custom `baseUrl` for OpenAI-compatible endpoints.
- Speech-to-text – transcribe voice messages, audio files, or any URL. Providers: OpenAI Whisper API, Google Cloud STT, or self-hosted (faster-whisper, whisper.cpp – any OpenAI-compatible `/v1/audio/transcriptions` endpoint). All providers support a custom `baseUrl` for OpenAI-compatible endpoints. Automatic format fallback: if the endpoint rejects OGG/Opus or other non-WAV formats, audio is converted to WAV via ffmpeg and retried transparently.
- Voice channels – join, leave, speak via TTS, listen and transcribe users in real time. Always unmuted; deafen/undeafen indicates whether listening mode is active. Chunked audio streaming prevents cutoff on long speech.
- Meeting mode – join a voice channel, transcribe everything, and produce a full transcript when the meeting ends.
- Voice assistant – interactive voice assistant with configurable wake word detection (default: "hey vex"), alternative wake aliases, optional text-channel responses, conversation history, and idle timeout.
- Self-hosted first – local providers need only a `baseUrl`, no API keys. Zero cost with Kokoro TTS + faster-whisper.
- Specialized agents – create custom agents with dedicated identities, instructions, and tool access
- Two execution modes – sub-agent (called internally, returns a result) and hand-over (takes over the conversation)
- Creation wizard – guided interview in a private channel to design new agents with structured questions and progress tracking
- Planning step – agents plan before executing, improving tool selection accuracy
- Error recovery – failed tool calls are tracked and capped at 2 retries with contextual suggestions
- Model-specific hints – prompt tuning for Kimi, MiniMax, DeepSeek, and Qwen improves tool-use reliability
- Handover safeguards – 30-minute timeout, 50-message limit, periodic reminders, user escape hatch (`!exit`)
- Spawnable sub-agents – spawn specialized background agents that run autonomously to accomplish specific goals
- Task board – create, update, and track tasks across sub-agents with status tracking
- Safety controls – tool blacklist (no exec_command, ban, kick, purge, etc.), max 10 iterations (capped at 20), 5-minute timeout, error tracking with abort after 2 failures of the same call
- Lifecycle management – spawn, check status, cancel, and list all running/completed sub-agents
- Enable/disable – toggle via the `/subagents` slash command or `/config set subAgents.enabled true`
- Graph-based workflows – directed acyclic graph (DAG) of nodes connected by edges, with template rendering via `{{variable}}` syntax
- 30+ node types across 5 categories:
  - Trigger – discord_event, command, reaction_role, webhook, schedule, manual
  - Condition – if, switch, cooldown, permission, time_window
  - Action – send_message, ai_generate, tool_call, assign_role, create_thread, dm_user, http_request, sub_agent, delay, log
  - Transform – template, json_extract, ai_extract, merge, map
  - Control – parallel, join, loop, error_handler
- Execution limits – 2-minute timeout per run, max 100 nodes per execution
- Run logging – full execution history with node-by-node status, results, and duration
- Enable/disable – toggle via the `/workflows` slash command or `/config set workflows.enabled true`
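To make the shape concrete, a small workflow might be defined along these lines (a hypothetical example: the node type names come from the categories above, but field names like `nodes`, `edges`, and `channel` are assumptions about the schema):

```json
{
  "name": "welcome-flow",
  "nodes": [
    { "id": "on_join", "type": "discord_event", "event": "member_join" },
    {
      "id": "greet",
      "type": "send_message",
      "channel": "general",
      "content": "Welcome, {{member.username}}!"
    }
  ],
  "edges": [
    { "from": "on_join", "to": "greet" }
  ]
}
```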
- Central event bus – pub/sub event routing with filter matching across all event sources
- 5 event sources:
  - Discord – message, reaction, member join/leave, role/channel/thread changes
  - Webhooks – inbound webhook payloads with GitHub/GitLab-specific event types
  - WebSocket – persistent WebSocket connections with connect/message/disconnect events
  - REST polling – poll REST APIs on a schedule, emit events on change detection
  - Cron – time-based trigger events via cron expressions
- Listener actions – post templated messages, generate AI responses, execute tools, or forward to external webhooks
- Filtering – match by event type, source, guild/channel/user, and custom field conditions (equals, contains, regex, gt, lt, exists)
- Rate limiting – per-listener cooldown to prevent spam
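The custom field conditions can be pictured as a small matcher like this (a sketch under assumed type names; the operator list mirrors the one above, the rest is illustrative):

```typescript
// Hypothetical sketch of event-filter field matching. Operator names mirror
// the documented list (equals, contains, regex, gt, lt, exists); the type
// layout is an assumption, not VexAI's actual code.
type Condition =
  | { op: "equals"; value: unknown }
  | { op: "contains"; value: string }
  | { op: "regex"; value: string }
  | { op: "gt"; value: number }
  | { op: "lt"; value: number }
  | { op: "exists" };

export function matchField(actual: unknown, cond: Condition): boolean {
  switch (cond.op) {
    case "equals":   return actual === cond.value;
    case "contains": return typeof actual === "string" && actual.includes(cond.value);
    case "regex":    return typeof actual === "string" && new RegExp(cond.value).test(actual);
    case "gt":       return typeof actual === "number" && actual > cond.value;
    case "lt":       return typeof actual === "number" && actual < cond.value;
    case "exists":   return actual !== undefined && actual !== null;
  }
}
```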
- Status message – when the bot starts executing tools, a temporary message is posted showing what is happening in real time
- LLM-summarized progress – a configurable summarizer model reads the tool execution log every ~4 seconds and generates a short title + description of current activity
- Sub-agent tracking – active sub-agents and their goals are listed in the status message
- Auto-cleanup – the status message is automatically deleted once the final reply is sent
- Response footer – every reply includes a small subtext footer showing the total time taken and a breakdown of tools called (e.g. `⏱ 4.2s · Tools: 3× search_messages, 1× fetch_messages`)
- Tool result cache – per-tool TTLs (5 min for static data, 2 min for semi-static, 1 min for external), automatic invalidation on mutations
- Security verdict cache – 10-minute TTL on approve/deny verdicts, eliminates redundant LLM calls
- Prompt optimizer – lazy skill descriptions based on message content keywords, reducing system prompt tokens by 30-50%
- Response dedup cache – 30-second TTL catches rapid duplicate messages
- Cache statistics – the `cache_stats` tool shows hit rates and entry counts for all caches
- Local-first queries – local database searches have zero caching overhead; only Discord API fallback calls are cached
- Risk-tiered review – 4-tier classification for all tools:
  - Tier 0 (Exempt): read-only tools auto-approved, zero overhead
  - Tier 1 (Low): non-destructive writes get rule-based checks only, no LLM call
  - Tier 2 (Medium): impactful but reversible tools get LLM review with verdict caching
  - Tier 3 (High): destructive tools always get a full LLM review
- Rate limiter – per-user, per-tool rate limits (e.g. 10 messages/min, 3 DMs/min)
- Optimized prompt – shorter observer prompt reduces token cost per review
- ~80-85% reduction in security LLM calls compared to reviewing every tool invocation
- Fail-closed – errors default to deny; escalations alert a configurable channel
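In effect, the observer consults a tier map before doing any LLM work. A rough sketch (the tool names in the map and the fail-closed default for unknown tools are illustrative assumptions):

```typescript
// Hypothetical risk-tier lookup gating tool calls. Tier semantics follow
// the documented list; the map contents here are illustrative only.
const RISK_TIERS: Record<string, 0 | 1 | 2 | 3> = {
  fetch_messages: 0, // read-only: exempt, auto-approved
  send_message: 1,   // non-destructive write: rule-based checks only
  assign_role: 2,    // reversible: LLM review with verdict caching
  ban: 3,            // destructive: always full LLM review
};

export function requiresLlmReview(tool: string): boolean {
  // Fail closed: tools missing from the map are treated as highest risk.
  const tier = RISK_TIERS[tool] ?? 3;
  return tier >= 2;
}
```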
Modular architecture with 22 built-in skills:
| Skill | Tools | Description |
|---|---|---|
| discord | 29 | Core messaging, embeds, reactions, polls, threads, forums, crossposting |
| moderation | 8 | Kick, ban, timeout, warn, purge |
| server-management | 15+ | Channels, roles, emojis, invites, events, webhooks, permissions |
| user-management | 5 | User info, nickname, role assignment |
| components | 6 | Buttons, select menus, reaction roles |
| memory | 1 | Search structured bot logs |
| semantic-memory | 5 | FTS5-powered remember, recall, forget, list, update |
| discord-search | 6 | Search messages, user history (up to 5K local / 50 API fallback, sort asc/desc, exact match, date filters), context, stats, semantic search, message edit history |
| integrations | 17 | Custom APIs, RSS feeds, inbound webhooks, GitHub tools |
| | 3 | OpenMail approved-thread listing/reading and reply sending |
| voice | 13 | TTS, STT, voice channels, meeting transcription, voice assistant |
| agent-system | 7 | Create, invoke, handover, list, info, delete, toggle agents |
| sub-agents | 7 | Spawn background agents, check/cancel status, task board |
| workflows | 9 | Create, list, get, update, delete, toggle, trigger workflows, run history |
| event-system | 13 | Event listeners, WebSocket connections, REST polling, event emission, event log |
| graph-renderer | 1 | Chart rendering (bar, line, area, pie, histogram, heatmap, treemap, sankey, etc.) |
| cron | 3 | Scheduled recurring tasks |
| utility | 3 | Reminders, web search, URL fetching |
| system | 5 | Exec commands, download files, restart, credits, cache stats |
| table-renderer | 1 | Canvas-based table rendering with emoji support |
| skill-manager | 5 | Create, read, edit, delete, list user skills |
| skill-agent | 1 | Execute custom skill code |
| image-generation | 1 | AI image generation with reference image support |
Custom skills can be hot-loaded from the `skills/` directory at runtime.
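A user skill dropped into `skills/` might look roughly like this (the `Tool` shape and export convention are assumptions for illustration; the actual interface VexAI expects may differ):

```typescript
// Hypothetical shape of a hot-loadable user skill.
interface Tool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

export const diceSkill: { name: string; tools: Tool[] } = {
  name: "dice",
  tools: [
    {
      name: "roll_die",
      description: "Roll an n-sided die and return the result.",
      execute: async (args) => {
        const sides = Number(args.sides ?? 6);
        const roll = 1 + Math.floor(Math.random() * sides);
        return `Rolled a ${roll} (d${sides})`;
      },
    },
  ],
};
```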
| Command | Description |
|---|---|
| `/allow add/remove/list` | Manage allowed users (owner only) |
| `/model set/remove/list` | Per-user model overrides |
| `/mode set/clear/status/list` | Switch between fast/cheap/balanced/quality modes |
| `/observer enable/disable/reset/status` | Toggle the security observer |
| `/set-iterations` | Configure tool call limits per message |
| `/imagegen enable/disable/model/status` | AI image generation settings |
| `/config view/set/reset` | View and modify bot config at runtime with autocomplete |
| `/agents list/info/delete/toggle` | Manage specialized agents |
| `/subagents enable/disable/status` | Toggle the sub-agents system |
| `/workflows enable/disable/status` | Toggle the workflow engine |
| `/sync status` | Show message database sync progress or SYNC COMPLETE |
| `/sync rescan <days>` | Re-scan recent messages to fill gaps from bot downtime |
| `/migrate status` | Show current database backend and table row counts |
| `/migrate to-postgres` | Migrate all data from SQLite to PostgreSQL |
| `/credits` | Check OpenRouter API balance |
| `/restart` | Force restart the bot (owner only) |
- `/config view [section]` – view the current config with sensitive values masked (tokens show first/last 4 chars)
- `/config set <key> <value>` – change any config value using dot notation (e.g. `voice.enabled true`, `llm.model gpt-4o`) with autocomplete for all 81 config paths, type validation, and enum checking
- `/config reset <key>` – reset to the schema default
- Auto-defaults – on first boot, all missing config sections are written to `data/config.json` with their defaults, fully beautified
- Changes that require a restart (LLM, database, embeddings) show a warning; security, voice, sub-agents, and workflow settings take effect immediately
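Dot-notation updates boil down to walking the config object one key at a time. A minimal sketch (the helper name is hypothetical, and the real command layers type validation and enum checking on top):

```typescript
// Sketch of a dot-notation config setter, as /config set might perform it.
// Intermediate objects are created on demand; no validation is done here.
export function setByPath(
  config: Record<string, any>,
  path: string,
  value: unknown
): void {
  const keys = path.split(".");
  let node = config;
  for (const key of keys.slice(0, -1)) {
    node = node[key] ??= {}; // create intermediate objects as needed
  }
  node[keys[keys.length - 1]] = value;
}
```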
All configurable paths:
| Section | Settings |
|---|---|
| `discord.*` | token (read-only), adminUserIds (read-only) |
| `llm.*` | provider, apiKey, model, baseUrl |
| `security.*` | enabled, exemptTools, alertChannelId |
| `security.llm.*` | provider, apiKey, model, baseUrl |
| `conversationWindow` | Max messages in context window |
| `integrations.webhookServer.*` | enabled, port, baseUrl |
| `integrations.github.*` | token |
| `integrations.openmail.*` | enabled, apiKey, inboxId, approvalChannelId, autoCreateApprovalChannel, wsEnabled, approvalEmojis.* |
| `voice.*` | enabled |
| `voice.tts.*` | enabled, provider, apiKey, baseUrl, model, voice |
| `voice.stt.*` | enabled, provider, apiKey, baseUrl, model |
| `voice.assistant.*` | enabled, wakeWord, wakeAliases, respondWithText, textChannelId, maxHistoryTurns, idleTimeout |
| `database.*` | type, url, pool.min, pool.max |
| `embeddings.*` | enabled, provider, apiKey, baseUrl, model, dimensions, batchSize, dailyLimit |
| `subAgents.*` | enabled |
| `workflows.*` | enabled |
| `thinking.*` | enabled, pollIntervalMs, model |
- Identity system – customizable personality via IDENTITY.md, SOUL.md, RULES.md, CORE_MEMORY.md
- Per-user identity – per-user overrides for identity, rules, and memory
- Approval gate – destructive actions require user confirmation via Discord reactions
- Table rendering – canvas-based table images with emoji support
- Image generation – AI image generation via configurable providers; supports reference images (edit, style transfer, etc.)
- Comment syntax – messages starting with `//` are ignored, letting users annotate bot replies without triggering a response
- Cron jobs – scheduled recurring tasks
- Reminders – time-based reminders for users
- Credit monitoring – OpenRouter balance tracking with low-balance alerts
- Docker ready – easy deployment with docker-compose (SQLite or PostgreSQL)
Before you can run VexAI, you need to create a bot application on Discord and obtain its token.
For the official documentation, refer to the Discord Developer Portal Documentation.
- Go to the Discord Developer Portal.
- Log in with your Discord account.
- Click "New Application", name it (e.g., "VexAI"), and click Create.
- In the left menu, click "Bot".
- Scroll to "Privileged Gateway Intents" and enable:
- Presence Intent
- Server Members Intent
- Message Content Intent
- Save changes.
- On the "Bot" page, click "Reset Token".
- Copy this token immediately and store it safely. You'll need it for the `.env` file. Never share this token publicly!
- Navigate to "OAuth2" -> "URL Generator".
- Under "Scopes", check `bot` and `applications.commands`.
- Under "Bot Permissions", select Administrator (or specific permissions).
- Copy the generated URL, paste it into your browser, select your server, and authorize.
- Node.js v22+ (if running natively)
- Docker & Docker Compose (if using Docker)
- API keys for your preferred LLM provider
- Clone the repository.
- Copy the example environment file:
cp .env.example .env
- Edit `.env`:
# Discord
DISCORD_TOKEN=your_discord_bot_token
# LLM Provider: openai | anthropic | openrouter | openai-compatible
LLM_PROVIDER=openai
LLM_API_KEY=your_api_key
LLM_MODEL=gpt-4o
# LLM_BASE_URL= # Only for openai-compatible provider
# Security Observer (optional, uses main LLM if not set)
# SECURITY_LLM_PROVIDER=openai
# SECURITY_LLM_API_KEY=
# SECURITY_LLM_MODEL=gpt-4o-mini
# Admin Discord user IDs (comma-separated)
ADMIN_USER_IDS=your_discord_user_id
# Security alert channel ID (optional)
# SECURITY_ALERT_CHANNEL_ID=
# Web search (optional)
# SEARCH_PROVIDER=brave # brave | searxng
# SEARCH_API_KEY=your_brave_search_api_key
# SEARXNG_URL=http://localhost:8080
# Database (optional, defaults to SQLite)
# DATABASE_TYPE=postgres
# DATABASE_URL=postgres://vexai:password@localhost:5432/vexai
# Embeddings for semantic search (optional)
# EMBEDDINGS_ENABLED=true
# EMBEDDINGS_PROVIDER=builtin # builtin | openai | openrouter | local
# EMBEDDINGS_MODEL=Xenova/all-MiniLM-L6-v2
# EMBEDDINGS_API_KEY= # For openai/openrouter
# EMBEDDINGS_BASE_URL= # For local provider
# EMBEDDINGS_DIMENSIONS=384

Add to `data/config.json`:
{
"voice": {
"enabled": true,
"tts": {
"provider": "local",
"baseUrl": "http://localhost:8880",
"voice": "af_heart",
"model": "kokoro"
},
"stt": {
"provider": "local",
"baseUrl": "http://localhost:8000",
"model": "large-v3"
},
"assistant": {
"enabled": true,
"wakeWord": "hey vex",
"respondWithText": false,
"idleTimeout": 300
}
}
}

TTS providers: `openai`, `google`, `elevenlabs`, `local` (Kokoro, Piper, AllTalk – any OpenAI-compatible `/v1/audio/speech` endpoint)
STT providers: `whisper-api`, `google`, `local` (faster-whisper, whisper.cpp – any OpenAI-compatible `/v1/audio/transcriptions` endpoint). If an endpoint does not support the source audio format (OGG, Opus, WebM, FLAC, etc.), the audio is automatically converted to WAV via ffmpeg and retried.
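The transparent retry can be sketched as a small wrapper: try the original format, and on rejection convert to WAV and try once more (the function types here are assumptions for illustration; the real conversion shells out to ffmpeg):

```typescript
// Sketch of transparent WAV fallback for STT uploads.
type Transcribe = (audio: Buffer, format: string) => Promise<string>;
type ConvertToWav = (audio: Buffer) => Promise<Buffer>;

export async function transcribeWithFallback(
  audio: Buffer,
  format: string,
  transcribe: Transcribe,
  convertToWav: ConvertToWav // e.g. an ffmpeg wrapper
): Promise<string> {
  try {
    return await transcribe(audio, format);
  } catch (err) {
    if (format.toLowerCase() === "wav") throw err; // nothing left to try
    const wav = await convertToWav(audio);         // convert via ffmpeg
    return await transcribe(wav, "wav");           // retry transparently
  }
}
```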
Add to data/config.json:
{
"integrations": {
"webhookServer": {
"enabled": true,
"port": 3847,
"baseUrl": "https://mybot.example.com"
},
"github": {
"token": "ghp_your_personal_access_token"
}
}
}

Add to `data/config.json`:
{
"integrations": {
"openmail": {
"enabled": true,
"apiKey": "om_live_xxx",
"inboxId": "inb_xxx",
"approvalChannelId": "123456789012345678",
"autoCreateApprovalChannel": true,
"wsEnabled": true,
"approvalEmojis": {
"approve": "✅",
"archive": "🗃️",
"delete": "🗑️"
}
}
}
}

Behavior:
- Every inbound OpenMail message is persisted locally and posted to the owner-only approval channel.
- Approval actions are deterministic reaction handlers (no LLM): approve / archive-to-memory / delete.
- Pending approval messages are reconciled on restart.
- The approval channel is excluded from Discord search/read tools, event logging, and message mirror/backfill.
Add to data/config.json:
{
"database": {
"type": "sqlite"
},
"embeddings": {
"enabled": true,
"provider": "builtin",
"model": "Xenova/all-MiniLM-L6-v2",
"dimensions": 384,
"batchSize": 50
}
}

Database backends: `sqlite` (default), `postgres` (requires the `DATABASE_URL` env var)
Embedding providers:
| Provider | Model (default) | Dims | Notes |
|---|---|---|---|
| `builtin` | `Xenova/all-MiniLM-L6-v2` | 384 | Runs in-process, no API key, fully offline |
| `openai` | `text-embedding-3-small` | 1536 | Best quality, requires API key |
| `openrouter` | `openai/text-embedding-3-small` | 1536 | Many models via one key |
| `local` | `nomic-embed-text` | 768 | Any OpenAI-compatible endpoint (Ollama, LocalAI) |
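Semantic search ranks stored embeddings by vector similarity against the query embedding; cosine similarity is the usual metric for this (whether VexAI uses cosine or an alternative internally is an assumption here):

```typescript
// Cosine similarity between two embedding vectors: dot product divided by
// the product of the vector norms. Returns a value in [-1, 1].
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```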
docker-compose up -d --build
docker-compose logs -f

With PostgreSQL:

docker-compose -f docker-compose.postgres.yml up -d --build
docker-compose -f docker-compose.postgres.yml logs -f

Running natively:

npm install
npm run build
npm start

For development with hot-reload: `npm run dev`
src/
bot/ Core Discord client, message handling, conversation management
client.ts Discord client setup, event handlers, slash commands
message-handler.ts Message pipeline: history -> prompt -> LLM -> tools -> reply
tool-executor.ts Tool execution with caching and security gate
conversation.ts Two-tier conversation manager (memory + SQLite)
system-prompt.ts System prompt assembly with memory and skill filtering
approval-gate.ts User confirmation for destructive actions
agents/ Specialized agent system
agent-executor.ts Sub-agent and hand-over execution with planning step
agent-manager.ts Agent CRUD, wizard sessions, handover tracking
creation-wizard.ts Guided agent creation interview
sub-agents/ Spawnable background agents
sub-agent-manager.ts Sub-agent lifecycle management (spawn, track, retrieve)
execution.ts Sub-agent executor with tool blacklist and safety limits
workflows/ Workflow automation engine
engine.ts Graph-based workflow execution with node traversal
manager.ts Workflow CRUD, event listener wiring, run logging
template.ts Handlebars-like template rendering
nodes/ Node executors (trigger, condition, action, transform, control)
events/ Event-driven reaction system
event-bus.ts Central pub/sub event routing with filter matching
listener-manager.ts Event listener CRUD and database persistence
sources/ Discord, webhook, WebSocket, REST poll, cron event sources
llm/ Multi-provider LLM abstraction
providers/ OpenAI, Anthropic, OpenRouter, OpenAI-compatible
prompt-optimizer.ts Lazy skill descriptions, prompt size reduction
model-hints.ts Model-specific tool-use instructions
provider-cache.ts Per-user model overrides
credit-monitor.ts OpenRouter balance tracking
skills/ Modular skill framework
registry.ts Skill registration and tool dispatch
built-in/ 22 built-in skills
integrations/ External service connectors
api-caller.ts Custom API calling with SSRF protection and rate limiting
rss-monitor.ts RSS/Atom feed polling and posting
webhook-server.ts Inbound webhook receiver with HMAC verification
voice/ Voice and audio processing
tts.ts Text-to-speech engine (OpenAI, Google, ElevenLabs, local)
transcriber.ts Speech-to-text engine (Whisper API, Google, local)
connection-manager.ts Discord voice channel join/leave/speak/listen
meeting-mode.ts Meeting transcription and summary
voice-assistant.ts Interactive voice assistant with wake word detection
security/ Security observer and access control
observer.ts Tiered security review (exempt/rule-based/cached LLM/full LLM)
risk-tiers.ts Tool risk classification map
rate-limiter.ts Per-user per-tool rate limiting
cache/ Performance caching layer
tool-cache.ts Tool result cache with TTL and invalidation
response-cache.ts Response deduplication cache
database/ Database abstraction (SQLite + PostgreSQL)
adapter.ts DatabaseAdapter interface
adapters/ SQLite (better-sqlite3) and PostgreSQL (pg) adapters
migrations.ts Sequential SQLite schema migrations (v1-v8)
migrations-postgres.ts PostgreSQL schema migrations with tsvector FTS
connection.ts Database initialization and type detection
conversation-store.ts Persistent conversation CRUD
memory-store.ts Semantic memory with FTS5 search
message-mirror.ts Local Discord message mirror with audit trail + search + semantic search
message-backfill.ts Background channel history crawler (incl. forum threads)
embeddings.ts Embedding service (OpenAI, OpenRouter, local API, built-in Transformers.js)
embedding-worker.ts Background worker for generating message embeddings
migrate-to-postgres.ts SQLite -> PostgreSQL data migration tool
identity/ Bot personality and memory files
config/ Zod-validated configuration, runtime /config command helpers
logging/ Winston-based structured logging
utils/ Shared utilities and constants
types/ TypeScript type declarations
data/ Runtime data (database, identity files, user data, model cache)
skills/ User-created hot-loadable skills
npm run build # Compile TypeScript -> dist/
npm run dev # Run with hot-reload (tsx watch)
npm start # Run compiled dist/index.js
npm run typecheck # Type check without emitting

Built for Discord community management and AI experimentation.