Episodic memory provides session-level summaries that capture what happened during a conversation or work session.
- What: High-level summaries of sessions (topic/actions/outcomes, or bullet points)
- When: Generated manually via API, or auto-triggered during
observe_turn - Storage: Stored as
EPISODICmemory type with structured metadata - Retrieval: Automatically included in cross-session memory retrieval
POST /v1/sessions/{session_id}/summary
Content-Type: application/json
Authorization: Bearer <token>
{
"mode": "full", // "full" (topic/action/outcome) or "lightweight" (3-5 bullet points)
"sync": true, // true = wait for result, false = async with task_id (default)
"generate_embedding": true // false = skip embedding (faster, but not retrievable)
}Modes:
full: Generates structured topic/action/outcome summary. No rate limit.lightweight: Generates 3-5 concise bullet points. Rate-limited to 3 per session.
Sync Response (sync=true, mode=full):
{
"memory_id": "abc123...",
"content": "Session Summary: Database optimization\n\nActions: ...\n\nOutcome: ...",
"truncated": false,
"mode": "full",
"metadata": {
"topic": "Database optimization",
"action": "Analyzed queries, added indexes",
"outcome": "Query time reduced 93%",
"source_event_ids": ["mem1", "mem2"]
}
}Sync Response (sync=true, mode=lightweight):
{
"memory_id": "abc123...",
"content": "Session Highlights:\n• Optimized DB queries\n• Added indexes\n• 93% faster",
"truncated": false,
"mode": "lightweight",
"metadata": {
"mode": "lightweight",
"points": ["Optimized DB queries", "Added indexes", "93% faster"]
}
}Async Response (sync=false):
{
"task_id": "task_abc123",
"truncated": false,
"mode": "full"
}GET /v1/tasks/{task_id}
Authorization: Bearer <token>Response:
{
"task_id": "task_abc123",
"status": "completed", // "processing" | "completed" | "failed"
"created_at": "2026-03-15T10:00:00Z",
"updated_at": "2026-03-15T10:00:03Z",
"result": { ... },
"error": null
}When using POST /v1/observe with session_id and turn_count, Memoria can automatically generate lightweight summaries at configurable intervals.
Configure via environment variables:
MEMORIA_AUTO_TRIGGER_THRESHOLD=20— trigger every N turns (0 = disabled, default: 20)MEMORIA_MAX_LIGHTWEIGHT_PER_SESSION=3— max auto-triggered summaries per session (default: 3)
Auto-triggered summaries are stored with extra_metadata.auto_triggered=True.
Episodic memory generation (POST /v1/sessions/{id}/summary) requires LLM configuration. Without it, the endpoint returns HTTP 503:
{"detail": "LLM not configured — episodic memory generation unavailable"}Configure via environment variables:
export LLM_API_KEY="sk-..."
export LLM_BASE_URL="https://api.openai.com/v1" # optional, defaults to OpenAI
export LLM_MODEL="gpt-4o-mini" # optionalAny OpenAI-compatible endpoint works (SiliconFlow, DeepSeek, Ollama, etc.).
- Task storage: In-memory (tasks lost on restart, use
sync=truefor critical sessions) - Multi-process: Task status only visible on the process that created it
- Lightweight rate limit: Max 3 per session (configurable via
max_lightweight_per_session) - Input limits:
fullmode: max 200 messages / 16K tokens;lightweight: max 50 messages / 4K tokens
HTTP 503 "LLM not configured": Set LLM_API_KEY environment variable and restart the server
"No memories found": Session has no memories, or wrong session_id
"Rate limit exceeded": Lightweight mode limited to 3/session — use mode="full" instead
Episodic memory not retrieved: Ensure stored with session_id=None for cross-session retrieval