AI-Memory uses Prometheus for metrics collection and Grafana for visualization. All metrics follow the naming convention: aimemory_{component}_{metric}_{unit} per BP-045.
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:23000 |
See ~/.ai-memory/docker/.env (GRAFANA_ADMIN_PASSWORD) |
| Prometheus | http://localhost:29090 |
- |
| Pushgateway | http://localhost:29091 |
- |
Note: Grafana uses a generated admin password created during installation. The password is stored in
~/.ai-memory/docker/.envunder the keyGRAFANA_ADMIN_PASSWORD. The username remainsadmin.
Purpose: Monitor all 6 Non-Functional Requirements
Key Panels:
- NFR Status Summary: Current p95 vs target for each NFR
- Hook Latency by Type: Breakdown by hook_type
- Embedding Latency: Batch vs Realtime comparison
- SLO Compliance: Percentage meeting each NFR target
Thresholds:
| Status | Condition |
|---|---|
| Green | <80% of target |
| Yellow | 80-100% of target |
| Red | >100% of target |
Purpose: Monitor hook execution patterns
Key Panels:
- Execution rate by hook_type
- CAPTURE vs RETRIEVAL comparison
- Success/Error rate
- Latency heatmap
- Keyword trigger activity
Hook Types:
CAPTURE (store to memory):
| Hook | Trigger | Purpose |
|---|---|---|
UserPromptSubmit |
User message | Capture user prompts |
Stop |
Agent complete | Capture agent responses |
PostToolUse |
Edit/Write | Capture code patterns |
PostToolUse_Error |
Bash error | Capture error patterns |
PreCompact |
Before compact | Save session summary |
RETRIEVAL (query memory):
| Hook | Trigger | Purpose |
|---|---|---|
PostToolUse_ErrorDetection |
Bash error | Retrieve error fixes |
PreToolUse_NewFile |
Write new file | Retrieve conventions |
PreToolUse_FirstEdit |
First edit | Retrieve file patterns |
SessionStart |
resume/compact | Inject context |
PreToolUse |
Any tool use | Retrieve best practices |
Purpose: Monitor storage and retrieval operations
Key Panels:
- Captures/Retrievals by collection
- Deduplication statistics
- Token usage
- Storage by project
Collections:
| Collection | Purpose |
|---|---|
discussions |
Session summaries, decisions |
code-patterns |
Implementation patterns, error fixes |
conventions |
Rules, guidelines, naming conventions |
Purpose: Infrastructure monitoring
Key Panels:
- Service status (Qdrant, Embedding, Prometheus, Pushgateway)
- Error rates by component
- Embedding service latency
- Queue size
| Metric | Type | Labels | Description |
|---|---|---|---|
aimemory_hook_duration_seconds |
Histogram | hook_type, status, project | Hook execution time |
aimemory_captures_total |
Counter | hook_type, status, project, collection | Storage operations |
aimemory_retrievals_total |
Counter | collection, status, project | Retrieval operations |
| Metric | NFR | Target |
|---|---|---|
aimemory_hook_duration_seconds |
NFR-P1 | <500ms |
aimemory_embedding_batch_duration_seconds |
NFR-P2 | <2s |
aimemory_session_injection_duration_seconds |
NFR-P3 | <3s |
aimemory_dedup_check_duration_seconds |
NFR-P4 | <100ms |
aimemory_retrieval_query_duration_seconds |
NFR-P5 | <500ms |
aimemory_embedding_realtime_duration_seconds |
NFR-P6 | <500ms |
| Metric | Type | Labels | Description |
|---|---|---|---|
aimemory_trigger_fires_total |
Counter | trigger_type, status, project | Keyword trigger activations |
aimemory_trigger_results_returned |
Histogram | trigger_type, project | Results per trigger |
| Metric | Type | Labels | Description |
|---|---|---|---|
aimemory_tokens_consumed_total |
Counter | operation, direction, project | Token usage tracking |
aimemory_collection_size |
Gauge | collection, project | Memory count per collection |
aimemory_queue_size |
Gauge | status | Retry queue depth |
aimemory_failure_events_total |
Counter | component, error_code, project | Failure tracking for alerts |
Recommended alert thresholds:
| Alert | Condition |
|---|---|
| NFR violation | p95 > target for 5 minutes |
| Service down | Any service unavailable for 1 minute |
| Error rate spike | >5% errors in 5 minutes |
| Queue backup | >100 items for 10 minutes |
-
Check Pushgateway is receiving metrics:
curl http://localhost:29091/metrics | grep aimemory -
Check Prometheus is scraping:
curl http://localhost:29090/api/v1/targets
-
Verify hooks are pushing:
tail -f ~/.ai-memory/logs/hooks.log | grep push_
- Verify time range matches when hooks fired
- Check project filter matches actual project names
- Ensure hook_type filter includes active hooks
| Symptom | Cause | Solution |
|---|---|---|
| All panels empty | Wrong time range | Set to "Last 1 hour" or when hooks ran |
| Project dropdown empty | No metrics pushed yet | Execute a hook (e.g., edit a file) |
| Some NFRs missing | Metric not triggered | That operation hasn't occurred yet |
V3 dashboards use new metric names:
| V2 Metric | V3 Metric |
|---|---|
ai_memory_* |
aimemory_* |
| Shared NFR metrics | Separate NFR metrics |
| No project label | Project label on all metrics |
V2 dashboards remain available during migration period.
Tech Debt: Remove V2 dashboards after verifying V3 works (TECH-DEBT-125)
- prometheus-queries.md - Example PromQL queries
- Core-Architecture-Principle-V2.md - Section 9: NFR Requirements
- BP-045-prometheus-metrics.md - Naming conventions