AI-Memory Monitoring Guide

Overview

AI-Memory uses Prometheus for metrics collection and Grafana for visualization. All metrics follow the naming convention: aimemory_{component}_{metric}_{unit} per BP-045.

Quick Access

Service	URL	Credentials
Grafana	`http://localhost:23000`	See `~/.ai-memory/docker/.env` (`GRAFANA_ADMIN_PASSWORD`)
Prometheus	`http://localhost:29090`	-
Pushgateway	`http://localhost:29091`	-

Note: Grafana uses a generated admin password created during installation. The password is stored in ~/.ai-memory/docker/.env under the key GRAFANA_ADMIN_PASSWORD. The username remains admin.

Dashboards

NFR Performance Overview (V3)

Purpose: Monitor all 6 Non-Functional Requirements

Key Panels:

NFR Status Summary: Current p95 vs target for each NFR
Hook Latency by Type: Breakdown by hook_type
Embedding Latency: Batch vs Realtime comparison
SLO Compliance: Percentage meeting each NFR target

Thresholds:

Status	Condition
Green	<80% of target
Yellow	80-100% of target
Red	>100% of target

Hook Activity (V3)

Purpose: Monitor hook execution patterns

Key Panels:

Execution rate by hook_type
CAPTURE vs RETRIEVAL comparison
Success/Error rate
Latency heatmap
Keyword trigger activity

Hook Types:

CAPTURE (store to memory):

Hook	Trigger	Purpose
`UserPromptSubmit`	User message	Capture user prompts
`Stop`	Agent complete	Capture agent responses
`PostToolUse`	Edit/Write	Capture code patterns
`PostToolUse_Error`	Bash error	Capture error patterns
`PreCompact`	Before compact	Save session summary

RETRIEVAL (query memory):

Hook	Trigger	Purpose
`PostToolUse_ErrorDetection`	Bash error	Retrieve error fixes
`PreToolUse_NewFile`	Write new file	Retrieve conventions
`PreToolUse_FirstEdit`	First edit	Retrieve file patterns
`SessionStart`	resume/compact	Inject context
`PreToolUse`	Any tool use	Retrieve best practices

Memory Operations (V3)

Purpose: Monitor storage and retrieval operations

Key Panels:

Captures/Retrievals by collection
Deduplication statistics
Token usage
Storage by project

Collections:

Collection	Purpose
`discussions`	Session summaries, decisions
`code-patterns`	Implementation patterns, error fixes
`conventions`	Rules, guidelines, naming conventions

System Health (V3)

Purpose: Infrastructure monitoring

Key Panels:

Service status (Qdrant, Embedding, Prometheus, Pushgateway)
Error rates by component
Embedding service latency
Queue size

Metrics Reference

Hook Metrics

Metric	Type	Labels	Description
`aimemory_hook_duration_seconds`	Histogram	hook_type, status, project	Hook execution time
`aimemory_captures_total`	Counter	hook_type, status, project, collection	Storage operations
`aimemory_retrievals_total`	Counter	collection, status, project	Retrieval operations

NFR Metrics

Metric	NFR	Target
`aimemory_hook_duration_seconds`	NFR-P1	<500ms
`aimemory_embedding_batch_duration_seconds`	NFR-P2	<2s
`aimemory_session_injection_duration_seconds`	NFR-P3	<3s
`aimemory_dedup_check_duration_seconds`	NFR-P4	<100ms
`aimemory_retrieval_query_duration_seconds`	NFR-P5	<500ms
`aimemory_embedding_realtime_duration_seconds`	NFR-P6	<500ms

Trigger Metrics

Metric	Type	Labels	Description
`aimemory_trigger_fires_total`	Counter	trigger_type, status, project	Keyword trigger activations
`aimemory_trigger_results_returned`	Histogram	trigger_type, project	Results per trigger

Other Metrics

Metric	Type	Labels	Description
`aimemory_tokens_consumed_total`	Counter	operation, direction, project	Token usage tracking
`aimemory_collection_size`	Gauge	collection, project	Memory count per collection
`aimemory_queue_size`	Gauge	status	Retry queue depth
`aimemory_failure_events_total`	Counter	component, error_code, project	Failure tracking for alerts

Alerting (Future)

Recommended alert thresholds:

Alert	Condition
NFR violation	p95 > target for 5 minutes
Service down	Any service unavailable for 1 minute
Error rate spike	>5% errors in 5 minutes
Queue backup	>100 items for 10 minutes

Troubleshooting

Metrics Not Appearing

Check Pushgateway is receiving metrics:

curl http://localhost:29091/metrics | grep aimemory

Check Prometheus is scraping:

curl http://localhost:29090/api/v1/targets

Verify hooks are pushing:

tail -f ~/.ai-memory/logs/hooks.log | grep push_

Dashboard Shows No Data

Verify time range matches when hooks fired
Check project filter matches actual project names
Ensure hook_type filter includes active hooks

Common Issues

Symptom	Cause	Solution
All panels empty	Wrong time range	Set to "Last 1 hour" or when hooks ran
Project dropdown empty	No metrics pushed yet	Execute a hook (e.g., edit a file)
Some NFRs missing	Metric not triggered	That operation hasn't occurred yet

Migration from V2

V3 dashboards use new metric names:

V2 Metric	V3 Metric
`ai_memory_*`	`aimemory_*`
Shared NFR metrics	Separate NFR metrics
No project label	Project label on all metrics

V2 dashboards remain available during migration period.

Tech Debt: Remove V2 dashboards after verifying V3 works (TECH-DEBT-125)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AI-Memory Monitoring Guide

Overview

Quick Access

Dashboards

NFR Performance Overview (V3)

Hook Activity (V3)

Memory Operations (V3)

System Health (V3)

Metrics Reference

Hook Metrics

NFR Metrics

Trigger Metrics

Other Metrics

Alerting (Future)

Troubleshooting

Metrics Not Appearing

Dashboard Shows No Data

Common Issues

Migration from V2

Related Documentation

Uh oh!

FilesExpand file tree

MONITORING.md

Latest commit

History

MONITORING.md

File metadata and controls

AI-Memory Monitoring Guide

Overview

Quick Access

Dashboards

NFR Performance Overview (V3)

Hook Activity (V3)

Memory Operations (V3)

System Health (V3)

Metrics Reference

Hook Metrics

NFR Metrics

Trigger Metrics

Other Metrics

Alerting (Future)

Troubleshooting

Metrics Not Appearing

Dashboard Shows No Data

Common Issues

Migration from V2

Related Documentation