This document explains the different memory systems used by the LetheAISharp library to extend the persona's memory and knowledge. All memory systems are unified under the MemoryUnit format and can be triggered either manually through keywords or automatically through the embedding similarity search.
Note that most, if not all, of these systems are handled automatically by the library when in full chat mode. This document mostly explains how things behave internally, especially as many classes and functions can be overridden to add functionality.
The LetheAISharp library employs three primary memory systems that work together to provide comprehensive memory management for personas:
- Chat Session Summaries - Automatic summarization and embedding of past conversations
- WorldInfo System - Manual keyword-activated knowledge databases. It's the application's job to load them into the BasePersona.Worlds field.
- Brain/Agent System - Dynamic research and memory creation through agent tasks
All memory types are stored using the unified MemoryUnit format, which provides consistent handling, embedding, and retrieval across the entire system.
The persona's Brain class provides methods to manage memories, including adding new entries, forgetting old ones, and retrieving relevant information based on user input.
- MemoryUnit Format
- Chat Session Summaries
- WorldInfo System
- Agent System
- RAG Integration
- Known Facts (Extracted Facts)
- Memory Insertion Strategies
The MemoryUnit class is the unified format for all memory and knowledge storage in the library. Every piece of information - whether it's a chat summary, world knowledge, or research data - is stored as a MemoryUnit.
| Property | Type | Purpose |
|---|---|---|
| `Guid` | `Guid` | Unique identifier for the memory entry |
| `Category` | `MemoryType` | Type of memory (General, WorldInfo, WebSearch, ChatSession, etc.) |
| `Insertion` | `MemoryInsertion` | How the memory is inserted (Trigger, Natural, None) |
| `Name` | `string` | Title or name for the memory entry |
| `Content` | `string` | The actual memory content |
| `Reason` | `string` | Context or reason why this memory is important (optional) |
| `EmbedSummary` | `float[]` | Vector embedding for RAG similarity search |
| `Priority` | `int` | Importance level of the memory (affects retention and triggering) |
- General - Basic memories and observations
- WorldInfo - Manual knowledge entries with keyword triggers
- WebSearch - Research results from web searches
- ChatSession - Summarized conversation history
- Journal - Personal notes and reflections
- Image, File - Media and document references
- Location, Event, Person, Goal - Categorized memories for specific types
This library makes use of the WorldInfo, WebSearch, and ChatSession types. The remaining types can be used by your app when you want to feed the persona with external information. The type doesn't have to match the function one-to-one.
- Trigger - Memory is activated by RAG similarity in user input. It stays in the prompt for "Duration" turns (unless it's a chat session, for which the duration is always 1).
- Natural - Automatically inserted, when relevant, just before the user input message. It behaves like a normal message and scrolls until it leaves the context window. It's converted to Trigger after use.
- NaturalForced - Forcefully injected into the prompt (no RAG check) in a timed fashion; otherwise it behaves like Natural memory. May lead to a jarring disconnect during conversation.
- UserReturn - Inserted as part of the system message generated when the user comes back after being AFK. Memory is set to None after use to avoid re-insertion on subsequent returns. Content is inserted "as is" with no title, so it is recommended to keep it brief.
- None - Disabled memory
In all cases, the "Added" field is checked: if a memory's Added field is set to a future date, the memory won't surface until that date is reached. This allows for features like future planning and makes calendar-type applications easier to build.
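The "Added" check described above can be pictured as a simple date gate. This is only a minimal sketch: `AddedGateSketch` and `CanSurface` are hypothetical names that are not part of the library, and the assumption is that a memory becomes eligible as soon as its date is no longer in the future.

```csharp
using System;

public static class AddedGateSketch
{
    // Hypothetical helper: a memory may surface only once its Added
    // timestamp is no longer in the future relative to "now".
    public static bool CanSurface(DateTime added, DateTime now) => added <= now;
}
```

A calendar-style app could, under the same assumption, create a reminder memory with its Added date set to next week and rely on this kind of gate to keep it hidden until then.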
The summary of the chat sessions just before the current one can be inserted in the system prompt for long term contextual awareness. The behavior can be adjusted through the LLMEngine.Settings:
- SessionMemorySystem - true/false (allow or disallow the behavior entirely)
- SessionReservedTokens - The maximum amount of tokens you want to reserve for the feature
- SessionHandling - How to handle the chatlog itself
- CurrentOnly - The chatlog will only contain the current session, with the previous ones summarized in system prompt
- FitAll - The chatlog will feature as much log as possible (even across multiple sessions) depending on maximum Context Size, sessions coming before all this will be inserted in system prompt
Older chat sessions can be retrieved through RAG. Their summaries are compared to the user's input by embedding distance (similarity). If judged relevant enough, they can be inserted into the prompt at different levels under different policies.
When a chat session ends, which is controlled by your app through the following command:

```csharp
LLMSystem.Bot.History.StartNewChatSession()
```

the library processes the chat to turn it into useful data:
- Automatic Summarization:
  - Session title, summary and metadata
  - Key topics discussed
  - Character interactions and developments
  - Future goals mentioned
  - Roleplay status detection
- Embedding Creation: The summary is converted to vector embeddings for similarity search
- RAG Activation: When users or personas mention topics related to past conversations, the persona's `Brain` automatically retrieves relevant session summaries
Any chat session can be (re)processed individually; however, this can take a few minutes depending on the backend, model, and processing power.

```csharp
// Automatic session update process
await session.UpdateSession();
// This creates:
// - MetaData.Summary (detailed summary)
// - MetaData.Keywords (key topics)
// - MetaData.FutureGoals (mentioned objectives)
// - EmbedSummary (vector embedding)
```

Instead, when the user or app decides that a chat session has ended, call:

```csharp
LLMSystem.Bot.History.StartNewChatSession()
```

It'll automatically process and archive the current session before starting a new empty session.
Chat sessions are stored as MemoryUnit objects with:
- Category: `MemoryType.ChatSession`
- Insertion: `MemoryInsertion.Trigger` (RAG-activated)
- Name: Session title
- Content: Detailed session summary
- EmbedSummary: Vector representation for similarity search
The WorldInfo system provides manually curated knowledge databases that can be activated through keyword matching or RAG similarity search.
- Manual Creation: Developers or users create knowledge entries with specific keywords
- Keyword Activation: Entries are triggered when conversation contains matching keywords
- Optional RAG Integration: When `DoEmbeds` is enabled, entries also participate in similarity search
- Duration Control: Activated entries remain active for a specified number of conversation turns
```csharp
public class WorldInfo
{
    public string Name { get; set; }              // World database name
    public string Description { get; set; }       // Purpose description
    public bool DoEmbeds { get; set; }            // Enable RAG integration
    public int ScanDepth { get; set; }            // Messages to scan for keywords
    public List<MemoryUnit> Entries { get; set; } // Knowledge entries
}
```

Each WorldInfo entry is a MemoryUnit with additional keyword settings:
- KeyWordsMain: Primary trigger keywords
- KeyWordsSecondary: Secondary trigger keywords
- WordLink: Logic for keyword matching (And, Or, Not)
- Duration: How many turns the entry stays active
- TriggerChance: Probability of activation when keywords match
The `WordLink` options combine the two keyword lists as follows:
- And: Requires keywords from both Main and Secondary lists
- Or: Requires keywords from either Main or Secondary lists
- Not: Requires Main keywords but not Secondary keywords
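The And / Or / Not keyword logic can be sketched as follows. This is an illustration only, not the library's implementation: `WordLinkSketch` and `KeywordMatchSketch` are hypothetical names, matching is assumed to be a case-insensitive substring test, and `TriggerChance`, `Duration` and `ScanDepth` are deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's WordLink options.
public enum WordLinkSketch { And, Or, Not }

public static class KeywordMatchSketch
{
    // True if at least one keyword occurs in the text (case-insensitive substring match).
    private static bool ContainsAny(string text, IEnumerable<string> keywords) =>
        keywords.Any(k => text.Contains(k, StringComparison.OrdinalIgnoreCase));

    // Combines the main and secondary keyword lists according to the link mode.
    public static bool IsTriggered(
        string text,
        IReadOnlyCollection<string> main,
        IReadOnlyCollection<string> secondary,
        WordLinkSketch link)
    {
        bool hasMain = ContainsAny(text, main);
        bool hasSecondary = ContainsAny(text, secondary);
        return link switch
        {
            WordLinkSketch.And => hasMain && hasSecondary,  // both lists must match
            WordLinkSketch.Or => hasMain || hasSecondary,   // either list may match
            WordLinkSketch.Not => hasMain && !hasSecondary, // main matches, secondary must not
            _ => false
        };
    }
}
```

For example, with `main = { "dragon" }` and `secondary = { "castle" }`, the text "The dragon sleeps" would trigger under Or and Not, but not under And.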
The Agent system provides dynamic memory creation and research capabilities through autonomous agent tasks. For comprehensive documentation on the agent system, see AGENTS.md.
- Topic Analysis: Agent tasks (if enabled) analyze recent conversations to identify unfamiliar topics
- Automatic Research: When `AgentSystem` is enabled and research tasks are assigned, the system:
  - Detects knowledge gaps in conversations
  - Performs web searches on unfamiliar topics
  - Merges search results into coherent summaries
  - Stores findings as new memories
- Memory Integration: Research results are automatically added to the persona's memory with appropriate categorization
```csharp
// Example from ActiveResearchTask
var mem = new MemoryUnit
{
    Category = MemoryType.WebSearch,
    Insertion = MemoryInsertion.Natural,
    Name = topic.Topic,
    Content = merged.CleanupAndTrim(),
    Reason = topic.Reason,
    Priority = topic.Urgency + 1
};
await mem.EmbedText();
owner.Brain.Memorize(mem);
```

- ResearchTask: Analyzes archived sessions for research opportunities
- ActiveResearchTask: Performs real-time research during active conversations
- Both tasks create memories with the `MemoryType.WebSearch` category
The persona's Brain class provides memory management and RAG functionalities:
- Memorize(): Add new memories with duplicate checking. This is particularly useful if your app intends to feed the persona with external information.
- Forget(): Remove specific memories
- ReloadMemories(): Rebuild the RAG index from current memories (to be called after adding or forgetting many memories)
- GetRAGandInserts(): Get a list of RAG and WorldInfo inserts that would be triggered by a given message
- Search(): Perform RAG similarity search on stored memories
- Embedding Creation: All memory content is converted to vector embeddings
- Similarity Search: When processing user input, the system:
  - Embeds the current message
  - Searches for similar memories using vector distance
  - Returns the most relevant memories within a distance threshold
- Automatic Insertion: Retrieved memories are automatically inserted into the conversation context
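The search step can be illustrated with a brute-force scan. This is a minimal sketch under stated assumptions: cosine distance is assumed as the metric, `SimilaritySearchSketch` is a hypothetical helper, and memories are reduced to a name plus an embedding. The library's actual index and result types are not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilaritySearchSketch
{
    // Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal vectors.
    public static double CosineDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 1.0; // treat zero vectors as unrelated
        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns at most maxResults memories within maxDistance of the query, closest first.
    public static List<(string Name, double Distance)> Search(
        float[] query,
        IEnumerable<(string Name, float[] Embedding)> memories,
        int maxResults,
        double maxDistance)
    {
        return memories
            .Select(m => (m.Name, Distance: CosineDistance(query, m.Embedding)))
            .Where(r => r.Distance <= maxDistance)
            .OrderBy(r => r.Distance)
            .Take(maxResults)
            .ToList();
    }
}
```

The two parameters play the same role as the result count and distance threshold mentioned above: one caps how many memories come back, the other filters out anything too far from the query.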
```csharp
if (LLMEngine.Settings.RAGEnabled)
{
    var searchresults = await LLMEngine.Bot.Brain.Search(searchmessage, ragResCount, ragDistance);
    // searchresults is a list of VaultResult (each containing a MemoryUnit and a distance value)
}
```

Memories participate in RAG search when:
- `Insertion` is set to `MemoryInsertion.Trigger`
- `EmbedSummary` contains valid embedding data
- `Enabled` is true

By default, this includes:
- Summaries of past chat sessions
- WorldInfo entries with the `DoEmbeds` property enabled
- Research results from agent tasks after they've been inserted into conversation
The ExtractedFact system is a lightweight semantic index that sits on top of the regular RAG memory vault. Rather than embedding long session summaries (which cover many topics and can be hard to match precisely), the brain extracts concise, single-sentence facts from each session and embeds those instead.
When a session is processed, short facts are extracted from it via structured output — statements like "User works as a software engineer" or "User's cat is named Whiskers." Each fact is embedded independently and stored in Brain.ExtractedFacts.
At retrieval time, two-hop retrieval is used:
- The user's message is compared against all fact embeddings.
- Any fact whose embedding is close enough (within `FactRetrievalThreshold`) has its `SourceMemories` GUIDs used to directly load the originating `MemoryUnit` objects (typically full session summaries).
- Those sessions are then injected into the prompt, bypassing the standard vector distance check on the session summaries themselves.
This dramatically improves recall for personal facts that span many topics in a session, because "does the user mention their cat?" resolves through a tiny, focused embedding rather than a sprawling multi-topic summary.
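The two hops can be sketched roughly as below. This is not the library's code: `TwoHopSketch` is a hypothetical helper, fact distances are assumed to be precomputed against the user's message, memories are reduced to plain strings keyed by GUID, and "within the threshold" is assumed to mean less than or equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoHopSketch
{
    // facts: each fact's (precomputed) distance to the user message plus the
    // GUIDs of the memories it was extracted from.
    // memories: lookup from GUID to the stored memory (a plain string here).
    public static List<string> Retrieve(
        IEnumerable<(double Distance, List<Guid> SourceMemories)> facts,
        IReadOnlyDictionary<Guid, string> memories,
        double factRetrievalThreshold = 0.10)
    {
        return facts
            .Where(f => f.Distance <= factRetrievalThreshold) // hop 1: facts close to the message
            .SelectMany(f => f.SourceMemories)                // hop 2: follow GUIDs to the sources
            .Distinct()                                       // load each source memory once
            .Where(memories.ContainsKey)                      // skip GUIDs with no stored memory
            .Select(id => memories[id])
            .ToList();
    }
}
```

Note that the session summaries themselves are never compared against the message in this sketch; only the short facts are, which mirrors the bypass described above.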
| Property | Type | Purpose |
|---|---|---|
| `Guid` | `Guid` | Unique identifier |
| `Fact` | `string` | The concise single-sentence fact |
| `EmbedSummary` | `float[]` | Embedding vector for similarity search |
| `FirstSeen` | `DateTime` | When the fact was first extracted |
| `LastSeen` | `DateTime` | When the fact was last confirmed (updated on dedup) |
| `ReferenceCount` | `int` | How many times this fact has been seen across sessions |
| `SourceMemories` | `List<Guid>` | GUIDs of `MemoryUnit` objects this fact was extracted from |
| `Superseded` | `bool` | `true` if a newer fact replaced this one |
| `SupersededBy` | `Guid?` | GUID of the superseding fact, if any |
To keep the fact list from growing unbounded, the system uses cosine-distance comparison when a new fact arrives:
- Duplicate (distance ≤ `FactDeduplicationThreshold`): same fact; update `LastSeen` and `ReferenceCount`.
- Supersession (`FactDeduplicationThreshold` < distance ≤ `FactSupersessionThreshold`): related but updated fact; mark the old fact as `Superseded` and carry forward `SourceMemories`.
- New fact (distance > `FactSupersessionThreshold`): stored as a brand-new entry.
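The three distance bands translate into a small classification function. The sketch below uses hypothetical names (`FactMatchSketch`, `FactDedupSketch`) and takes the default thresholds 0.05 and 0.075 from this document's settings table; it only classifies the distance and does not perform the bookkeeping on `LastSeen`, `ReferenceCount` or `SourceMemories`.

```csharp
using System;

// Hypothetical outcome of comparing a new fact with its nearest existing fact.
public enum FactMatchSketch { Duplicate, Supersession, NewFact }

public static class FactDedupSketch
{
    // distance: cosine distance between the new fact and the closest stored fact.
    public static FactMatchSketch Classify(
        double distance,
        double dedupThreshold = 0.05,
        double supersessionThreshold = 0.075)
    {
        if (distance <= dedupThreshold) return FactMatchSketch.Duplicate;           // same fact
        if (distance <= supersessionThreshold) return FactMatchSketch.Supersession; // updated fact
        return FactMatchSketch.NewFact;                                             // unrelated fact
    }
}
```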
In addition to driving RAG retrieval, the highest-ranked facts are injected directly into the system prompt on every turn. Ranking is based on ReferenceCount × recency_factor (recency decays slowly over time). The number of facts included is controlled by CoreFactsTokenBudget in LLMSettings. The section title in the system prompt is configured via SystemPrompt.CoreFactsTitle.
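As an illustration of the ranking idea, the sketch below scores a fact as `ReferenceCount` multiplied by a recency factor. The exact decay used by the library is not specified here, so the exponential half-life (and its 180-day default) is purely an assumption, and `CoreFactRankingSketch` is a hypothetical name.

```csharp
using System;

public static class CoreFactRankingSketch
{
    // Assumed recency factor: 1.0 for a fact seen just now, halving every
    // halfLifeDays since LastSeen. The real decay curve may differ.
    public static double Score(int referenceCount, DateTime lastSeen, DateTime now, double halfLifeDays = 180)
    {
        double ageDays = Math.Max(0, (now - lastSeen).TotalDays);
        double recencyFactor = Math.Pow(0.5, ageDays / halfLifeDays);
        return referenceCount * recencyFactor;
    }
}
```

Under this assumption, facts would be sorted by descending score and added to the system prompt until the token budget is used up.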
| Setting | Default | Purpose |
|---|---|---|
| `FactRetrievalEnabled` | `true` | Enable/disable the entire fact layer |
| `CoreFactsTokenBudget` | `512` | Token budget for facts in the system prompt |
| `FactDeduplicationThreshold` | `0.05` | Distance for treating two facts as identical |
| `FactSupersessionThreshold` | `0.075` | Distance for treating a fact as superseding another |
| `FactRetrievalThreshold` | `0.10` | Distance for fact-triggered memory retrieval |
Trigger:
- Activation: Memories are inserted when triggered by RAG similarity or keyword matching
- Use Case: Most common for stored knowledge and past conversations
- Behavior: Memories remain dormant until relevance is detected, and disappear quickly from context once they are no longer relevant

Natural:
- Activation: Memories are automatically evaluated for relevance during conversation
- Conversion: After being used once, Natural memories become Trigger memories
- Use Case: Fresh insights or temporary information
- Behavior: Inserted like a system message (just above the last user message), scrolls out of context over time

NaturalForced:
- Activation: Memories are automatically inserted independently of context, but the system prevents overload by spacing them a few messages apart
- Conversion: After being used once, NaturalForced memories become Trigger memories
- Use Case: Critical messages that need to be addressed
- Behavior: Inserted like a system message (just above the last user message), scrolls out of context over time

UserReturn:
- Activation: Automatically triggered when the user returns after being AFK
- Conversion: After being used once, UserReturn memories become None memories to prevent further recall
- Use Case: Important messages, conversation topic steering, calendar events
- Behavior: Integrated into the system message that would normally be generated at that point, scrolls out of context over time
- Creation: Memory is created with appropriate `MemoryType` and `Insertion` strategy
- Embedding: Content is converted to vector embedding if RAG is enabled
- Storage: Memory is added to the Brain's memory collection
- Activation: Memory is triggered by keywords or similarity search
- Insertion: Memory content is injected into conversation context
- Evolution: Natural memories may convert to Trigger memories after use
- Decay: Some memory types (Goals and WebSearch by default) will decay and be pruned if not triggered for a long time.
Here's an example of what a full prompt might look like with all memory systems enabled and working together. Here, the user archived the last chat session a while back and is starting a new one. The program was kept running in the background. Bob is running both the ActiveResearchTask and ResearchTask agent tasks, allowing it to do web research while the user is AFK.
```
[SystemPrompt]
You are a helpful assistant named Bob.
# Participants:
- Bob: A knowledgeable and friendly AI assistant who can remember a great deal
- User: Some user who knows very little
# Past sessions: // Automatically inserted by the bot in the system prompt, those are former chat sessions summaries (if any and if they no longer fit in context)
- Session Title 1 (2025-05-16): Discussed the basics of AI and machine learning. User learned about supervised and unsupervised learning.
- Session Title 2 (2025-06-02): Discussed the meaning of life, we ended up figuring out it was 42
# Relevant Information: // Those are RAG triggered information from either past sessions, WorldInfo, or other tasks with their memory entries set to Trigger
- some relevant info from a worldinfo entry about the last user message
- some past chat session that's too old for the previous section but very relevant to the discussion at hand
[/SystemPrompt]
[Bot]
Hello! How can I assist you today?
[/Bot]
[User]
Hi Bob, did you look into the meaning of life we discussed last time?
[/User]
```
Assuming that Bob had time to do the research in the background while the user was AFK, the prompt becomes:
```
[SystemPrompt]
(no changes)
[/SystemPrompt]
[Bot]
Hello! How can I assist you today?
[/Bot]
[SystemMessage] // those messages are automatically inserted by the bot just above the last user message, they scroll out of context over time and are tagged as "hidden", meaning that they don't have to be displayed in the UI
Bob has found the following information on the internet about the meaning of life:
(lot of text here, summarized from various sources)
[/SystemMessage]
[User]
Hi Bob, did you look into the meaning of life we discussed last time?
[/User]
[Bot]
Oh yeah, I did! Here's the info I found.... (proceeds to reuse the info from the SystemMessage above in its own words)
[/Bot]
```
Here the Brain class found that the user's query was very close to one of its memories with the Natural insertion type, so it inserted that memory just above the user message as a system message. The bot can then reuse that information in its own words.
- Set realistic `Priority` levels to control memory retention (a priority of 0 means that a Natural memory will be pruned immediately after use)
- Configure `Duration` appropriately for WorldInfo entries (2-5 turns is typical)
- Enable `DoEmbeds` for WorldInfo when RAG integration is desired
- Write clear, concise `Content` that provides context without being verbose; aim for less than 1024 tokens (only the first 1024 tokens are considered for RAG)
- Use descriptive `Name` fields for better identification
- The `Reason` information is optional, and mostly used for more natural insertion into the prompt
- Choose appropriate `Insertion` strategies based on memory importance and usage patterns
- Although heavily optimized, RAG searches are computationally expensive; searching more than 100k entries might take several seconds.
- WorldInfo keyword matching is faster than RAG similarity search for specific triggers