fix(memory): localize background LLM prompts by sunilkumarvalmiki · Pull Request #2447 · tinyhumansai/openhuman

sunilkumarvalmiki · 2026-05-21T13:18:08Z

Summary

Adds a top-level output_language config/env override for background LLM artifacts.
Applies the directive to memory-tree summaries, memory-tree entity extraction natural-language values, archivist recaps, and learning reflection prompts.
Keeps JSON keys, enum values, code, commands, quoted text, and proper nouns stable while localizing generated prose.
Documents OPENHUMAN_OUTPUT_LANGUAGE in .env.example.

Problem

#2436 reports that background LLM surfaces are hardcoded to English even when chat/UI localization is configured. The affected paths build their own prompts and do not read the web chat locale directive or profile context:

memory-tree summariser
memory-tree entity extractor
learning reflection

Solution

This adds a small shared background output-language directive in config:

Config.output_language can be set in TOML or via OPENHUMAN_OUTPUT_LANGUAGE.
Locale tags such as zh-CN are normalized to language names; custom language names are accepted after trimming/control-character cleanup.
The summariser/extractor configs carry the language into their prompt builders.
ReflectionHook reads the directive from its existing full config.
Tests assert the directive appears in each affected prompt surface.

Submission Checklist

Tests added or updated (happy path + edge case) per Testing Strategy.
Diff coverage >= 80% - focused Rust tests cover the changed prompt/config surfaces; the CI coverage gate is running and remains authoritative for exact diff coverage.
Coverage matrix updated - N/A: background config/prompt behavior only, no feature matrix row change.
All affected feature IDs from the matrix are listed - N/A: no feature matrix row changed.
No new external network dependencies introduced.
Manual smoke checklist updated - N/A: no release-cut surface.
Linked issue closed via Closes #2436 in Related.

Impact

Runtime: Rust core only.
Compatibility: default behavior is unchanged unless output_language / OPENHUMAN_OUTPUT_LANGUAGE is set.
Migration: no schema or data migration required; existing configs deserialize with output_language = None.
Security/privacy: no new network dependency; directive explicitly preserves JSON keys, enum values, code, commands, quoted source text, and proper nouns.

Validation

Passed:

cargo fmt --manifest-path Cargo.toml --all --check
pnpm --filter openhuman-app rust:format:check
git diff --check
$env:CARGO_BUILD_JOBS='1'; cargo test --manifest-path Cargo.toml --lib output_language -- --nocapture (6 passed)

Blocked / unrelated:

pnpm format:check fails in the app Prettier pass with existing formatting drift across 931 app files. This branch does not change app/.

Summary by CodeRabbit

New Features
- Configure the language used for AI-generated background content (memory summaries, entity extraction, learning reflections) via the OPENHUMAN_OUTPUT_LANGUAGE environment variable. Accepts locale tags (e.g., zh-CN → Simplified Chinese) or human-readable names. Optional—leave unset to keep existing/default behavior.

coderabbitai · 2026-05-21T13:18:27Z

📝 Walkthrough

Walkthrough

Adds an optional output_language config and OPENHUMAN_OUTPUT_LANGUAGE env override; normalizes language tags and generates an output-language directive that is threaded into system prompts for memory-tree summariser, entity extractor, learning reflection, and archivist recap summarisation.

Changes

Output Language Configuration for Background LLM Artifacts

Layer / File(s)	Summary
Config schema and helper functions `src/openhuman/config/schema/types.rs`	`Config` gains `output_language: Option<String>`. Adds `normalize_output_language` and `output_language_directive` helpers, `Config::output_language_directive()` wrapper, default `None`, and unit tests for locale mapping and non-locale names.
Config environment loading `src/openhuman/config/schema/load.rs`, `src/openhuman/config/schema/load_tests.rs`	`Config::apply_env_overlay_with` reads `OPENHUMAN_OUTPUT_LANGUAGE`, trims it, and sets `output_language` when non-empty. New tests verify override behavior and whitespace-only handling.
Config module exports and environment documentation `src/openhuman/config/mod.rs`, `.env.example`	Reformat `pub use schema::{ ... }` for readability and add `.env.example` documentation for the new `OPENHUMAN_OUTPUT_LANGUAGE` variable with locale examples.
Memory-tree summarizer language support `src/openhuman/memory/tree/tree_source/summariser/llm.rs`, `src/openhuman/memory/tree/tree_source/summariser/mod.rs`	`LlmSummariserConfig` adds `output_language`; `build_prompt` and `system_prompt` accept and use it to prepend a language directive. Tests updated to assert directive presence while preserving existing prompt shapes and JSON markers.
Memory-tree entity extractor language support `src/openhuman/memory/tree/score/extract/llm.rs`, `src/openhuman/memory/tree/score/extract/mod.rs`, `src/openhuman/memory/tree/score/mod.rs`	`LlmExtractorConfig` adds `output_language`; `build_system_prompt` accepts `output_language` and prepends a language directive. Extractor construction passes `config.output_language` through. Tests updated accordingly.
Learning reflection language support `src/openhuman/learning/reflection.rs`, `src/openhuman/learning/reflection_tests.rs`	`build_reflection_prompt` conditionally appends an output-language directive before metadata sections. A unit test asserts directive inclusion and JSON-key preservation.
Archivist recap summarization `src/openhuman/agent/harness/archivist.rs`	`ArchivistHook::summarize_entries` passes `config.output_language` into `LlmSummariserConfig` for recap summarisation.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Config
  participant LlmSummariser
  participant system_prompt
  Caller->>Config: read output_language (env/setting)
  Caller->>LlmSummariser: construct config with output_language
  LlmSummariser->>system_prompt: build with output_language
  system_prompt->>system_prompt: prepend language directive if present
  system_prompt-->>LlmSummariser: prompt with directive

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A language whisper through the rabbit hole deep,
Where memory trees and reflections wake from sleep.
Now summaries, extracts, and reflections sing,
In zh-CN, en, or the language you bring! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(memory): localize background LLM prompts' accurately reflects the main objective—adding output language configuration to localize background LLM artifacts in memory and reflection modules.
Linked Issues check	✅ Passed	The PR implements option 2 from issue `#2436` by adding a global `output_language` config key consumed by all affected background LLM prompts (memory_tree summariser, entity extractor, reflection, archivist) with proper locale mapping and directive injection.
Out of Scope Changes check	✅ Passed	All changes directly support the objective of localizing background LLM prompts; environment variable, config schema, prompt builders, and tests are all cohesive to this goal with no unrelated modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

graycyrus

Clean, well-structured PR — good test coverage across all four prompt surfaces, sensible defaults (None = no change), and the control-character stripping in normalize_output_language is a nice defensive touch.

One minor observation below on the locale mapping table; everything else looks solid.

Signed-off-by: sunilkumarvalmiki <g.sunilkumarvalmiki@gmail.com>

coderabbitai

🧹 Nitpick comments (3)

src/openhuman/memory/tree/score/extract/llm_tests.rs (1)
21-28: 💤 Low value

Consider asserting threading, not directive wording.

This test duplicates assertions that live with the output_language_directive helper itself (e.g., the Simplified Chinese mapping and the Keep JSON keys / importance_reason phrasing). Any future rewording of the directive will break this test even though the extractor's contract — "prepend the directive when configured" — is intact.

Consider checking that the prompt starts with (or contains) the directive returned by the helper, leaving the wording assertions to the helper's own tests:
♻️ Suggested refactor
 #[test]
 fn build_system_prompt_includes_output_language_directive() {
-    let p = build_system_prompt(true, Some("zh-CN"));
-    assert!(p.contains("Simplified Chinese"));
-    assert!(p.contains("Keep JSON keys"));
-    assert!(p.contains("\"importance_reason\""));
+    let directive = crate::openhuman::config::output_language_directive(Some("zh-CN"))
+        .expect("zh-CN should produce a directive");
+    let p = build_system_prompt(true, Some("zh-CN"));
+    assert!(p.contains(&directive), "prompt should embed the language directive");
+    // The directive must appear before the extractor's role description so
+    // the model reads it first.
+    let directive_pos = p.find(&directive).unwrap();
+    let role_pos = p.find("You are a named-entity extractor").unwrap();
+    assert!(directive_pos < role_pos);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tree/score/extract/llm_tests.rs` around lines 21 - 28,
The test build_system_prompt_includes_output_language_directive should assert
that build_system_prompt(true, Some("zh-CN")) contains (or starts with) the
exact directive produced by the existing helper output_language_directive(true,
Some("zh-CN")) rather than duplicating its wording; update the assertions to
call output_language_directive and check containment (e.g.,
assert!(p.contains(&output_language_directive(true, Some("zh-CN"))))) so the
test verifies that the directive is prepended without coupling to its text.
src/openhuman/memory/tree/tree_source/summariser/llm.rs (2)
315-320: 💤 Low value

Minor formatting glitch when output_language is set in plain-text mode.

In the non-structured path, language_directive is appended as \n\n{directive} and then immediately followed by a space + No commentary, no preamble... on the same line. When a directive is present this yields:
...Return the summary text first.

<directive text> No commentary, no preamble, no headings, ...
i.e. the trailing constraints run onto the same line as the directive. It's still readable, but a \n\n separator (matching the structured branch on line 323) would keep the prompt shape consistent and easier to grep.
✏️ Suggested tweak
     if !structured_facets {
         return format!(
-            "{base}{language_directive} No commentary, no preamble, no headings, \
-             no markdown wrappers, no JSON — just the prose summary."
+            "{base}{language_directive}\n\n\
+             No commentary, no preamble, no headings, no markdown wrappers, \
+             no JSON — just the prose summary."
         );
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tree/tree_source/summariser/llm.rs` around lines 315 -
320, The non-structured return builds the prompt by concatenating base and
language_directive then immediately appending the constraint text, causing the
directive and constraints to share a line; update the format string in the
non-structured branch (the block using structured_facets and language_directive)
to insert a "\n\n" after {language_directive} (i.e.
"{base}{language_directive}\n\nNo commentary, no preamble, no headings, no
markdown wrappers, no JSON — just the prose summary.") so it matches the
structured branch's spacing.
513-519: ⚡ Quick win

Consider also covering the non-structured + language path.

The new test asserts the directive lands in the structured-facets prompt, but the plain-prose branch (line 315-320) is the other code path that interpolates language_directive and currently has no test covering Some(...). A second assertion (e.g. system_prompt(4096, false, Some("zh-CN")) contains "Simplified Chinese" and still contains "no json") would lock in both branches.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/tree/tree_source/summariser/llm.rs` around lines 513 -
519, Add coverage for the non-structured (plain-prose) branch of system_prompt
by invoking system_prompt with the structured flag set to false (e.g.,
system_prompt(4096, false, Some("zh-CN"))) and assert the returned prompt
contains the language directive ("Simplified Chinese") and the plain-prose
instruction marker (e.g., "no json" or "no JSON") so both code paths that
interpolate language_directive are tested; update or add a test function
alongside system_prompt_includes_output_language_directive that performs these
assertions against the system_prompt function.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/memory/tree/score/extract/llm_tests.rs`:
- Around line 21-28: The test
build_system_prompt_includes_output_language_directive should assert that
build_system_prompt(true, Some("zh-CN")) contains (or starts with) the exact
directive produced by the existing helper output_language_directive(true,
Some("zh-CN")) rather than duplicating its wording; update the assertions to
call output_language_directive and check containment (e.g.,
assert!(p.contains(&output_language_directive(true, Some("zh-CN"))))) so the
test verifies that the directive is prepended without coupling to its text.

In `@src/openhuman/memory/tree/tree_source/summariser/llm.rs`:
- Around line 315-320: The non-structured return builds the prompt by
concatenating base and language_directive then immediately appending the
constraint text, causing the directive and constraints to share a line; update
the format string in the non-structured branch (the block using
structured_facets and language_directive) to insert a "\n\n" after
{language_directive} (i.e. "{base}{language_directive}\n\nNo commentary, no
preamble, no headings, no markdown wrappers, no JSON — just the prose summary.")
so it matches the structured branch's spacing.
- Around line 513-519: Add coverage for the non-structured (plain-prose) branch
of system_prompt by invoking system_prompt with the structured flag set to false
(e.g., system_prompt(4096, false, Some("zh-CN"))) and assert the returned prompt
contains the language directive ("Simplified Chinese") and the plain-prose
instruction marker (e.g., "no json" or "no JSON") so both code paths that
interpolate language_directive are tested; update or add a test function
alongside system_prompt_includes_output_language_directive that performs these
assertions against the system_prompt function.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0de2937-26c4-4692-b3fe-ad79c15ae86e

📥 Commits

Reviewing files that changed from the base of the PR and between 204bca8 and f24ea46.

📒 Files selected for processing (14)

.env.example
src/openhuman/agent/harness/archivist.rs
src/openhuman/config/mod.rs
src/openhuman/config/schema/load.rs
src/openhuman/config/schema/load_tests.rs
src/openhuman/config/schema/types.rs
src/openhuman/learning/reflection.rs
src/openhuman/learning/reflection_tests.rs
src/openhuman/memory/tree/score/extract/llm.rs
src/openhuman/memory/tree/score/extract/llm_tests.rs
src/openhuman/memory/tree/score/extract/mod.rs
src/openhuman/memory/tree/score/mod.rs
src/openhuman/memory/tree/tree_source/summariser/llm.rs
src/openhuman/memory/tree/tree_source/summariser/mod.rs

✅ Files skipped from review due to trivial changes (2)

src/openhuman/memory/tree/tree_source/summariser/mod.rs
.env.example

sunilkumarvalmiki · 2026-05-21T18:28:36Z

Thanks @graycyrus, I addressed the locale-mapping follow-up in f24ea469 by adding explicit mappings for ja, de, zh-TW / zh-Hant, th, vi, and tr, plus test coverage for those tags.

All CI checks are green now, including Rust quality, Rust/core coverage, diff-cover, Windows secrets tests, Tauri build, Linux E2E, installer smoke, and PR quality checks.

Could you please take another look and merge if everything looks good?

M3gA-Mind

Continuation review — minor finding addressed ✓

The locale-mapping gap flagged in the previous pass (missing ja, de, zh-TW/zh-Hant, th, vi, tr) is fully resolved in f24ea46. The expanded mapping now covers all the obvious omissions, and the updated test in types.rs exercises each new tag. No new issues found.

All four background LLM surfaces (summariser, entity extractor, archivist recap, learning reflection) correctly plumb the directive, default to None so existing behavior is unchanged, and are covered by focused unit tests. Issue #2436 acceptance criteria met.

Good to merge when ready.

sunilkumarvalmiki requested a review from a team May 21, 2026 13:18

sunilkumarvalmiki mentioned this pull request May 21, 2026

Backend LLM prompts (memory_tree summariser/extractor, learning reflection) are hardcoded English — ignore locale & PROFILE #2436

Open

coderabbitai Bot previously approved these changes May 21, 2026

View reviewed changes

graycyrus reviewed May 21, 2026

View reviewed changes

Comment thread src/openhuman/config/schema/types.rs

fix(memory): localize background llm prompts

f24ea46

Signed-off-by: sunilkumarvalmiki <g.sunilkumarvalmiki@gmail.com>

sunilkumarvalmiki dismissed coderabbitai[bot]’s stale review via f24ea46 May 21, 2026 18:03

sunilkumarvalmiki force-pushed the codex/2436-background-llm-locale branch from 204bca8 to f24ea46 Compare May 21, 2026 18:03

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

coderabbitai Bot approved these changes May 21, 2026

View reviewed changes

M3gA-Mind self-assigned this May 22, 2026

M3gA-Mind reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memory): localize background LLM prompts#2447

fix(memory): localize background LLM prompts#2447
sunilkumarvalmiki wants to merge 1 commit into
tinyhumansai:mainfrom
sunilkumarvalmiki:codex/2436-background-llm-locale

sunilkumarvalmiki commented May 21, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 21, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

graycyrus left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

sunilkumarvalmiki commented May 21, 2026

Uh oh!

M3gA-Mind left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sunilkumarvalmiki commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Validation

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sunilkumarvalmiki commented May 21, 2026

Uh oh!

M3gA-Mind left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sunilkumarvalmiki commented May 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading