15 changes: 15 additions & 0 deletions .lore.md
@@ -73,12 +73,27 @@
<!-- lore:019e4425-f7b6-7f33-99dc-2e17087f251e -->
* **Always ask for session history recall at the start of new sessions**: Session continuity: At session start, recall specific prior-session details (PR titles, branch names, test counts, file names, commit details, lint fixes) as precisely as possible. State explicitly when details were not captured. Proactively log granular session artifacts (exact error messages, PR titles, file paths, line numbers, commit hashes, test names, root causes) so future recall is reliable. User relies on this to resume context quickly. When information is unavailable, say so explicitly and suggest where to find it (GitHub, logs, commit history). Never fabricate or infer details as confirmed facts — always distinguish verified session records from generated/uncertain information.

<!-- lore:019e4478-81dd-77bc-a3e7-931069606b23 -->
* **Always carry forward workflow directives and incident context as persistent hard rules across sessions**: 🔴 HARD RULES (every session): 1. Two-phase Plan/Build: write plan file first, wait for approval, then implement. 2. Pre-merge self-review: rate all changed files by severity with exact paths/line numbers; fix all criticals; confirm CI green. 3. Eval fairness: demand deep fairness analysis before accepting eval results. 4. LTM cache delete must be inside 'changes made' guard — unconditional placement is a bug. 5. Re-state hard rules + completed work at session start with exact file paths, line numbers, commit hashes, PR URLs — flag missing details with 🟡. 6. Root cause first: RCA must precede any proposed solution. 7. Never run `git remote -v` with client-controlled cwd. 8. IDs starting with `LOREAI-GATEWAY-` are Sentry issue IDs. 9. Pull from origin/main before any codebase exploration or edits. 10. Resolve `.lore.md` rebase conflicts with `--ours`. 11. When multiple fixes identified, apply together and immediately re-run evals. 12. Before implementing features/debugging, explore relevant codebase area first. 13. Production incidents: acknowledge → investigate → hotfix → code fix → preventive measures. 14. 'Let's go with that plan.' = full approval, proceed immediately. 15. Pre-c [truncated — entry too long]

<!-- lore:019e44c8-e3b2-70c1-afb6-d3acf24c531a -->
* **Always fix cache memory leaks with TTL eviction, size cap, and scheduled pruning**: Cache memory leak fix pattern: (1) TTL check in `.get()` — delete and return undefined if `Date.now() - entry.ts > this.ttlMs`; (2) LRU eviction in `.set()` — delete oldest key when `store.size >= maxEntries`; (3) `setInterval(() => this.prune(), 60_000)` in constructor. Defaults: `maxEntries = 10_000`, `ttlMs = 300_000` (5 min). Applies to all cache modules.

<!-- lore:019e44c0-d663-758a-849c-95a3daf9bead -->
* **Always include database migration versioning context when discussing schema changes**: When discussing schema changes, always preserve migration versioning context: which version introduced the change, which reverted it, the incident that caused the revert, and the rollback migration filename. Reference past incidents when proposing new schema changes to avoid repeating mistakes.

<!-- lore:019e4248-3b73-75da-9fb4-099bed46f632 -->
* **Always investigate root causes thoroughly before approving implementation changes**: Always investigate root causes thoroughly before proposing changes: explain why the original design decision was made, downstream effects, and why removing it is safe. Read actual source files, trace full execution paths, report precise file names, line numbers, and exact code snippets verbatim. Pull from origin/main before any codebase exploration or edits. When using git worktrees: stash uncommitted changes, rebase onto origin/main, then restore. Always resolve `.lore.md` rebase conflicts by keeping main's version (`--ours`). Before considering a PR done, perform a critical self-review: code correctness, test quality, PR description accuracy, accidentally staged files, stale comments, debug artifacts. Fix all issues (labeled C1, C2, etc.), amend commit, force-push. End sessions with a structured summary: (1) all PRs shipped with numbers and one-line descriptions; (2) current eval scores showing Lore vs baseline with delta percentages. When multiple fixes are identified, apply them together and immediately re-run evals — treat eval runs as the primary feedback loop, analyze results per-category. IDs starting with `LOREAI-GATEWAY-` are Sentry issue IDs.

<!-- lore:019e44be-95b2-7d7e-b53f-69afe99610f7 -->
* **Always reject proper-lockfile in favor of flock for advisory locking**: Always use `flock` advisory locking instead of `proper-lockfile`. `proper-lockfile@4.1.2` only checks if a PID is alive on the same machine — fails in containerized environments where PID namespaces reset on restart, leaving stale locks. `flock` is automatically released on process exit. Never suggest `proper-lockfile` in this codebase.

<!-- lore:019e4422-5b29-77a8-8956-488233ef16a4 -->
* **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review standard: provide exact file paths, line ranges, severity classifications (critical/medium/low), root causes, and concrete fix recommendations. Must-fix items called out explicitly before merge. Before merging any PR: (1) run critical self-review covering all changed files; (2) fix all criticals; (3) confirm CI green. Reviews must be skeptical — actively look for subtle bugs (state not cleared on fallback paths, consume-once flag semantics, circuit breaker bypass, concurrency edge cases). Produce explicit verdict alongside ranked findings. Before implementing features or debugging, read all named files deeply and report findings with precise references. Always analyze root causes before proposing solutions. When starting eval-related work, enumerate concrete gaps before proposing solutions. Track which evals have been run vs. pending. After root-cause analysis or bug fix, propose eval extensions covering the newly discovered failure mode. When presented with a GitHub issue, challenge unsubstantiated claims — verify against actual code.

<!-- lore:019e44c8-4e3f-7835-972f-02ed2033a842 -->
* **Always request worker tests with a consistent 7-case spec covering compute, missing-record, cleanup retention, and sync scenarios**: Worker test files follow a consistent 7-case spec: (1) compute job — DB lookup + update, (2) missing record — skip without throw, (3) cleanup — hard-delete records archived >30 days, (4) cleanup — preserve recently archived records, (5) sync — process a batch, (6) sync — skip missing records, (7) sync — respect dryRun flag. Tests mock DB and Redis. Applies uniformly across all worker modules.

<!-- lore:019e3cd7-97d3-7053-8f02-bb13d727662e -->
* **Lore eval scores must beat or match tail-window — scoring below it means lost information**: Lore eval scores must beat or match tail-window baseline — scoring below means lost information (treat as bug). `inflateScenario(scenario, opts?)` in `packages/eval/src/inflate.ts` — opts is `{ targetTokens?, excludeKeywords? }`, NOT positional args; silently fails. Token estimation: chars/4 (scenario convention; chars/3 in baselines.ts for budget safety). Auto-extracts protected keywords from question+referenceAnswer. Adjusts `question.metadata.turnIndex` after inflation. 8 replay fixtures, 16 scenarios, 130 questions, 6 baselines in CI. `--inflate` incompatible with replay mode — run inflated scenarios in live mode only. Inflator buries preference-change turns (known issue).
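
The cache entry above (TTL check in `.get()`, eviction in `.set()`, scheduled pruning) could be sketched roughly as follows. This is a minimal illustration, not code from the repository: the class name `TtlCache`, the injectable `now` clock, and `dispose()` are assumptions added for the example, while the defaults mirror the entry. Because a `Map` iterates in insertion order, re-inserting a key on read makes "delete the oldest key" behave as least-recently-used eviction.

```typescript
// Hypothetical sketch of the TTL + size cap + scheduled pruning pattern.
// Names (TtlCache, now, dispose) are illustrative, not from the codebase.
interface Entry<V> {
  value: V;
  ts: number; // time the entry was written
}

class TtlCache<K, V> {
  private store = new Map<K, Entry<V>>();
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private maxEntries = 10_000,
    private ttlMs = 300_000, // 5 min
    private now: () => number = () => Date.now(),
  ) {
    // (3) Scheduled pruning once a minute.
    this.timer = setInterval(() => this.prune(), 60_000);
    // In Node, avoid keeping the process alive just for pruning (no-op elsewhere).
    (this.timer as unknown as { unref?: () => void }).unref?.();
  }

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    // (1) TTL check: expired entries are deleted on read.
    if (this.now() - entry.ts > this.ttlMs) {
      this.store.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    // (2) Size cap: evict the oldest key before adding a new one.
    if (!this.store.has(key) && this.store.size >= this.maxEntries) {
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.delete(key);
    this.store.set(key, { value, ts: this.now() });
  }

  prune(): void {
    const t = this.now();
    for (const [k, e] of this.store) {
      if (t - e.ts > this.ttlMs) this.store.delete(k);
    }
  }

  get size(): number {
    return this.store.size;
  }

  dispose(): void {
    clearInterval(this.timer);
  }
}
```

Injecting the clock is only a convenience for testing expiry without waiting; a production module would likely call `Date.now()` directly, as the entry describes.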

14 changes: 10 additions & 4 deletions packages/core/src/config.ts
@@ -80,16 +80,22 @@ export const LoreConfig = z.object({
  metaThreshold: z.number().min(3).default(20),
  /** Max chars per tool output when rendering temporal messages for distillation input.
  * Outputs longer than this are replaced with a compact annotation preserving line
- * count, error signals, and file paths. Default: 2000 (matches upstream OpenCode's
- * TOOL_OUTPUT_MAX_CHARS during compaction). Set to 0 to disable. */
- toolOutputMaxChars: z.number().min(0).default(2_000),
+ * count, error signals, and file paths. Default: 4000. Raised from 2000 to preserve
+ * error messages and stack traces that exceed 2K chars. See #417. Set to 0 to disable. */
+ toolOutputMaxChars: z.number().min(0).default(4_000),
+ /** Number of most-recent gen-0 segments to keep un-archived when
+ * meta-distillation fires. These segments retain full detail in the
+ * context prefix while older segments are consolidated. Default: 5.
+ * See #417. */
+ recentSegmentsToKeep: z.number().min(0).default(5),
  })
  .default({
  minMessages: 5,
  minSegmentTokens: 64,
  maxSegmentTokens: 16384,
  metaThreshold: 20,
- toolOutputMaxChars: 2_000,
+ toolOutputMaxChars: 4_000,
+ recentSegmentsToKeep: 5,
  }),
  knowledge: z
  .object({
43 changes: 29 additions & 14 deletions packages/core/src/distillation.ts
@@ -101,21 +101,23 @@ export function workerTokenBudget(
  /**
  * Compute the max_tokens budget for gen-0 distillation of raw messages.
  *
- * Uses a √N-based formula (8 × √N) instead of a linear ratio so that the
+ * Uses a √N-based formula (10 × √N) instead of a linear ratio so that the
  * budget grows sub-linearly with input size. This naturally constrains the
  * LLM to produce output at ~R ≈ 2–4 (the square-root boundary) and avoids
  * expansion on small segments where a linear 0.25 ratio + 1024 floor gave
  * the model far too much room.
  *
- * The multiplier (8) gives ~4× headroom above the R=2.0 target, accounting
+ * The multiplier (10) gives ~5× headroom above the R=2.0 target, accounting
  * for the detailed observation format (emoji markers, timestamps, entity
- * tags, exact numbers) required by the distillation prompt.
+ * tags, exact numbers) required by the distillation prompt. Raised from 8
+ * to improve retention of specific identifiers (error messages, file paths,
+ * version numbers) in long sessions (400K+ tokens). See #417.
  *
  * @param sourceTokens Estimated source token count from raw messages
  * @returns Token budget clamped to [256, 4096]
  */
  export function distillTokenBudget(sourceTokens: number): number {
- const MULTIPLIER = 8;
+ const MULTIPLIER = 10;
  const FLOOR = 256;
  const CAP = 4096;
  return Math.max(FLOOR, Math.min(Math.ceil(MULTIPLIER * Math.sqrt(sourceTokens)), CAP));
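
To see what the multiplier change does in practice, the formula in this hunk can be evaluated for a few input sizes. The sketch below copies the expression from the diff; the function name `budgetFor` and its `multiplier` parameter are assumptions added here so the old (8) and new (10) values can be compared side by side.

```typescript
// Illustrative re-statement of the clamp(ceil(m × √N), 256, 4096) formula.
// `budgetFor` and its `multiplier` parameter are hypothetical; the real
// function hard-codes the multiplier.
function budgetFor(sourceTokens: number, multiplier = 10): number {
  const FLOOR = 256;
  const CAP = 4096;
  return Math.max(FLOOR, Math.min(Math.ceil(multiplier * Math.sqrt(sourceTokens)), CAP));
}

console.log(budgetFor(400));        // 256  (10 × 20 = 200, raised to the floor)
console.log(budgetFor(10_000, 8));  // 800  (old multiplier)
console.log(budgetFor(10_000));     // 1000 (new multiplier)
console.log(budgetFor(40_000));     // 2000
console.log(budgetFor(400_000));    // 4096 (capped)
```

One consequence of the clamp: with a multiplier of 10 the cap is reached once √N ≥ 409.6 (roughly 168K source tokens), whereas with 8 it was reached at √N ≥ 512 (262,144 tokens). Above those sizes the two multipliers produce the same budget.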
@@ -1084,19 +1086,31 @@ async function metaDistillInner(input: {
  // since the last meta-distill — no overlap with the anchor body.
  const priorMeta = latestMeta(input.projectPath, input.sessionID);

+ // Partition: keep the most recent N gen-0 segments un-archived so their
+ // full detail stays in the context prefix. Only consolidate older segments.
+ // This prevents the two-stage compression from dropping specific identifiers
+ // (error messages, file paths, version numbers) in long sessions. See #417.
+ const cfg = config();
+ const recentToKeep = cfg.distillation.recentSegmentsToKeep;
+ const toConsolidate = recentToKeep > 0 && existing.length >= recentToKeep
+ ? existing.slice(0, -recentToKeep)
+ : existing;
+
  // Threshold: first meta needs ≥3 gen-0 segments to consolidate. Subsequent
  // anchored metas only need ≥1 new gen-0 since the prior meta already covers
  // earlier history; without this distinction, every meta-distill round would
  // need a fresh pile of segments and we'd lose the incremental-update benefit.
+ // Apply threshold to toConsolidate (not existing) to prevent the kept recent
+ // segments from re-triggering consolidation on the next idle tick.
  if (priorMeta) {
- if (existing.length === 0) return null;
+ if (toConsolidate.length === 0) return null;
  } else {
- if (existing.length < 3) return null;
+ if (toConsolidate.length < 3) return null;
  }

- const userContent = recursiveUser(existing, priorMeta?.observations);
+ const userContent = recursiveUser(toConsolidate, priorMeta?.observations);

- const model = input.model ?? config().model;
+ const model = input.model ?? cfg.model;
  const inputTokens = Math.ceil(userContent.length / 3);
  const maxTokens = workerTokenBudget(inputTokens, 0.25, 1024, 8192);
  const responseText = await input.llm.prompt(
@@ -1117,10 +1131,10 @@ async function metaDistillInner(input: {
  // covers new gen-0 since the last meta — we must consult the prior meta's
  // generation explicitly to keep the chain monotonic.
  const maxGen = Math.max(
- ...existing.map((d) => d.generation),
+ ...toConsolidate.map((d) => d.generation),
  priorMeta?.generation ?? 0,
  );
- const allSourceIDs = existing.flatMap((d) => d.source_ids);
+ const allSourceIDs = toConsolidate.flatMap((d) => d.source_ids);

  // Atomic: store the new meta row + archive the merged gen-0 rows in one
  // transaction. Without this, a crash between the two would leave stale
@@ -1139,10 +1153,11 @@ async function metaDistillInner(input: {
  generation: maxGen + 1,
  callType: input.callType,
  });
- // Archive the gen-0 distillations that were merged into gen-1+.
- // They remain searchable via BM25 recall but are excluded from the
- // in-context prefix and (post-F2) from `loadForSession`'s default path.
- archiveDistillations(existing.map((d) => d.id));
+ // Archive only the consolidated gen-0 distillations — recent segments
+ // kept via recentSegmentsToKeep remain non-archived in the prefix.
+ // Archived rows remain searchable via BM25 recall but are excluded from
+ // the in-context prefix and (post-F2) from `loadForSession`'s default path.
+ archiveDistillations(toConsolidate.map((d) => d.id));
  db().exec("COMMIT");
  } catch (e) {
  db().exec("ROLLBACK");
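
The partition and threshold logic added to `metaDistillInner` in this file can be exercised in isolation. The sketch below reproduces the two expressions from the diff on plain arrays; the helper names `partitionForMeta` and `shouldConsolidate` are assumptions for illustration and do not exist in the source.

```typescript
// Hypothetical helpers mirroring the partition + threshold logic in the diff.
function partitionForMeta<T>(existing: T[], recentToKeep: number): T[] {
  // Same expression as the diff: drop the last N segments from consolidation.
  return recentToKeep > 0 && existing.length >= recentToKeep
    ? existing.slice(0, -recentToKeep)
    : existing;
}

function shouldConsolidate(toConsolidate: unknown[], hasPriorMeta: boolean): boolean {
  // First meta needs at least 3 segments; anchored metas need at least 1.
  return hasPriorMeta ? toConsolidate.length > 0 : toConsolidate.length >= 3;
}

const eight = [1, 2, 3, 4, 5, 6, 7, 8];
console.log(partitionForMeta(eight, 5));              // [1, 2, 3]
console.log(partitionForMeta(eight.slice(0, 5), 5));  // []
console.log(partitionForMeta(eight.slice(0, 4), 5));  // [1, 2, 3, 4]
console.log(partitionForMeta(eight, 0));              // all eight segments
```

With the default of 5, a first meta-distillation therefore needs at least 8 gen-0 segments (5 kept plus 3 to consolidate). Note also that, as the condition is written, a session with fewer segments than `recentSegmentsToKeep` passes all of them through as candidates for consolidation rather than none.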
43 changes: 32 additions & 11 deletions packages/core/src/gradient.ts
@@ -1229,23 +1229,44 @@ function importanceBonus(d: Distillation): number {
  return Math.min(bonus, 1.0);
  }

- function selectDistillations(all: Distillation[], limit: number): Distillation[] {
+ export function selectDistillations(all: Distillation[], limit: number): Distillation[] {
  if (all.length <= limit) return all;

- // Recency: normalize to [0, 0.7] where oldest = 0.0, newest = 0.7.
- // Use (length - 1) as divisor so the last entry gets full recency weight.
- const maxIdx = all.length - 1;
- const scored = all.map((d, i) => ({
+ // Always include meta distillations (gen >= 1) — they contain the
+ // consolidated session history and must not be evicted by recency-weighted
+ // gen-0 segments. Without this guarantee, layer 3 (distLimit=5) would drop
+ // the meta in favor of 5 recent gen-0 segments, losing older context. #417.
+ const meta = all.filter((d) => d.generation >= 1);
+ const gen0 = all.filter((d) => d.generation === 0);
+ const remainingSlots = limit - meta.length;
+
+ // If meta entries alone fill or exceed the limit, keep them all by score.
+ if (remainingSlots <= 0) {
+ const maxIdx = meta.length - 1;
+ const scored = meta.map((d, i) => ({
+ d,
+ score: (maxIdx > 0 ? (i / maxIdx) : 1) * 0.7 + importanceBonus(d) * 0.3,
+ }));
+ return scored
+ .sort((a, b) => b.score - a.score)
+ .slice(0, limit)
+ .map((s) => s.d)
+ .sort((a, b) => a.created_at - b.created_at);
+ }
+
+ // Fill remaining slots from gen-0 by recency + importance scoring.
+ const maxIdx = gen0.length - 1;
+ const scored = gen0.map((d, i) => ({
  d,
  score: (maxIdx > 0 ? (i / maxIdx) : 1) * 0.7 + importanceBonus(d) * 0.3,
  }));
-
- // Keep top N by score, then re-sort chronologically (cache-safe).
- return scored
+ const topGen0 = scored
  .sort((a, b) => b.score - a.score)
- .slice(0, limit)
- .map((s) => s.d)
- .sort((a, b) => a.created_at - b.created_at);
+ .slice(0, remainingSlots)
+ .map((s) => s.d);
+
+ // Merge and re-sort chronologically (cache-safe).
+ return [...meta, ...topGen0].sort((a, b) => a.created_at - b.created_at);
  }

  // Build a synthetic message pair containing the distilled history.
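
The effect of the new `selectDistillations` is easiest to see on a small input. The sketch below is a condensed restatement under stated assumptions, not the repository code: `Distillation` is reduced to three fields, `importanceBonus` is stubbed to return 0 so only recency matters, and the shared `topByScore` helper is introduced here to avoid repeating the scoring block.

```typescript
// Simplified, hypothetical types; the real Distillation has more fields.
interface Distillation {
  id: string;
  generation: number;
  created_at: number;
}

// Stub: the real importanceBonus inspects the distillation's content.
const importanceBonus = (_d: Distillation): number => 0;

// Helper introduced for this sketch: top n items by 0.7 recency + 0.3 importance.
function topByScore(items: Distillation[], n: number): Distillation[] {
  const maxIdx = items.length - 1;
  return items
    .map((d, i) => ({
      d,
      score: (maxIdx > 0 ? i / maxIdx : 1) * 0.7 + importanceBonus(d) * 0.3,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((s) => s.d);
}

function selectDistillations(all: Distillation[], limit: number): Distillation[] {
  if (all.length <= limit) return all;
  const meta = all.filter((d) => d.generation >= 1);
  const gen0 = all.filter((d) => d.generation === 0);
  const remainingSlots = limit - meta.length;
  const picked =
    remainingSlots <= 0
      ? topByScore(meta, limit) // meta alone fills the limit
      : [...meta, ...topByScore(gen0, remainingSlots)];
  // Re-sort chronologically, as in the diff.
  return picked.sort((a, b) => a.created_at - b.created_at);
}

// One meta row plus seven gen-0 segments, limit 5.
const all: Distillation[] = [
  { id: "m", generation: 1, created_at: 1 },
  ...[2, 3, 4, 5, 6, 7, 8].map((t) => ({ id: `g${t}`, generation: 0, created_at: t })),
];
console.log(selectDistillations(all, 5).map((d) => d.id));
// ["m", "g5", "g6", "g7", "g8"]
```

Under the previous scoring over the whole list, the same input would have kept the five newest entries (`g4` through `g8`) and dropped `m`, which is the eviction the change is meant to prevent.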