feat(brainbar): aggregate entity aliases in graph#333
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
@coderabbitai review |
|
@macroscope review |
|
@codex review |
📝 WalkthroughWalkthroughKnowledge Graph entities now aggregate linked chunks across their alias groups. The database adds a ChangesAlias-group chunk aggregation and visualization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Actions performedReview triggered.
|
|
Just FYI for future @mentions, I'm Review in progress. Results will be posted as check runs when complete. |
ApprovabilityVerdict: Needs human review This PR introduces a new feature that changes how entity data is aggregated and displayed in the knowledge graph. Entity counts now aggregate across alias groups, and node sizes incorporate a new density metric. These runtime behavior changes warrant human review despite full code ownership and good test coverage. You can customize Macroscope's approvability policy. Learn more. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2f9057755
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if canonicalMatches.count == 1 { | ||
| append(canonicalMatches[0]) |
There was a problem hiding this comment.
Avoid merging canonical rows on unique alias name collision
entityAliasGroupIDs promotes any entity whose name matches exactly one alias surface into that canonical group, even when the selected row is itself a canonical entity rather than an alias placeholder. In a dataset where entity A is a real canonical node and entity B has A.name as a unique alias, this branch appends B and all chunk/file/sidebar counts for A will be polluted with B’s evidence, which directly regresses retrieval correctness for graph and sidebar views.
Useful? React with 👍 / 👎.
| while sqlite3_step(stmt) == SQLITE_ROW { | ||
| let id = columnText(stmt, 0) ?? "" | ||
| rows.append(KGEntityRow( | ||
| id: columnText(stmt, 0) ?? "", | ||
| id: id, | ||
| name: columnText(stmt, 1) ?? "", | ||
| entityType: columnText(stmt, 2) ?? "", | ||
| description: columnText(stmt, 3), | ||
| importance: sqlite3_column_double(stmt, 4) | ||
| importance: sqlite3_column_double(stmt, 4), | ||
| linkedChunkCount: try fetchEntityChunkCount(entityId: id) |
There was a problem hiding this comment.
Remove per-row chunk count query from KG entity listing
fetchKGEntities now executes fetchEntityChunkCount inside the result loop, turning one entity query into an N+1 read pattern (and each count call does additional alias lookups). With the existing 500-entity graph load path, this adds hundreds to thousands of extra SQL statements per refresh cycle, which is a meaningful performance regression and can make graph reloads noticeably slower under real DB sizes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@brain-bar/Sources/BrainBar/BrainDatabase.swift`:
- Around line 3681-3689: fetchKGEntities currently calls fetchEntityChunkCount
for every KGEntityRow, causing an N+1 query problem (especially with limit=500);
replace the per-entity call with a single bulk query that returns counts grouped
by entity_id, map those counts to a dictionary keyed by id, and use that
dictionary when constructing KGEntityRow(linkedChunkCount: ...) so you only
execute one extra SQL statement. Locate fetchKGEntities and
fetchEntityChunkCount and implement a grouped COUNT(...) query (or add a new
helper like fetchChunkCountsForEntityIds(ids: [String])) that returns [String:
Int] which you then reference while building the rows for KGEntityRow.
- Around line 4017-4034: The current aliasEntitySQL can pull alias surfaces that
map to multiple canonical entities and thus leak ambiguous aliases into
canonical groups; modify the query (aliasEntitySQL) to restrict to alias
surfaces that are unique within the provided orderedIds set by grouping on
lower(a.alias) and adding a HAVING COUNT(DISTINCT a.entity_id) = 1 (or
equivalently join to a subquery that selects only alias values with a single
distinct entity_id for the given IN (...) list), then keep the existing join to
kg_entities aliasEntity and selection/order logic; update the prepared statement
usage (aliasStmt / bindEntityIDs / append) accordingly so only uniquely-mapped
aliasEntity.id rows are returned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e84448c7-d9c9-4c79-a1f8-874a672bb013
📒 Files selected for processing (4)
brain-bar/Sources/BrainBar/BrainDatabase.swiftbrain-bar/Sources/BrainBar/KnowledgeGraph/KGNode.swiftbrain-bar/Sources/BrainBar/KnowledgeGraph/KGViewModel.swiftbrain-bar/Tests/BrainBarTests/KnowledgeGraphTests.swift
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Macroscope - Correctness Check
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-03-18T00:12:08.774Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:08.774Z
Learning: In Swift files under brain-bar/Sources/BrainBar, enforce that when a critical dependency like the database is nil due to startup ordering (socket before DB), any tool handler that accesses the database must throw an explicit error (e.g., ToolError.noDatabase) instead of returning a default/empty value. Do not allow silent defaults (e.g., guard let db else { return ... }). Flag patterns that silently return defaults when db is nil, as this masks startup timing issues. This guidance applies broadly to similar Swift files in the BrainBar module, not just this one location.
Applied to files:
brain-bar/Sources/BrainBar/KnowledgeGraph/KGViewModel.swiftbrain-bar/Sources/BrainBar/KnowledgeGraph/KGNode.swiftbrain-bar/Sources/BrainBar/BrainDatabase.swift
📚 Learning: 2026-03-29T18:45:40.988Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 133
File: brain-bar/Sources/BrainBar/BrainDatabase.swift:0-0
Timestamp: 2026-03-29T18:45:40.988Z
Learning: In the BrainBar module’s Swift database layer (notably BrainDatabase.swift), ensure that the `search()` function’s `unreadOnly=true` path orders results by the delivery frontier cursor so the watermark `maxRowID` stays contiguous. Specifically, when `unreadOnly` is enabled, the query must include `ORDER BY c.rowid ASC` (e.g., via `let orderByClause = unreadOnly ? "c.rowid ASC" : "f.rank"`). Do not replace the unread-only ordering with relevance-based sorting (e.g., `f.rank`) unconditionally or for the unread-only path, as it can introduce gaps in the watermark and incorrectly mark unseen rows as delivered. Flag any future change to the `ORDER BY` clause in this function that makes relevance sorting apply to the unread-only case.
Applied to files:
brain-bar/Sources/BrainBar/BrainDatabase.swift
🔇 Additional comments (3)
brain-bar/Sources/BrainBar/KnowledgeGraph/KGNode.swift (1)
8-19: LGTM!Also applies to: 35-49
brain-bar/Sources/BrainBar/KnowledgeGraph/KGViewModel.swift (1)
182-183: LGTM!brain-bar/Tests/BrainBarTests/KnowledgeGraphTests.swift (1)
128-155: LGTM!Also applies to: 397-425, 427-462, 663-676, 778-790
| let aliasEntitySQL = """ | ||
| SELECT DISTINCT aliasEntity.id | ||
| FROM kg_entity_aliases a | ||
| JOIN kg_entities canonical ON canonical.id = a.entity_id | ||
| JOIN kg_entities aliasEntity ON lower(aliasEntity.name) = lower(a.alias) | ||
| WHERE a.entity_id IN (\(placeholders(count: orderedIds.count))) | ||
| AND aliasEntity.entity_type = canonical.entity_type | ||
| ORDER BY aliasEntity.id | ||
| """ | ||
| var aliasStmt: OpaquePointer? | ||
| guard sqlite3_prepare_v2(db, aliasEntitySQL, -1, &aliasStmt, nil) == SQLITE_OK else { | ||
| throw DBError.prepare(sqlite3_errcode(db)) | ||
| } | ||
| defer { sqlite3_finalize(aliasStmt) } | ||
| _ = bindEntityIDs(orderedIds, to: aliasStmt, startingAt: 1) | ||
| while sqlite3_step(aliasStmt) == SQLITE_ROW { | ||
| append(columnText(aliasStmt, 0) ?? "") | ||
| } |
There was a problem hiding this comment.
Ambiguous alias surfaces can still leak into canonical groups
Line 4021 matches alias entities by surface for every canonical ID in scope, but there’s no uniqueness guard on that surface. If the same alias maps to multiple canonical entities of the same type, those ambiguous alias rows can still be absorbed into canonical groups.
💡 Proposed SQL guard
let aliasEntitySQL = """
SELECT DISTINCT aliasEntity.id
FROM kg_entity_aliases a
JOIN kg_entities canonical ON canonical.id = a.entity_id
JOIN kg_entities aliasEntity ON lower(aliasEntity.name) = lower(a.alias)
WHERE a.entity_id IN (\(placeholders(count: orderedIds.count)))
AND aliasEntity.entity_type = canonical.entity_type
+ AND (
+ SELECT COUNT(DISTINCT a2.entity_id)
+ FROM kg_entity_aliases a2
+ JOIN kg_entities canonical2 ON canonical2.id = a2.entity_id
+ WHERE lower(a2.alias) = lower(a.alias)
+ AND canonical2.entity_type = canonical.entity_type
+ ) = 1
ORDER BY aliasEntity.id
"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let aliasEntitySQL = """ | |
| SELECT DISTINCT aliasEntity.id | |
| FROM kg_entity_aliases a | |
| JOIN kg_entities canonical ON canonical.id = a.entity_id | |
| JOIN kg_entities aliasEntity ON lower(aliasEntity.name) = lower(a.alias) | |
| WHERE a.entity_id IN (\(placeholders(count: orderedIds.count))) | |
| AND aliasEntity.entity_type = canonical.entity_type | |
| ORDER BY aliasEntity.id | |
| """ | |
| var aliasStmt: OpaquePointer? | |
| guard sqlite3_prepare_v2(db, aliasEntitySQL, -1, &aliasStmt, nil) == SQLITE_OK else { | |
| throw DBError.prepare(sqlite3_errcode(db)) | |
| } | |
| defer { sqlite3_finalize(aliasStmt) } | |
| _ = bindEntityIDs(orderedIds, to: aliasStmt, startingAt: 1) | |
| while sqlite3_step(aliasStmt) == SQLITE_ROW { | |
| append(columnText(aliasStmt, 0) ?? "") | |
| } | |
| let aliasEntitySQL = """ | |
| SELECT DISTINCT aliasEntity.id | |
| FROM kg_entity_aliases a | |
| JOIN kg_entities canonical ON canonical.id = a.entity_id | |
| JOIN kg_entities aliasEntity ON lower(aliasEntity.name) = lower(a.alias) | |
| WHERE a.entity_id IN (\(placeholders(count: orderedIds.count))) | |
| AND aliasEntity.entity_type = canonical.entity_type | |
| AND ( | |
| SELECT COUNT(DISTINCT a2.entity_id) | |
| FROM kg_entity_aliases a2 | |
| JOIN kg_entities canonical2 ON canonical2.id = a2.entity_id | |
| WHERE lower(a2.alias) = lower(a.alias) | |
| AND canonical2.entity_type = canonical.entity_type | |
| ) = 1 | |
| ORDER BY aliasEntity.id | |
| """ | |
| var aliasStmt: OpaquePointer? | |
| guard sqlite3_prepare_v2(db, aliasEntitySQL, -1, &aliasStmt, nil) == SQLITE_OK else { | |
| throw DBError.prepare(sqlite3_errcode(db)) | |
| } | |
| defer { sqlite3_finalize(aliasStmt) } | |
| _ = bindEntityIDs(orderedIds, to: aliasStmt, startingAt: 1) | |
| while sqlite3_step(aliasStmt) == SQLITE_ROW { | |
| append(columnText(aliasStmt, 0) ?? "") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@brain-bar/Sources/BrainBar/BrainDatabase.swift` around lines 4017 - 4034, The
current aliasEntitySQL can pull alias surfaces that map to multiple canonical
entities and thus leak ambiguous aliases into canonical groups; modify the query
(aliasEntitySQL) to restrict to alias surfaces that are unique within the
provided orderedIds set by grouping on lower(a.alias) and adding a HAVING
COUNT(DISTINCT a.entity_id) = 1 (or equivalently join to a subquery that selects
only alias values with a single distinct entity_id for the given IN (...) list),
then keep the existing join to kg_entities aliasEntity and selection/order
logic; update the prepared statement usage (aliasStmt / bindEntityIDs / append)
accordingly so only uniquely-mapped aliasEntity.id rows are returned.
|
@coderabbitai review |
|
@codex review |
|
@macroscope-app review |
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4727818d3d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| WHERE a.entity_id IN (\(placeholders(count: orderedIds.count))) | ||
| AND aliasEntity.entity_type = canonical.entity_type | ||
| ORDER BY aliasEntity.id |
There was a problem hiding this comment.
Expand alias membership transitively before counting
entityAliasGroupIDs only runs the alias lookup once against the initial entityId, so aliases discovered in that pass are appended but never queried for their own aliases. In a chain like A→B and B→C, sidebar totals/pagination for A will include B’s chunks but silently miss C’s, which undercounts evidence and breaks the intended alias-group aggregation behavior in fetchEntityChunkCount, fetchEntitySourceFileCount, and page queries.
Useful? React with 👍 / 👎.
Summary
kg_entity_aliasesgroupsTest plan
swift test --package-path brain-bar --filter testFetchKGEntitiesAggregatesAliasGroupChunkCountswift test --package-path brain-bar --filter testFetchEntitySidebarCountsAndPagesAggregateAliasGroupswift test --package-path brain-bar --filter testAliasAggregationDoesNotMergeAmbiguousAliasSurfacesswift test --package-path brain-bar --filter testKGNodeRadiusScalesWithAliasAggregatedChunkCountswift test --package-path brain-bar --filter testKGNodeRadiusIgnoresNegativeLinkedChunkCountswift test --package-path brain-bar(470 tests)git diff --checkscripts/run_tests.sh(pre-push gate: 2186 passed, 9 skipped, 75 deselected, 1 xfailed; MCP 3 passed; isolated 36 passed; Bun/regression passed)Co-Authored-By: Codex codex@openai.com
Note
Medium Risk
Changes alias-group membership logic used for chunk counts, pagination, and graph node sizing; wrong grouping would mis-attribute evidence, though the new test covers the main ambiguity case.
Overview
Refines knowledge-graph alias aggregation in
BrainDatabase: graph entity loading no longer runs one chunk-count query per node, and alias resolution stops pulling in unrelated canonical entities when a shared name is ambiguous.fetchKGEntitiesnow collects entity rows first, then fillslinkedChunkCountvia a single batched path (fetchLinkedChunkCounts→ alias members +COUNT(DISTINCT chunk_id)over aVALUESgroup-members CTE) instead of callingfetchEntityChunkCountinside the fetch loop.entityAliasGroupIDsdrops the old step that merged another canonical entity when exactly one row matched the selected entity’s name and type throughkg_entity_aliases. Group membership is now the requested entity plus same-type entities whose name matches registered aliases for IDs already in the group—so a distinct person named “Alex” is not folded into “Alexander” just because “Alex” is Alexander’s alias.Adds
testCanonicalEntityDoesNotMergeIntoAnotherEntityUniqueAliasto lock that behavior for sidebar chunk counts and pagination.Reviewed by Cursor Bugbot for commit 4727818. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Aggregate entity aliases in knowledge graph chunk and source file queries
BrainDatabasenow resolve alias groups via a newentityAliasGroupIDsmethod and aggregate results across all alias members, rather than matching only the exact entity ID.fetchEntityChunkCount,fetchEntitySourceFileCount,fetchEntityChunksPage, andfetchEntitySourceFilesall expand to alias group IDs before binding SQL parameters.linkedChunkCountfield is added toKGEntityRowandKGNode; the node's visual radius now scales with this count in addition to importance.KGViewModelpassesrow.linkedChunkCountintoKGNodeso graph nodes reflect aggregated evidence counts.Macroscope summarized 4727818.
Summary by CodeRabbit
New Features
Tests