Add semantic search with FTS5 full-text indexing #33
Enables BM25-ranked full-text search across all work items with porter
stemming, so "cache" matches "caching", "cached", etc. Features are
indexed with their title, description, tags, track context, and titles
of linked features via graph_edges for transitive discovery.
CLI: htmlgraph semantic {search,related,rebuild}
API: /api/semantic/search?q=..., /api/semantic/related?id=...
Auto-rebuilt during htmlgraph reindex.
https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU
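For anyone wiring a client against the search endpoint, here is a minimal Go sketch that builds the request URL with proper query escaping. The helper name `searchURL` and the base address `http://localhost:8080` are assumptions for illustration only; the path and the `q` and `limit` parameters are the ones named in this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchURL is a hypothetical helper (not part of this PR) that builds a
// URL for GET /api/semantic/search with an escaped query string.
func searchURL(base, query string, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/semantic/search"

	params := url.Values{}
	params.Set("q", query)
	if limit > 0 {
		params.Set("limit", strconv.Itoa(limit))
	}
	// Encode escapes each value and sorts the keys alphabetically.
	u.RawQuery = params.Encode()
	return u.String(), nil
}

func main() {
	// The base address is an assumption; use whatever the server listens on.
	s, err := searchURL("http://localhost:8080", "cache", 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Building the query with `url.Values` rather than string concatenation keeps characters such as apostrophes and spaces from corrupting the URL.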
The search queries now resolve type/status/priority from both the features and tracks tables, and RebuildSemanticIndex indexes tracks alongside features, bugs, spikes, chores, epics, tasks, plans, and specs. https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ff2cd98ad
replacer := strings.NewReplacer(
	"(", " ",
	")", " ",
	"*", " ",
	"\"", " ",
	":", " ",
	"^", " ",
	"{", " ",
	"}", " ",
)
Strip FTS operators from hyphenated and quoted terms
sanitizeFTSQuery only removes a subset of FTS5 syntax characters, so inputs like in-progress or can't survive as in-progress*/can't* and are passed directly to MATCH; in SQLite FTS5 this is parsed as query syntax (not plain text), which raises runtime errors (e.g., no such column) and causes semantic search/related API calls to return 500 for common user queries. Expand sanitization (or escape terms) before appending *.
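One way to address this is an allow-list rather than a deny-list: split the input on every rune that is not a letter or digit, then quote each surviving term before appending `*`. The sketch below is an assumption about how that could look, not the code in this PR; it reuses the name `sanitizeFTSQuery` only for readability.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeFTSQuery is a hypothetical allow-list variant: any rune that is
// not a letter or digit acts as a separator, so FTS5 syntax characters
// (hyphens, apostrophes, parentheses, colons, ...) never reach MATCH.
func sanitizeFTSQuery(input string) string {
	terms := strings.FieldsFunc(input, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})

	out := make([]string, 0, len(terms))
	for _, t := range terms {
		// Quoting makes the term an FTS5 string, so bare keywords such as
		// AND, OR, NOT, and NEAR are treated as text. The trailing * turns
		// the string into a prefix query.
		out = append(out, `"`+t+`"*`)
	}
	// Space-separated phrases are combined with an implicit AND in FTS5.
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(sanitizeFTSQuery("in-progress can't"))
}
```

With this approach the function returns an empty string when the input contains no letters or digits, so the caller should skip the MATCH query in that case rather than pass an empty expression to SQLite.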
if len(results) == 0 {
	fmt.Println("No matching items found.")
	fmt.Println("Tip: run 'htmlgraph semantic rebuild' to populate the index.")
	return nil
Return JSON for empty semantic search with --json
The empty-result branch runs before the jsonOut check, so htmlgraph semantic search --json ... prints human-readable text instead of valid JSON when there are no matches; this breaks machine consumers that rely on --json always producing parseable JSON output.
…FTS5 queries
Two review fixes:
- P2: --json flag now returns [] instead of human-readable text when no results are found, ensuring machine consumers always get valid JSON.
- P1: sanitizeFTSQuery now strips hyphens, apostrophes, and all other FTS5 syntax characters. Inputs like "in-progress" or "can't" previously caused "no such column" errors because FTS5 parsed them as operators.
https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU
Summary
Implements semantic search capabilities using SQLite's FTS5 (Full-Text Search 5) virtual table with BM25 ranking and Porter stemming. This enables users to search across features, tracks, and their relationships using natural-language queries, with stemming and prefix matching improving recall.
Key Changes
New semantic index module (internal/db/semantic_repo.go):
- CreateSemanticIndex(): Creates FTS5 virtual table with Porter stemming tokenizer
- SemanticSearch(): BM25-ranked full-text search across indexed content with configurable column weights (title=10, description=5, content=2, tags=8, track_title=3, related_context=4)
- SemanticRelated(): Finds semantically similar features based on title and tags
- RebuildSemanticIndex(): Rebuilds index from features and tracks tables, enriching with graph edge context
- UpsertSemanticEntry() / DeleteSemanticEntry(): Index maintenance operations
CLI commands (cmd/htmlgraph/semantic.go):
- htmlgraph semantic search <query>: Search with optional --limit and --json flags
- htmlgraph semantic related <feature-id>: Find related features
- htmlgraph semantic rebuild: Rebuild the semantic index
API endpoints (cmd/htmlgraph/api.go):
- GET /api/semantic/search?q=QUERY&limit=N: Full-text search endpoint
- GET /api/semantic/related?id=FEATURE_ID&limit=N: Related features endpoint
Integration:
- internal/db/schema.go)
- htmlgraph reindex command (cmd/htmlgraph/reindex.go)
- cmd/htmlgraph/main.go)
- cmd/htmlgraph/serve.go)
Notable Implementation Details
- term*) for better recall
https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU