
Add semantic search with FTS5 full-text indexing#33

Open
shakestzd wants to merge 3 commits into main from claude/semantic-feature-index-ysJ2S


@shakestzd
Owner

Summary

Implements semantic search on top of SQLite's FTS5 (Full-Text Search 5) virtual tables, with BM25 ranking and Porter stemming. Users can search across features, tracks, and their relationships using natural-language queries; matching is approximate through stemming and prefix expansion rather than exact keywords.

Key Changes

  • New semantic index module (internal/db/semantic_repo.go):

    • CreateSemanticIndex(): Creates FTS5 virtual table with Porter stemming tokenizer
    • SemanticSearch(): BM25-ranked full-text search across indexed content with configurable column weights (title=10, description=5, content=2, tags=8, track_title=3, related_context=4)
    • SemanticRelated(): Finds semantically similar features based on title and tags
    • RebuildSemanticIndex(): Rebuilds index from features and tracks tables, enriching with graph edge context
    • UpsertSemanticEntry() / DeleteSemanticEntry(): Index maintenance operations
    • Query sanitization to prevent FTS5 syntax errors from user input
  • CLI commands (cmd/htmlgraph/semantic.go):

    • htmlgraph semantic search <query>: Search with optional --limit and --json flags
    • htmlgraph semantic related <feature-id>: Find related features
    • htmlgraph semantic rebuild: Rebuild the semantic index
  • API endpoints (cmd/htmlgraph/api.go):

    • GET /api/semantic/search?q=QUERY&limit=N: Full-text search endpoint
    • GET /api/semantic/related?id=FEATURE_ID&limit=N: Related features endpoint
  • Integration:

    • Semantic index automatically created on database initialization (internal/db/schema.go)
    • Index rebuilt during htmlgraph reindex command (cmd/htmlgraph/reindex.go)
    • Semantic command added to root CLI (cmd/htmlgraph/main.go)
    • API endpoints registered in server (cmd/htmlgraph/serve.go)
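
The column weights listed for SemanticSearch() map onto the positional arguments of FTS5's bm25() function. The following is a minimal sketch of how such a ranking expression could be assembled, not the code in this PR: the table name `semantic_index`, the helper names, and the assumption that the columns were declared in the order the weights are listed above are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// columnWeight pairs an FTS5 column with its BM25 weight.
type columnWeight struct {
	Column string
	Weight float64
}

// semanticWeights mirrors the weights in the PR description.
// Assumption: bm25() applies weights positionally, so this order must match
// the column order of the FTS5 table, which is not shown in the PR text.
var semanticWeights = []columnWeight{
	{"title", 10}, {"description", 5}, {"content", 2},
	{"tags", 8}, {"track_title", 3}, {"related_context", 4},
}

// bm25Expr builds the bm25(table, w1, w2, ...) ranking expression.
func bm25Expr(table string, weights []columnWeight) string {
	args := make([]string, 0, len(weights)+1)
	args = append(args, table)
	for _, w := range weights {
		args = append(args, fmt.Sprintf("%.1f", w.Weight))
	}
	return "bm25(" + strings.Join(args, ", ") + ")"
}

// searchSQL builds a parameterized query. bm25() returns smaller values for
// better matches, so ascending order puts the most relevant rows first.
func searchSQL(table string, weights []columnWeight) string {
	return fmt.Sprintf(
		"SELECT rowid, %s AS score FROM %s WHERE %s MATCH ? ORDER BY score LIMIT ?",
		bm25Expr(table, weights), table, table)
}
```

Calling `searchSQL("semantic_index", semanticWeights)` yields a statement whose two placeholders take the sanitized query and the limit.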

Notable Implementation Details

  • Prefix matching: Query terms use wildcard suffix (term*) for better recall
  • Related context: Graph edges are bidirectionally indexed to capture feature relationships
  • Tag normalization: JSON array tags are extracted and space-separated for FTS5 indexing
  • Graceful degradation: Semantic index creation is non-fatal if FTS5 unavailable
  • BM25 ranking: Column-weighted relevance scoring with snippet generation for result preview
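
Tag normalization and prefix matching, as described above, can be sketched roughly as follows. The function names `normalizeTags` and `prefixQuery` are hypothetical, the empty-string fallback for malformed tag JSON is an assumption, and `prefixQuery` assumes its input has already been sanitized.

```go
package main

import (
	"encoding/json"
	"strings"
)

// normalizeTags turns a JSON array such as ["api","search"] into the
// space-separated string "api search" so FTS5 can tokenize each tag.
// Assumption: input that is not a JSON array of strings yields "".
func normalizeTags(raw string) string {
	var tags []string
	if err := json.Unmarshal([]byte(raw), &tags); err != nil {
		return ""
	}
	return strings.Join(tags, " ")
}

// prefixQuery appends the FTS5 prefix marker (*) to every
// whitespace-separated term of an already sanitized query.
func prefixQuery(q string) string {
	terms := strings.Fields(q)
	for i, t := range terms {
		terms[i] = t + "*"
	}
	return strings.Join(terms, " ")
}
```

With Porter stemming already folding "caching" and "cached" into one stem, the prefix marker mainly helps with partially typed words.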

https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU

claude added 2 commits April 7, 2026 23:18
Enables BM25-ranked full-text search across all work items with porter
stemming, so "cache" matches "caching", "cached", etc. Features are
indexed with their title, description, tags, track context, and titles
of linked features via graph_edges for transitive discovery.

CLI: htmlgraph semantic {search,related,rebuild}
API: /api/semantic/search?q=..., /api/semantic/related?id=...
Auto-rebuilt during htmlgraph reindex.

https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU
The search queries now resolve type/status/priority from both the
features and tracks tables, and RebuildSemanticIndex indexes tracks
alongside features, bugs, spikes, chores, epics, tasks, plans, and specs.

https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ff2cd98ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +379 to +388
replacer := strings.NewReplacer(
	"(", " ",
	")", " ",
	"*", " ",
	"\"", " ",
	":", " ",
	"^", " ",
	"{", " ",
	"}", " ",
)

P1: Strip FTS operators from hyphenated and quoted terms

sanitizeFTSQuery removes only a subset of FTS5 syntax characters, so inputs like in-progress or can't survive as in-progress* and can't* and are passed directly to MATCH. SQLite FTS5 parses these as query syntax rather than plain text, which raises runtime errors (e.g., no such column) and makes the semantic search and related API calls return 500 for common user queries. Expand the sanitization (or escape terms) before appending *.
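
One possible shape for a broader sanitizer is an allowlist instead of a denylist: keep letters and digits, turn everything else into a separator, then append the prefix marker. This is a sketch under assumptions, not the fix applied in this PR; `sanitizeTerms` is a hypothetical name, and the lowercasing step is an extra precaution so that the uppercase barewords AND, OR, NOT, and NEAR are not read as FTS5 operators.

```go
package main

import (
	"strings"
	"unicode"
)

// sanitizeTerms reduces arbitrary user input to plain FTS5 prefix terms.
// Every rune that is not a letter or digit becomes a space, so hyphens,
// apostrophes, quotes, colons, and parentheses can never reach MATCH.
func sanitizeTerms(input string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			// Lowercase so AND/OR/NOT/NEAR are treated as ordinary terms.
			return unicode.ToLower(r)
		}
		return ' '
	}, input)

	terms := strings.Fields(cleaned)
	for i, t := range terms {
		terms[i] = t + "*"
	}
	return strings.Join(terms, " ")
}
```

Under this sketch, in-progress becomes `in* progress*` and can't becomes `can* t*`. Input made up only of punctuation produces an empty string, which callers would probably want to short-circuit before running MATCH.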

Useful? React with 👍 / 👎.

Comment on lines +111 to +114
if len(results) == 0 {
	fmt.Println("No matching items found.")
	fmt.Println("Tip: run 'htmlgraph semantic rebuild' to populate the index.")
	return nil

P2: Return JSON for empty semantic search with --json

The empty-result branch runs before the jsonOut check, so htmlgraph semantic search --json ... prints human-readable text instead of valid JSON when there are no matches; this breaks machine consumers that rely on --json always producing parseable JSON output.
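
A rough sketch of the reordered logic, with the JSON check ahead of the empty-result branch. The `result` type and `formatResults` helper are hypothetical stand-ins for the PR's own types. Note that Go's encoding/json encodes a nil slice as `null`, so the sketch substitutes an empty slice to get `[]`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// result is a hypothetical, trimmed-down search hit.
type result struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// formatResults renders search hits. The JSON branch runs first so that
// --json output is always parseable, even when nothing matched.
func formatResults(results []result, jsonOut bool) (string, error) {
	if jsonOut {
		if results == nil {
			results = []result{} // a nil slice would marshal as null
		}
		b, err := json.Marshal(results)
		if err != nil {
			return "", err
		}
		return string(b), nil
	}

	if len(results) == 0 {
		return "No matching items found.\n", nil
	}
	var sb strings.Builder
	for _, r := range results {
		fmt.Fprintf(&sb, "%s  %s\n", r.ID, r.Title)
	}
	return sb.String(), nil
}
```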

Useful? React with 👍 / 👎.

…FTS5 queries

Two review fixes:
- P2: --json flag now returns [] instead of human-readable text when
  no results are found, ensuring machine consumers always get valid JSON.
- P1: sanitizeFTSQuery now strips hyphens, apostrophes, and all other
  FTS5 syntax characters. Inputs like "in-progress" or "can't" previously
  caused "no such column" errors because FTS5 parsed them as operators.

https://claude.ai/code/session_017mUXrr6PYWDxR4yQgDwaEU