Feature: Backup classifications to JSONL to survive SQLite drops
Is your feature request related to a problem? Please describe.
Currently, the expensive LLM classification data (categories, primaryCategory, domains, primaryDomain) lives only in the local SQLite database, bookmarks.db. If that database is dropped or corrupted (e.g., via ft index --force or during a schema migration), all classification data is lost and the user has to re-classify every bookmark.
The Multi-Machine Sync Use Case
This also affects users who sync their ~/.ft-bookmarks/ folder across different computers using a private Git repository. Because bookmarks.db is a binary file (and is correctly .gitignore'd), pulling the latest bookmarks.jsonl to a second machine and running ft index results in a fully populated database, but without any of the classifications. This results in redundant LLM usage to categorize the same bookmarks on each machine.
Describe the solution you'd like
Implement a secondary "source of truth" file: ~/.ft-bookmarks/classifications.jsonl. This file will serve as an append-only ledger for all classification results.
- The "Save" Hook: Whenever ft classify or ft classify-domains finishes a batch, it appends the results to classifications.jsonl.
- The "Self-Heal" Hook: Before ft index rebuilds the database, it loads classifications.jsonl into memory. When inserting or updating a bookmark in SQLite, it merges the classification data in, hydrating the DB without needing new LLM calls.
- The "Export" Utility: Add ft classify --export to scan the current database and generate the classifications.jsonl file for existing users.
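
To make the "Save" and "Self-Heal" hooks concrete, here is a minimal sketch of the append-only ledger in Python. It is illustrative only and is not the code from the PR: the function names, the id key on each record, and the last-line-wins merge rule are all assumptions made for this example. Only the four classification field names come from the description above.

```python
import json
from pathlib import Path

# Field names taken from the issue text; everything else here is hypothetical.
CLASSIFICATION_FIELDS = ("categories", "primaryCategory", "domains", "primaryDomain")


def append_classifications(ledger: Path, results: list[dict]) -> None:
    """'Save' hook: append one JSON object per line, never rewriting old lines."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        for record in results:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_classifications(ledger: Path) -> dict[str, dict]:
    """'Self-Heal' hook, step 1: fold the ledger into a map keyed by bookmark id.

    Assumption: each record carries an "id" key, and a later line overrides
    an earlier one field by field.
    """
    merged: dict[str, dict] = {}
    if not ledger.exists():
        return merged
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            entry = merged.setdefault(record["id"], {})
            for field in CLASSIFICATION_FIELDS:
                if field in record:
                    entry[field] = record[field]
    return merged


def hydrate(bookmark: dict, classifications: dict[str, dict]) -> dict:
    """'Self-Heal' hook, step 2: merge stored classifications into a bookmark
    row before it is written to SQLite. Unclassified bookmarks pass through."""
    return {**bookmark, **classifications.get(bookmark["id"], {})}
```

Because the ledger is a line-oriented text file that only grows at the end, it diffs cleanly in Git, which matters for the multi-machine sync case described earlier. Two machines appending at the same time can still produce a merge conflict at the end of the file; resolving it by keeping both sets of lines is safe under the last-line-wins rule assumed here.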
Additional Context
I have implemented this architecture in Draft PR #137. It protects against data loss during database rebuilds and makes syncing a classified library across multiple machines much more efficient.