- Introduction
- Key Features
- Environment Setup Guide
- Configuration
- Usage Guide
- RAG Capabilities
- Architecture & Deep Dive
- Testing
This tool is designed to externalize your Perplexity.ai conversation history into organized, semantically searchable Markdown files. It facilitates the emergence of a personal knowledge base powered by local AI, bridging the gap between ephemeral inquiry and structured knowledge.
- Parallelized Extraction: Leverages Playwright to extract multiple conversation threads simultaneously for high-velocity data retrieval.
- Architectural Resilience: Automatically restores browser contexts and retries operations, ensuring continuity amidst environmental instability.
- Advanced RAG (Retrieval-Augmented Generation): Engage in a cognitive dialogue with your history. The system employs intent analysis to synthesize broad summaries or pinpoint specific technical insights.
- Semantic Vector Search: Move beyond keyword matching. Locate information based on conceptual depth and semantic relevance.
- Persistent State Tracking: Frequent checkpoints allow the system to resume progress after any interruption.
- Interactive Synthesis (REPL): A streamlined command-line interface for human-system synergy.
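To make the retry behaviour described above more concrete, here is a minimal sketch of retrying an async operation with exponential backoff. The helper name `withRetry` and its parameters are hypothetical and are not taken from this project's source. The real implementation may differ, for example by recreating the Playwright browser context before each new attempt.

```typescript
// Hypothetical helper (not the project's actual code): retry an async
// operation with exponential backoff between attempts.
async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The attempt number is passed in so the caller can, for example,
      // rebuild a browser context on attempts after the first.
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

A caller would wrap a single thread extraction in `withRetry(...)`, so a transient failure costs one delayed retry rather than aborting the whole run.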
If you are new to development or don't have the necessary tools installed, follow these steps to set up your environment.
We recommend using a version manager to install Node.js. This allows you to easily switch versions and avoids permission issues.
- Windows:
- Download and run the latest installer from nvm-windows.
- Open a new Command Prompt or PowerShell and run:
nvm install 20
nvm use 20
- macOS / Linux:
- Install nvm by following the instructions at nvm.sh.
- Run:
nvm install 20
nvm use 20
Ollama is optional. It is only required if you want to use the Semantic Search or RAG (Retrieval-Augmented Generation) features. Basic extraction and keyword search work without it.
- Download and install Ollama from ollama.ai.
- Open your terminal and pull the required models:
ollama pull nomic-embed-text
ollama pull deepseek-r1
If you don't have the git command installed, you can simply download this project as a ZIP file from GitHub and extract it.
Once extracted, open your terminal in the project folder and run:
npm install
npx playwright install chromium
Establish your environment by duplicating the template:
cp .env.example .env
- HEADLESS: Set to false in your .env file. Note: Headless mode (true) is currently non-functional due to Cloudflare Turnstile protection on Perplexity.ai. Using headful mode allows you to complete any challenges manually if they appear.
- OLLAMA_URL: Access point for your local AI engine (default: http://localhost:11434).
- OLLAMA_MODEL: Cognitive model for RAG synthesis (e.g., deepseek-r1).
- OLLAMA_EMBED_MODEL: Model for generating vector representations (e.g., nomic-embed-text).
- ENABLE_VECTOR_SEARCH: Set to true to activate semantic and RAG layers.
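As an illustration of how the variables above could be read into a typed configuration object, here is a small sketch. Only the variable names and the OLLAMA_URL default come from this README. The `AppConfig` shape, the `loadConfig` and `parseBool` helpers, the model fallbacks, and the `false` fallbacks for the boolean flags are assumptions made for the example.

```typescript
// Hypothetical config shape; only the environment variable names are
// taken from the README.
interface AppConfig {
  headless: boolean;
  ollamaUrl: string;
  ollamaModel: string;
  ollamaEmbedModel: string;
  enableVectorSearch: boolean;
}

type Env = Record<string, string | undefined>;

// Treat "true" (any casing, surrounding whitespace ignored) as true;
// fall back when the variable is unset or empty.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function loadConfig(env: Env): AppConfig {
  return {
    // Assumed fallback: headful, since headless mode is currently blocked.
    headless: parseBool(env["HEADLESS"], false),
    // Default documented in the README.
    ollamaUrl: env["OLLAMA_URL"] ?? "http://localhost:11434",
    // Assumed fallbacks, based on the README's example values.
    ollamaModel: env["OLLAMA_MODEL"] ?? "deepseek-r1",
    ollamaEmbedModel: env["OLLAMA_EMBED_MODEL"] ?? "nomic-embed-text",
    // Assumed fallback: vector features stay off unless enabled.
    enableVectorSearch: parseBool(env["ENABLE_VECTOR_SEARCH"], false),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded.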
Launch the system:
# Start the development environment
npm run dev
- Start scraper (Library): Initiates extraction. Authenticate manually if required.
- Note: Due to the complexity of Perplexity's API and potential network fluctuations, it may be necessary to run the scraper multiple times to ensure all conversations are fully gathered. The system uses checkpoints to resume where it left off.
- Search conversations: Interface with your history using various modes:
- Auto: Heuristic selection between semantic and exact search.
- Semantic: Fuzzy matching via high-dimensional vector space.
- RAG: Direct inquiry, e.g., "What did I learn about emergent intelligence?"
- Exact: Rapid string matching via ripgrep (bundled).
- Build vector index: Processes Markdown exports into a local vector store.
- Reset all data: Purges checkpoints, authentication data, and the vector index.
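To show what semantic matching in a vector space means in practice, the sketch below ranks stored chunks by cosine similarity to a query vector. It is a standalone illustration, not Vectra's API and not this project's search code. The `IndexedChunk` type and the `cosineSimilarity` and `topK` functions are hypothetical names, and cosine similarity is assumed here as a common choice of metric.

```typescript
// Hypothetical in-memory representation of an embedded chunk.
interface IndexedChunk {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Score every chunk against the query and keep the k best matches.
function topK(
  query: number[],
  chunks: IndexedChunk[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k);
}
```

In the real system the vectors would come from the embedding model (for example nomic-embed-text) and the ranking would be delegated to the vector store. The brute-force scan here only illustrates the idea.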
The RAG modality is engineered for various levels of cognitive inquiry:
- Broad Synthesis: "Summarize all threads regarding distributed systems."
- Granular Retrieval: "Locate the specific TypeScript pattern I used for the worker pool."
- Cross-Thread Integration: "How has my conceptual understanding of React hooks shifted?"
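For readers unfamiliar with RAG, the queries above are generally answered by retrieving relevant excerpts and placing them in the model's prompt ahead of the question. The sketch below shows that generic step only. `RetrievedChunk`, `buildRagPrompt`, the instruction wording, and the `maxChars` budget are all assumptions for illustration. The project's actual orchestration, including its intent analysis, is described in ARCH.md and may differ.

```typescript
// Hypothetical shape of a retrieved excerpt.
interface RetrievedChunk {
  source: string; // e.g. the Markdown file the excerpt came from
  text: string;
}

// Assemble a prompt from the question and as many excerpts as fit
// within a rough character budget (excerpts are assumed pre-ranked).
function buildRagPrompt(
  question: string,
  chunks: RetrievedChunk[],
  maxChars = 4000,
): string {
  const entries: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const entry = `[${entries.length + 1}] (${chunk.source})\n${chunk.text.trim()}`;
    // Stop at the first excerpt that would exceed the budget.
    if (used + entry.length > maxChars) break;
    entries.push(entry);
    used += entry.length;
  }
  return [
    "Answer the question using only the excerpts below.",
    "If the excerpts do not contain the answer, say so.",
    "",
    entries.join("\n\n"),
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The resulting string would then be sent to the configured OLLAMA_MODEL. A broad summary would typically pass in more excerpts than a pinpoint lookup.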
For a detailed look at our RAG implementation, hybrid search strategy, and theoretical foundations, please refer to:
👉 ARCH.md
- src/ai/: Ollama interaction and advanced RAG orchestration layers.
- src/scraper/: Playwright-based extraction logic and parallel worker pool management.
- src/search/: Vector storage (Vectra) and ripgrep search implementation.
- src/repl/: Interactive CLI components.
- src/utils/: Shared utility functions for data chunking and logging.
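The parallel worker pool mentioned for src/scraper/ can be pictured as a fixed number of workers pulling items from a shared queue. The following is a minimal sketch of that pattern under stated assumptions. `runPool` and its signature are hypothetical and are not copied from the project, whose pool also has to manage Playwright pages and checkpoints.

```typescript
// Hypothetical concurrency-limited pool: at most `concurrency` calls to
// `worker` are in flight at once, and results keep the input order.
async function runPool<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  async function runWorker(): Promise<void> {
    // Claiming an index is synchronous, so two workers never take the
    // same item even though they interleave at each await.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}
```

With this shape, a rejected `worker` call rejects the whole pool, so in practice each call would be wrapped in retry logic before being handed to the pool.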
We prioritize a "Testing Trophy" architecture, emphasizing integration tests.
# Execute unit-level verifications
npm run test:unit
# Execute integration-level verifications
npm run test:integration