GitHub - simwai/perplexity-ai-export: Grabs all your Perplexity conversations data, spits it out into a nice file folder structure and allows you to find your old conversations again.

Introduction
Key Features
Environment Setup Guide
Configuration
- Key Environment Variables
Usage Guide
- Operational Directives
RAG Capabilities
Architecture & Deep Dive
- Project Structure
Testing

Introduction

This tool exports your Perplexity.ai conversation history into organized, semantically searchable Markdown files. The result is a personal knowledge base, powered by local AI, that bridges the gap between ephemeral inquiry and structured knowledge.

Key Features

- Parallelized Extraction: Leverages Playwright to extract multiple conversation threads simultaneously for high-velocity data retrieval.
- Architectural Resilience: Automatically restores browser contexts and retries operations, ensuring continuity amidst environmental instability.
- Advanced RAG (Retrieval-Augmented Generation): Engage in a cognitive dialogue with your history. The system employs intent analysis to synthesize broad summaries or pinpoint specific technical insights.
- Semantic Vector Search: Move beyond keyword matching. Locate information based on conceptual depth and semantic relevance.
- Persistent State Tracking: Frequent checkpoints allow the system to resume progress after any interruption.
- Interactive Synthesis (REPL): A streamlined command-line interface for human-system synergy.
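
To make the parallel extraction and retry behavior listed above more concrete, here is a minimal sketch of a concurrency-limited task pool with retries. It is not taken from this project's source: the names `ExtractionTask`, `withRetry`, and `runPool`, as well as the default attempt count and delays, are assumptions chosen for illustration.

```typescript
// Hypothetical helper names; the project's real worker pool may look different.
type ExtractionTask<T> = () => Promise<T>;

// Retry a task up to `attempts` times, waiting a little longer after each failure.
async function withRetry<T>(
  task: ExtractionTask<T>,
  attempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * (i + 1)));
      }
    }
  }
  throw lastError;
}

// Run tasks with at most `concurrency` in flight; results keep the input order.
async function runPool<T>(tasks: ExtractionTask<T>[], concurrency: number): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  const queue = tasks.map((task, index) => ({ task, index }));

  async function worker(): Promise<void> {
    for (;;) {
      const job = queue.shift();
      if (job === undefined) return; // Queue drained, this worker is done.
      results[job.index] = await withRetry(job.task);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In this sketch, a task that still fails after its last attempt rejects the whole `runPool` call. A real scraper would more likely record the failure and continue with the remaining threads.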

Environment Setup Guide

If you are new to development or don't have the necessary tools installed, follow these steps to set up your environment.

1. Install Node.js (The Engine)

We recommend using a version manager to install Node.js. This allows you to easily switch versions and avoids permission issues.

Windows:
1. Download and run the latest installer from nvm-windows.
2. Open a new Command Prompt or PowerShell and run:
```
nvm install 20
nvm use 20
```
macOS / Linux:
1. Install nvm by following the instructions at nvm.sh.
2. Run:
```
nvm install 20
nvm use 20
```

2. Install Ollama (Optional - For AI Intelligence)

Ollama is optional. It is only required if you want to use the Semantic Search or RAG (Retrieval-Augmented Generation) features. Basic extraction and keyword search work without it.

Download and install Ollama from ollama.ai.

Open your terminal and pull the required models:

```
ollama pull nomic-embed-text
ollama pull deepseek-r1
```
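
If you want to confirm that both models were pulled, the following sketch queries Ollama's `GET /api/tags` endpoint, which lists locally available models. The function names `missingModels` and `checkOllama` are hypothetical and not part of this project. The sketch assumes Node.js 18 or newer (for the global `fetch`) and Ollama listening on its default address.

```typescript
// The part of Ollama's GET /api/tags response that this sketch reads.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Ollama reports names with a tag suffix such as "deepseek-r1:latest",
// so only the part before the colon is compared here.
function missingModels(installed: string[], required: string[]): string[] {
  const installedBases = new Set(installed.map((name) => name.split(":")[0]));
  return required.filter((name) => !installedBases.has(name.split(":")[0]));
}

// Hypothetical helper: returns the required models that are not installed yet.
async function checkOllama(baseUrl = "http://localhost:11434"): Promise<string[]> {
  const response = await fetch(`${baseUrl}/api/tags`);
  if (!response.ok) {
    throw new Error(`Ollama responded with HTTP ${response.status}`);
  }
  const body = (await response.json()) as OllamaTagsResponse;
  const installed = body.models.map((model) => model.name);
  return missingModels(installed, ["nomic-embed-text", "deepseek-r1"]);
}
```

Calling `checkOllama()` resolves to an empty array when both models are present. Because only base names are compared, a different size variant of the same model also counts as installed.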

3. Download and Prepare the Project

If you don't have the git command installed, you can simply download this project as a ZIP file from GitHub and extract it.

Once extracted, open your terminal in the project folder and run:

```
npm install
npx playwright install chromium
```

Configuration

Establish your environment by duplicating the template (in the Windows Command Prompt, use copy instead of cp):

```
cp .env.example .env
```

Key Environment Variables

- HEADLESS: Set to false in your .env file. Note: Headless mode (true) is currently non-functional due to Cloudflare Turnstile protection on Perplexity.ai. Using headful mode allows you to complete any challenges manually if they appear.
- OLLAMA_URL: Access point for your local AI engine (default: http://localhost:11434).
- OLLAMA_MODEL: Cognitive model for RAG synthesis (e.g., deepseek-r1).
- OLLAMA_EMBED_MODEL: Model for generating vector representations (e.g., nomic-embed-text).
- ENABLE_VECTOR_SEARCH: Set to true to activate semantic and RAG layers.
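
As an illustration of how these variables might be read into a typed configuration object, here is a minimal sketch. It is not the project's actual loader: `ExportConfig`, `parseBool`, and `loadConfig` are hypothetical names, and every fallback except the documented OLLAMA_URL default is an assumption based on the examples above.

```typescript
// Hypothetical config shape mirroring the variables documented above.
interface ExportConfig {
  headless: boolean;
  ollamaUrl: string;
  ollamaModel: string;
  ollamaEmbedModel: string;
  enableVectorSearch: boolean;
}

type EnvMap = Record<string, string | undefined>;

// Only the string "true" (case-insensitive) counts as true; unset or blank uses the fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function loadConfig(env: EnvMap): ExportConfig {
  return {
    // Assumed fallback: headful, since headless mode is reported as non-functional.
    headless: parseBool(env["HEADLESS"], false),
    // Documented default.
    ollamaUrl: env["OLLAMA_URL"] || "http://localhost:11434",
    // Assumed fallbacks, taken from the "e.g." values in the list above.
    ollamaModel: env["OLLAMA_MODEL"] || "deepseek-r1",
    ollamaEmbedModel: env["OLLAMA_EMBED_MODEL"] || "nomic-embed-text",
    // Assumed fallback: vector search stays off unless explicitly enabled.
    enableVectorSearch: parseBool(env["ENABLE_VECTOR_SEARCH"], false),
  };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` after the .env file has been loaded. Taking the environment as a parameter keeps the function easy to test.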

Usage Guide

Launch the system:

```
# Start the development environment
npm run dev
```

Operational Directives

- Start scraper (Library): Initiates extraction. Authenticate manually if required.
  - Note: Due to the complexity of Perplexity's API and potential network fluctuations, it may be necessary to run the scraper multiple times to ensure all conversations are fully gathered. The system uses checkpoints to resume where it left off.
- Search conversations: Interface with your history using various modes:
  - Auto: Heuristic selection between semantic and exact search.
  - Semantic: Fuzzy matching via high-dimensional vector space.
  - RAG: Direct inquiry, e.g., "What did I learn about emergent intelligence?"
  - Exact: Rapid string matching via ripgrep (bundled).
- Build vector index: Processes Markdown exports into a local vector store.
- Reset all data: Purges checkpoints, authentication data, and the vector index.
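
The checkpoint-and-resume behavior mentioned for the scraper can be pictured roughly as follows. This is a sketch under stated assumptions, not the project's real checkpoint format: the `Checkpoint` shape and the functions `emptyCheckpoint`, `markDone`, `pendingIds`, `serializeCheckpoint`, and `parseCheckpoint` are all hypothetical.

```typescript
// Hypothetical checkpoint shape; the project's actual format may differ.
interface Checkpoint {
  completedIds: string[];
  updatedAt: string;
}

function emptyCheckpoint(): Checkpoint {
  return { completedIds: [], updatedAt: new Date(0).toISOString() };
}

// Record a finished conversation; marking the same ID twice is a no-op.
function markDone(cp: Checkpoint, id: string, now: Date = new Date()): Checkpoint {
  if (cp.completedIds.indexOf(id) !== -1) return cp;
  return { completedIds: [...cp.completedIds, id], updatedAt: now.toISOString() };
}

// Conversations that still need to be exported on the next run.
function pendingIds(allIds: string[], cp: Checkpoint): string[] {
  const done = new Set(cp.completedIds);
  return allIds.filter((id) => !done.has(id));
}

function serializeCheckpoint(cp: Checkpoint): string {
  return JSON.stringify(cp, null, 2);
}

// Malformed or missing data yields an empty checkpoint, so a run starts from scratch.
function parseCheckpoint(raw: string): Checkpoint {
  try {
    const parsed = JSON.parse(raw) as Partial<Checkpoint> | null;
    if (parsed !== null && typeof parsed === "object" && Array.isArray(parsed.completedIds)) {
      return {
        completedIds: parsed.completedIds.filter((id) => typeof id === "string"),
        updatedAt:
          typeof parsed.updatedAt === "string" ? parsed.updatedAt : new Date(0).toISOString(),
      };
    }
  } catch {
    // Fall through to an empty checkpoint on invalid JSON.
  }
  return emptyCheckpoint();
}
```

Writing the serialized checkpoint to disk after each completed conversation is what would make repeated scraper runs cheap: each run only processes what `pendingIds` returns.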

RAG Capabilities

The RAG modality is engineered for various levels of cognitive inquiry:

- Broad Synthesis: "Summarize all threads regarding distributed systems."
- Granular Retrieval: "Locate the specific TypeScript pattern I used for the worker pool."
- Cross-Thread Integration: "How has my conceptual understanding of React hooks shifted?"
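
Questions like these are typically answered by retrieving the most relevant chunks and handing them to the model as context. The sketch below shows that general retrieve-then-prompt pattern using cosine similarity. It does not reproduce this project's pipeline (see ARCH.md for that): `IndexedChunk`, `cosineSimilarity`, `topK`, `buildPrompt`, and the prompt wording are assumptions, and the embeddings are presumed to have been computed already.

```typescript
// Hypothetical shape of an indexed piece of a conversation.
interface IndexedChunk {
  source: string; // e.g. the Markdown file the chunk came from
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 for the same direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Rank chunks by similarity to the query embedding and keep the best k.
function topK(query: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble a prompt that cites each retrieved chunk by number.
function buildPrompt(question: string, context: IndexedChunk[]): string {
  const sources = context
    .map((chunk, i) => `[${i + 1}] ${chunk.source}\n${chunk.text}`)
    .join("\n\n");
  return `Answer using only the context below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

A broad-synthesis question would use a larger `k` to cover many threads, while a granular retrieval would keep `k` small so the answer stays focused.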

Architecture & Deep Dive

For a detailed look at our RAG implementation, hybrid search strategy, and theoretical foundations, please refer to:

👉 ARCH.md

Project Structure

- src/ai/: Ollama interaction and advanced RAG orchestration layers.
- src/scraper/: Playwright-based extraction logic and parallel worker pool management.
- src/search/: Vector storage (Vectra) and ripgrep search implementation.
- src/repl/: Interactive CLI components.
- src/utils/: Shared utility functions for data chunking and logging.
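
For readers unfamiliar with data chunking, the following sketch shows one common approach: fixed-size character windows with overlap, so that text near a boundary appears in two neighboring chunks. The function name `chunkText` and its default sizes are hypothetical, and the project's own utilities may split on different boundaries, such as headings or sentences.

```typescript
// Split text into windows of up to `maxChars` characters, where each window
// repeats the last `overlap` characters of the previous one.
function chunkText(text: string, maxChars = 1000, overlap = 200): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (overlap < 0 || overlap >= maxChars) {
    throw new Error("overlap must be at least 0 and smaller than maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    // Stop once a window has reached the end of the text.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}
```

Each chunk would then be embedded separately before being written to the vector store.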

Testing

We prioritize a "Testing Trophy" architecture, emphasizing integration tests.

```
# Execute unit-level verifications
npm run test:unit

# Execute integration-level verifications
npm run test:integration
```

