tylew/Tax-Code-Graph-RAG

Tax Graph RAG

This repository contains a working GraphRAG demonstration. It ingests structured and unstructured tax sources, stores them in a vector database (Chroma) and a property graph (Neo4j), and exposes a chat API plus a small frontend for traversing the knowledge base and running analytical queries.

TaxGPT-Lite demo gif

How to run

1. Prerequisites:

  1. Docker
  2. OpenAI API key
  3. Host ports 3000 and 8080 available

2. Set env variables

  1. cp .env.example .env
  2. Populate OPENAI_API_KEY with your OpenAI API key. Alternatively, see the Environment variables section to configure Ollama.

3. Run make up

  • Initialization may take a few seconds.

4. Open frontend http://localhost:3000

  • The API may not finish starting before the frontend does; it will be ready shortly.

5. Follow instructions to ingest data

6. Use the 'Prompts' dropdown to browse example queries or create your own

Docs index

Repo hierarchy and linked documentation:

Environment variables

This project reads configuration from a .env file in the repo root. The canonical settings live in packages/src/shared/config/settings.py.

Required (choose one):

  • OpenAI: set OPENAI_API_KEY
  • Ollama: set OLLAMA_MODEL (default: llama3.1:8b)

Optional variables:

  • OPENAI_MODEL, EMBEDDING_MODEL
  • CHROMA_HOST, CHROMA_PORT, CHROMA_COLLECTION_NAME
  • NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD
  • LLAMA_CLOUD_API_KEY (only needed if re-running LlamaParse)

Development details

Stack

  • Backend: Python 3.11, FastAPI, LlamaIndex, ChromaDB, Neo4j
  • Frontend: Node 20, Vite, React

Quality Control

  • make tests
  • make lints

API Chat Endpoint

  • Method: POST
  • Path: /chat
  • Request body (application/json): { "message": "user query" }
  • Response: 200 OK with Content-Type: text/event-stream
  • Streaming format: Server-Sent Events (SSE)
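
A minimal client sketch for the endpoint described above, using only the Python standard library. It assumes the API is reachable at http://localhost:8080 (the default port mapping) and that each event carries its payload in standard SSE data: lines; the payload schema itself is not documented here, so the sketch yields the raw text. The names iter_sse_data and stream_chat are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # The SSE format strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    if buffer:
        # Lenient: flush a trailing event that was not closed by a blank line.
        yield "\n".join(buffer)


def stream_chat(message: str, base_url: str = "http://localhost:8080") -> Iterator[str]:
    """POST a message to /chat and yield event payloads as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields one bytes line at a time.
        yield from iter_sse_data(raw.decode("utf-8") for raw in response)


if __name__ == "__main__":
    for payload in stream_chat("user query"):
        print(payload)
```

The parser is kept separate from the network call so it can be exercised on canned lines without a running API.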

Dev (Docker) commands

  • Run interface services in dev reload mode:
    • make dev-frontend
    • make api-dev (watches packages/ for reload)
  • Make databases available to host ports
    • make db-dev
  • Install packages directly to your current env
    • make install-pkgs or make install-pkgs-editable
  • Run package tests
    • make tests

To run the Python environment locally

  1. Have Python 3.11 installed (tested on 3.11.14)
  2. pip install the packages
  3. Run make db-dev to expose the database ports to the host
  4. Set the DB configuration variables in the Python environment to host-accessible addresses, e.g.:
    CHROMA_HOST=localhost
    CHROMA_PORT=8000
    NEO4J_URI=bolt://localhost:7687
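
Before starting the backend outside Docker, it can help to confirm that these addresses are actually reachable. The following is a small sketch under stated assumptions: db_endpoints and is_reachable are hypothetical helpers, the fallback values simply mirror the example above (they are not necessarily the project's defaults), and a successful TCP connect only shows the port is open, not that the service is healthy.

```python
import os
import socket
from typing import Mapping
from urllib.parse import urlparse


def db_endpoints(env: Mapping[str, str]) -> dict[str, tuple[str, int]]:
    """Derive (host, port) pairs from the variables shown above."""
    chroma = (env.get("CHROMA_HOST", "localhost"), int(env.get("CHROMA_PORT", "8000")))
    neo4j = urlparse(env.get("NEO4J_URI", "bolt://localhost:7687"))
    return {
        "chroma": chroma,
        "neo4j": (neo4j.hostname or "localhost", neo4j.port or 7687),
    }


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in db_endpoints(os.environ).items():
        status = "ok" if is_reachable(host, port) else "unreachable"
        print(f"{name}: {host}:{port} {status}")
```

If either service reports unreachable, check that make db-dev is running and that the ports are not already taken on the host.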

Default port mappings

    frontend: 3000
    api: 8080

Not exposed to host by default:

    neo4j: 7474, 7687
    chroma: 8000

To expose the databases to the host, see Dev (Docker) commands.

Graph image

About

A GraphRAG implementation to ingest, retrieve, and traverse information from the 2023 Internal Revenue Code, IRS Form 1040 instructions, and sample tax data.
