
Cedar

An adaptive GraphRAG system.

Set up the repositories:

git clone git@github.com:KartikYZ/graphrag.git

# fetch the vllm and graphrag submodules
git submodule init
git submodule update

input/: contains four sample books of varying lengths for indexing:

| Input | Lines | Words | Total runtime (s) w/ OpenAI | Book |
|---|---|---|---|---|
| book_small (default) | 3,972 | 32,457 | 70.26729 | A Christmas Carol |
| book_medium | 7,733 | 62,303 | 74.82933 | The Hound of the Baskervilles |
| book_large | 14,911 | 130,410 | 132.13395 | Pride and Prejudice |
| book_xlarge | 26,058 | 192,262 | 205.21476 | The Iliad |

ragtest/:

  • Contains input/<file>, where <file> is the input to index. Move a book in with:
mv input/<file> ragtest/input/<file>
  • prompts: the default set of GraphRAG prompts (from the News domain), generated via:
cd nimble
python -m graphrag.index --init --root ./ragtest

Environment

  1. Install Micromamba
  2. Set the root prefix to /mnt/ssd1/<user>/micromamba
  3. Create the environment: micromamba create -n graphrag python=3.12

Install GraphRAG

cd graphrag
micromamba install poetry
poetry install

Install vLLM

cd vllm
export CUDA_HOME=/usr/local/cuda-12
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
pip install -e . 

Install LiteLLM

pip install 'litellm[proxy]'

Symlink $HOME/.cache to the larger disk:

mv ~/.cache /mnt/ssd1/$USER/.cache
ln -s /mnt/ssd1/$USER/.cache/ ~/.cache

Replacing OpenAI with a local Llama-3.1-8B-Instruct model

  1. Request access to the gated model meta-llama/Meta-Llama-3.1-8B-Instruct on Hugging Face
  2. Log in with the Hugging Face CLI: huggingface-cli login
  3. Point model and api_base at the local models in ragtest/settings.yaml:
..

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat or static_response
  model: meta-llama/Meta-Llama-3.1-8B-Instruct # gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://0.0.0.0:4000 # https://<instance>.openai.azure.com

..

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: intfloat/e5-mistral-7b-instruct # text-embedding-3-small
    api_base: http://localhost:8003/v1 # https://<instance>.openai.azure.com

..
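The LiteLLM proxy used in the next section reads litellm_config.yaml, which is not shown in this listing. Here is a minimal sketch that fans requests out across the three vLLM replicas under a single model name; the ports and the dummy api_key match the serve commands below, but verify the schema against your LiteLLM version:

```yaml
model_list:
  # Three entries with the same model_name: LiteLLM load-balances across them.
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct  # openai/ prefix = OpenAI-compatible backend
      api_base: http://localhost:8000/v1
      api_key: dummy  # vLLM ignores the key, but the field must be set
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://localhost:8001/v1
      api_key: dummy
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://localhost:8002/v1
      api_key: dummy
```

With this config, litellm --config litellm_config.yaml serves an OpenAI-compatible endpoint on its default port 4000, which matches the api_base in the llm section of settings.yaml.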

Run GraphRAG with local models

# start inference models
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8000 --disable-log-requests
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8001 --disable-log-requests
CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8002 --disable-log-requests

# start litellm proxy
litellm --config litellm_config.yaml

# start embedding model
CUDA_VISIBLE_DEVICES=3 vllm serve intfloat/e5-mistral-7b-instruct --port 8003 --disable-log-requests

# index
cd nimble
python -m graphrag.index --memprofile --reporter rich --verbose --root ./ragtest

# query 

# local
python -m graphrag.query --root ./ragtest --method local "What did Sherlock see one night on the moor?"

# global
python -m graphrag.query --root ./ragtest --method global "What are the top themes in this story?"
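Before launching a long indexing run, it can help to sanity-check the LiteLLM proxy with a single chat request. A stdlib-only sketch is below; the endpoint, model name, and dummy token mirror the setup above, and ping_proxy is an illustrative helper, not part of GraphRAG:

```python
import json
import urllib.request

# The LiteLLM proxy from the steps above, exposing an OpenAI-compatible API.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }


def ping_proxy() -> str:
    """Send one request through the proxy and return the model's reply."""
    body = json.dumps(build_chat_request(MODEL, "Reply with the word OK.")).encode()
    req = urllib.request.Request(
        PROXY_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",  # vLLM backends ignore the key
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Usage (with the vLLM servers and proxy running):
#   print(ping_proxy())
```

If the call fails, check the proxy and the vLLM servers before blaming the indexer; GraphRAG's own errors at this stage are usually just connection errors surfacing through the OpenAI client.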

Update settings.yaml to specify the pruning strategy used during top-down community report (CR) generation, e.g. degree-based pruning:

..
community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 1000 # default 8000; reduced to force pruning while debugging, since the default prompt contexts all fit within the limit
  local_context_pruning_strategy: "degree" # or "top_k" or "threshold"; "degree" keeps the highest-degree vertices
..
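To illustrate what the degree strategy does, here is a hypothetical sketch: keep the highest-degree vertices of a community's local context until the max_input_length token budget is exhausted. Vertex and prune_by_degree are illustrative names, not GraphRAG internals:

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    name: str
    degree: int  # number of edges in the community subgraph
    tokens: int  # token cost of this vertex's context rows


def prune_by_degree(vertices: list[Vertex], max_input_length: int) -> list[Vertex]:
    """Keep highest-degree vertices until the token budget is exhausted."""
    kept, budget = [], 0
    for v in sorted(vertices, key=lambda v: v.degree, reverse=True):
        if budget + v.tokens > max_input_length:
            break
        kept.append(v)
        budget += v.tokens
    return kept


community = [
    Vertex("Scrooge", degree=9, tokens=400),
    Vertex("Marley", degree=4, tokens=300),
    Vertex("Fezziwig", degree=1, tokens=500),
]
pruned = prune_by_degree(community, max_input_length=1000)
print([v.name for v in pruned])  # → ['Scrooge', 'Marley']
```

The intuition: high-degree vertices anchor the most relationships in the community, so dropping the low-degree periphery loses the least report-relevant context when the prompt must be shrunk.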
