An adaptive GraphRAG system.
```sh
git clone git@github.com:KartikYZ/graphrag.git
cd graphrag
# clone vllm and graphrag submodules
git submodule init
git submodule update
```

`input/`: contains 4 sample books of varying lengths for indexing:
| Input | Lines | Words | Total Runtime (s) w/ OpenAI | Title |
|---|---|---|---|---|
| book_small (default) | 3,972 | 32,457 | 70.26729 | A Christmas Carol |
| book_medium | 7,733 | 62,303 | 74.82933 | The Hound of the Baskervilles |
| book_large | 14,911 | 130,410 | 132.13395 | Pride and Prejudice |
| book_xlarge | 26,058 | 192,262 | 205.21476 | The Iliad |
`ragtest/`:
- Contains `input/<file>`, where `<file>` is the input for indexing. Move a book in with:

  ```sh
  mv input/<file> ragtest/input/<file>
  ```

- `prompts/`: the default set of GraphRAG prompts (from the News domain), generated via:

  ```sh
  cd nimble
  python -m graphrag.index --init --root ./ragtest
  ```

Set up the environment:

- Install Micromamba.
- Set the root prefix at `/mnt/ssd1/<user>/micromamba`.
- Create the environment:

  ```sh
  micromamba create -n graphrag python=3.12
  ```
Install GraphRAG:

```sh
cd graphrag
micromamba install poetry
poetry install
```

Install vLLM:
```sh
cd vLLM
export CUDA_HOME=/usr/local/cuda-12
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
pip install -e .
```

Install LiteLLM:

```sh
pip install 'litellm[proxy]'
```

Relocate `~/.cache` to the larger disk and symlink it back to `$HOME/.cache`:
```sh
mv ~/.cache /mnt/ssd1/$USER/.cache
ln -s /mnt/ssd1/$USER/.cache/ ~/.cache
```

Model setup:

- Request approval for the gated model `meta-llama/Meta-Llama-3.1-8B-Instruct`.
- Log in to the `huggingface-hub` CLI.
- Change `model` and `api_base` to the local models in `ragtest/settings.yaml`:
```yaml
..
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat or static_response
  model: meta-llama/Meta-Llama-3.1-8B-Instruct # gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://0.0.0.0:4000 # https://<instance>.openai.azure.com
..
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: intfloat/e5-mistral-7b-instruct # text-embedding-3-small
    api_base: http://localhost:8003/v1 # https://<instance>.openai.azure.com
..
```

Start the inference servers, the LiteLLM proxy, and the embedding model:
```sh
# start inference models
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8000 --disable-log-requests
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8001 --disable-log-requests
CUDA_VISIBLE_DEVICES=2 vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8002 --disable-log-requests

# start litellm proxy
litellm --config litellm_config.yaml

# start embedding model
CUDA_VISIBLE_DEVICES=3 vllm serve intfloat/e5-mistral-7b-instruct --port 8003 --disable-log-requests
```
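The contents of `litellm_config.yaml` are not shown in this repo snippet. A minimal sketch, assuming the proxy load-balances the three vLLM replicas behind a single model name (the ports and model name match the serve commands above; the file itself is an illustration, not the repo's actual config):

```yaml
# Hypothetical litellm_config.yaml: three entries sharing one model_name,
# so LiteLLM round-robins requests across the vLLM replicas.
model_list:
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://0.0.0.0:8000/v1
      api_key: "none"
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://0.0.0.0:8001/v1
      api_key: "none"
  - model_name: meta-llama/Meta-Llama-3.1-8B-Instruct
    litellm_params:
      model: openai/meta-llama/Meta-Llama-3.1-8B-Instruct
      api_base: http://0.0.0.0:8002/v1
      api_key: "none"
```

The proxy listens on port 4000 by default, which matches the `api_base: http://0.0.0.0:4000` entry in `ragtest/settings.yaml`.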
```sh
# index
cd nimble
python -m graphrag.index --memprofile --reporter rich --verbose --root ./ragtest

# local query
python -m graphrag.query --root ./ragtest --method local "What did Sherlock see one night in the Moor?"

# global query
python -m graphrag.query --root ./ragtest --method global "What are the top themes in this story?"
```

Update `settings.yaml` to specify the pruning strategy for top-down community-report (CR) generation; for example, with degree-based pruning:
```yaml
..
community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 1000 # default 8000; reduced for debugging, since the default prompt contexts are all within the limit and would not otherwise trigger pruning
  local_context_pruning_strategy: "degree" # or "top_k" or "threshold"; degree keeps the highest-degree vertices
..
```
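The `degree` strategy can be illustrated with a short sketch: when a community's local context would exceed `max_input_length`, keep only the highest-degree entities. This is an illustrative reimplementation, not GraphRAG's internal code; the function and entity names are hypothetical.

```python
def prune_by_degree(entities, degrees, max_entities):
    """Keep the max_entities entities with the highest graph degree.

    entities: list of entity names in the local context.
    degrees: mapping from entity name to its degree in the entity graph.
    """
    ranked = sorted(entities, key=lambda e: degrees.get(e, 0), reverse=True)
    return ranked[:max_entities]


# Example: a tiny entity graph from A Christmas Carol (degrees are made up).
degrees = {"SCROOGE": 42, "MARLEY": 17, "TINY TIM": 9, "FEZZIWIG": 3}
kept = prune_by_degree(list(degrees), degrees, max_entities=2)
print(kept)  # ['SCROOGE', 'MARLEY']
```

The `top_k` and `threshold` strategies named in the config would differ only in the selection rule (fixed k, or a minimum-degree cutoff) rather than the ranking.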