CREST-GV is a method for querying our data collection of ~500 cell types (we keep adding more data to the collection) to determine the enrichment score of a set of genetic variants. CREST-GV relies on peak properties for each cell type in the data collection, obtained with the LanceOtron peak caller. CREST-GV can also query your own (in-house) data, provided it is formatted correctly (see the In-house data format section for a properly formatted data structure); in this case, you can use the peak caller of your choice. The only mandatory input for the tool is the path to a genetic variant file, stored in the format shown in the Genetic variant file format section.
CREST-GV runs inside the crestgv conda environment. Please follow the installation instructions detailed below.
This section relies on the assumption that a distribution of Conda (e.g., Anaconda, Miniconda, ...) or Mamba is already installed.
git clone git@github.com:Genome-Function-Initiative-Oxford/CREST-GV.git
cd CREST-GV
Activate the conda 'base' environment (if not active):
conda activate base
There are two ways to create the crestgv conda environment:
- Using mamba (if Mamba is installed), and follow the on-screen instructions:
mamba env create --file=envs/crestgv.yml
- Using conda, and follow the on-screen instructions:
conda env create --file=envs/crestgv.yml
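The choice between the two commands above can be sketched in Python as follows. This is only an illustration: the helper name `env_create_command` is hypothetical and not part of CREST-GV, and it only builds the command (it does not run it).

```python
import shutil


def env_create_command(env_file="envs/crestgv.yml", which=shutil.which):
    """Return the environment-creation command, preferring mamba over conda.

    `which` is injectable so the lookup can be replaced in tests;
    by default it searches the current PATH.
    """
    for tool in ("mamba", "conda"):
        if which(tool):
            return [tool, "env", "create", f"--file={env_file}"]
    raise RuntimeError("Neither mamba nor conda was found on PATH")


if __name__ == "__main__":
    try:
        print(" ".join(env_create_command()))
    except RuntimeError as err:
        print(err)
```

The returned list could be passed to `subprocess.run` from the CREST-GV folder, although running the command directly in the terminal, as shown above, is the documented route.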
Now that the crestgv environment has been created, it needs to be activated:
conda activate crestgv
You can now run CREST-GV from this environment. Enjoy!
CREST-GV has been successfully tested on the following operating systems: Ubuntu, CentOS, macOS (Intel CPU), and Windows. Unfortunately, it cannot currently be installed on macOS with M-series CPUs. If you hit an error during installation, please open an issue so we can provide a general solution for all users.
If required for publication, package versions within the environment can be exported as follows:
conda activate crestgv
conda env export > crestgv_environment_versions.yml
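import pandas as pd
import sys
# Add the crestgv/ folder (relative to the current directory) to the import path
sys.path.append('crestgv/')
from crestgv import crestgv
# Path to the tab-separated genetic variant file (the only mandatory input)
genetic = "<Genetic path>"
cvg = crestgv(genetic=genetic, output="<Name output directory>", collection_name="<Collection name>")
# Compute the enrichment score of the variants
df_cvg = cvg.calculate_enrichment_score()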
You can find some helpful parameter information using:
cgv = crestgv(genetic=genetic)
help(cgv)
Any genetic variant file provided to CREST-GV has to be tab (\t) separated and must contain at least 3 columns:
- "CHR_ID" : chromosome in the following format chrV
- "CHR_POS" : chromosome position
- "SNPS" : ID of the variant (e.g., rs#####, or chrV-pos-ref-alt)
For example:
CHR_ID CHR_POS SNPS
chr10 801748 rs60692108
chr10 823912 rs74876360
chr10 840700 chr10-840700-A-C
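Before passing a file to CREST-GV, it can help to check that it follows the format above. The snippet below is a minimal sketch using only the Python standard library. The function `check_variant_file` is hypothetical (not part of CREST-GV), and the checks that CHR_ID starts with "chr" and that CHR_POS is an integer are assumptions drawn from the example rather than documented rules.

```python
import csv
import io

REQUIRED_COLUMNS = ("CHR_ID", "CHR_POS", "SNPS")


def check_variant_file(handle):
    """Return a list of problems found in a tab-separated variant file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if not (row["CHR_ID"] or "").startswith("chr"):
            problems.append(f"line {line_no}: CHR_ID should look like chr10")
        if not (row["CHR_POS"] or "").isdigit():
            problems.append(f"line {line_no}: CHR_POS should be an integer")
        if not row["SNPS"]:
            problems.append(f"line {line_no}: SNPS is empty")
    return problems


# The example rows from this section, with explicit tab separators.
example = (
    "CHR_ID\tCHR_POS\tSNPS\n"
    "chr10\t801748\trs60692108\n"
    "chr10\t823912\trs74876360\n"
    "chr10\t840700\tchr10-840700-A-C\n"
)
print(check_variant_file(io.StringIO(example)))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to the same function. Extra columns are accepted, since only the three listed columns are required.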
All the data has to be stored in a folder named after your collection, following the tree-like format below (see the example_files/in-house-data folder), where in-house-data is the collection name.
example_files/in-house-data/
├── bigwigs
│   ├── cell_type_1.bw
│   ├── cell_type_2.bw
│   └── cell_type_3.bw
├── in-house-data_info.csv
└── peaks
    ├── cell_type_1.bed
    ├── cell_type_2.bed
    └── cell_type_3.bed
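The layout above can be checked with a short script before running CREST-GV on in-house data. This is a sketch under two assumptions inferred from the example tree and not confirmed elsewhere: the info file is named `<collection name>_info.csv`, and every `.bw` file has a `.bed` file with the same base name. The function `check_collection` is hypothetical, and the contents of the info file are not inspected.

```python
import tempfile
from pathlib import Path


def check_collection(root):
    """Return a list of layout problems for an in-house collection folder."""
    root = Path(root)
    problems = []

    # Assumed naming: <collection name>_info.csv, as in the example tree.
    info = root / f"{root.name}_info.csv"
    if not info.is_file():
        problems.append(f"missing {info.name}")

    stems = {}
    for sub, suffix in (("bigwigs", ".bw"), ("peaks", ".bed")):
        folder = root / sub
        if folder.is_dir():
            stems[sub] = {p.stem for p in folder.glob(f"*{suffix}")}
        else:
            problems.append(f"missing {sub}/ directory")
            stems[sub] = set()

    # Assumed pairing: cell_type_1.bw <-> cell_type_1.bed
    for stem in sorted(stems["bigwigs"] ^ stems["peaks"]):
        problems.append(f"{stem}: bigwig and peak files are not paired")
    return problems


# Build a throwaway collection that mirrors the example tree and check it.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "in-house-data"
    (demo_root / "bigwigs").mkdir(parents=True)
    (demo_root / "peaks").mkdir()
    (demo_root / "in-house-data_info.csv").touch()
    for cell in ("cell_type_1", "cell_type_2", "cell_type_3"):
        (demo_root / "bigwigs" / f"{cell}.bw").touch()
        (demo_root / "peaks" / f"{cell}.bed").touch()
    demo_problems = check_collection(demo_root)

print(demo_problems)
```

To check a real collection, call `check_collection("example_files/in-house-data")` from the repository folder; an empty list means the folder matches the assumed layout.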
When CREST-GV is updated, you can bring your local copy up to date by entering the main folder and pulling the changes:
# Enter the main folder
cd CREST-GV
# Pull updates
git pull
Alternatively, remove the cloned repository and then re-clone the repository as described above.
Warning: use rm carefully!
rm -rf CREST-GV
When using this repository, use the default terminal and do not load any modules on the server (if you are logged in to one).
If you have any suggestions, spot any errors, or have any questions regarding the pipelines, please do not hesitate to contact us.