Orobas is an R package (with Python modules) for scoring chemical-genetic CRISPR screening data.
Orobas is a computational approach for transforming raw data from CRISPR-Cas9 chemical-genetic screens into quantitative interaction scores. Orobas computes differential interaction scores with accompanying statistical tests that account for multiple CRISPR guides per gene. The method includes approaches for post-processing differential log2 fold-change scores across multiple screens, incorporating normalization to reduce technical artifacts and correct batch effects.
We recommend installing Anaconda or any other virtual environment manager of your choice.
Download the Orobas source code:
git clone https://github.com/csbio/orobas.git
cd orobas
Create a virtual environment using the provided YAML configuration:
conda env create -f orobas_environment.yml
By default, this will create an environment named:
orobas_env
You can change the environment name by editing the name: field in orobas_environment.yml.
Activate the environment before running Orobas modules:
conda activate orobas_env
Note:
- Recommended versions:
  - Python >= 3.9
  - R >= 3.6
- The code in this protocol was executed using:
  - Python 3.9
  - R 4.4.3
After activating the environment, you can run Orobas as described in the example code and the protocol.
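As a quick sanity check after activation, the recommended versions in the note above can be compared against what the active environment provides. The following is a minimal sketch and not part of Orobas itself: the helper names (`parse_version`, `r_version`, `check_environment`) are hypothetical, and it assumes `Rscript` is on the `PATH` when R is installed.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)  # recommended minimum from the note above
MIN_R = (3, 6)       # recommended minimum from the note above


def parse_version(text):
    """Return the first 'major.minor' found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def r_version():
    """Return the (major, minor) version of Rscript on PATH, or None if unavailable."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run([rscript, "--version"], capture_output=True, text=True)
    # Depending on the R release, the version banner goes to stdout or stderr.
    return parse_version(result.stdout + result.stderr)


def check_environment():
    """Report whether Python and R meet the recommended minimum versions."""
    found_r = r_version()
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "r": found_r is not None and found_r >= MIN_R,
    }
```

Calling `check_environment()` inside the activated `orobas_env` returns a dictionary with one boolean per interpreter. A `False` for `"r"` means either that R is older than recommended or that `Rscript` was not found.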
Output directory and files from single-screen scoring:
.
├── <output>
│ ├── <screen-batch-1> # (Directory)
│ │ ├── ...
│ ├── <screen-batch-2> # (Directory)
│ │ ├── qc # (Directory) screen replicate LFC scatter plots and other quality control files
│ │ │ ├── essential_PR_QC.tsv # precision-recall AUC of essential-targeting guides from screen replicates
│ │ │ ├── lfc_heatmap.png # heatmap of Pearson correlation among screen replicate LFCs
│ │ │ ├── replicate_cor.tsv # Pearson correlation among screen replicate LFCs
│ │ │ ├── <screen-batch-2>_<condition-screen-1-replicate-A>_vs_<screen-batch-2>_<condition-screen-1-replicate-B>_replicate_comparison.png
│ │ │ ├── ...
│ │ │ ├── <screen-batch-2>_<control-screen-1-replicate-A>_vs_<screen-batch-2>_<control-screen-1-replicate-B>_replicate_comparison.png
│ │ │ ├── ...
│ │ │ ├── reads # (Directory) raw read count histograms of screen replicates and other quality control files
│ │ │ │ ├── total_reads.png # bar plot of raw read counts from all screen replicates
│ │ │ │ ├── reads_heatmap.png # heatmap of Pearson correlation among screen replicate raw read counts
│ │ │ │ ├── <screen-batch-2_T0>_raw_reads_histogram.png
│ │ │ │ ├── ...
│ │ │ │ ├── <screen-batch-2>_<control-screen-1-replicate-A>_raw_reads_histogram.png
│ │ │ │ ├── ...
│ │ │ │ ├── <screen-batch-2>_<condition-screen-1-replicate-A>_raw_reads_histogram.png
│ │ │ │ ├── ...
│ │ ├── guide_dlfc # (Directory) guide-level replicate-level dLFC score file
│ │ │ ├── <screen-batch-2>_<condition-screen-1>_vs_<screen-batch-2>_<control-screen-1>_guide_dlfc_pre_jk.tsv
│ │ │ ├── ...
│ │ ├── plots # (Directory) scatter plots of gene-level condition LFCs vs control LFCs with negative and positive interactions
│ │ │ ├── <screen-batch-2>_<condition-screen-1>_vs_<screen-batch-2>_<control-screen-1>_scatter.png
│ │ │ ├── ...
│ │ ├── condition_gene_calls.tsv # ***score file containing gene-level screen-level LFC, dLFC, FDR, significant hits and other values
│ │ ├── t0_normalized_screens_guide_level.tsv # guide-level replicate-level LFC score file
│ ├── <screen-batch-3> # (Directory)
│ │ ├── ...
│ ├── ...
│ ├── differential_LFC_scores.tsv # gene-level dLFC scores from all screens from all screen-batches
│ ├── fdr_scores.tsv # gene-level FDR scores from all screens from all screen-batches
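Given the layout above, the per-batch score files can be located programmatically. The sketch below is an illustration, not part of Orobas: the function names are hypothetical, and it assumes only what the tree shows, namely that each screen-batch directory sits directly under the output directory and contains `condition_gene_calls.tsv`. It makes no assumption about the column names in that file and simply returns whatever header row it finds.

```python
import csv
from pathlib import Path


def find_gene_call_files(output_dir):
    """Map each screen-batch directory name to its condition_gene_calls.tsv path."""
    output_dir = Path(output_dir)
    found = {}
    for batch_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        calls = batch_dir / "condition_gene_calls.tsv"
        if calls.is_file():  # skip batches that have not been scored yet
            found[batch_dir.name] = calls
    return found


def read_header(tsv_path):
    """Return the column names from the first row of a tab-separated file."""
    with open(tsv_path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"), [])
```

For example, `find_gene_call_files("path/to/output")` returns one entry per scored screen batch, and `read_header` can then be used to inspect which columns a given score file provides before loading it in full.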
Output directory organization and files from global-normalization:
.
├── global_normalization # (Directory) output files generated after running global normalization
│ ├── global_normalized_dLFC_scores.tsv # ***file with normalized dLFC scores from all selected condition screens
│ ├── fdr_scores_all.tsv # file with FDR scores from all selected condition screens
│ ├── scores_all.csv # file with LFC, normalized dLFC, FDR scores, and updated significant hits from all selected condition screens
│ ├── wbc_scores.csv # file with within-between correlation scores after each normalization step
│ ├── sd_scale_table.tsv # standard deviation of dLFC scores before and after scaling step
│ ├── control # (Directory) control screen files
│ │ ├── control # (Directory) control dLFC score file
│ │ │ ├── control_effect_scores.tsv # file with dLFC scores from control screens
│ │ ├── control_control_map_table.tsv # control_control_map table used in generating control dLFC scores
│ │ ├── control_replicates_map_table.tsv # control_replicates_map table used in generating control dLFC scores
│ │ ├── replicate_cor.tsv # Pearson correlation among control screen replicates
│ ├── LDA_evaluation_plots # (Directory) global ROCAUC and per-screen ROCAUC histograms at each LDA component removal step
│ │ ├── bc_lda_<component_number>_histogram.png
│ │ ├── ...
│ │ ├── bc_lda_<component_number>_roc.png
│ │ ├── ...
│ ├── plots # (Directory) scatter plots of gene-level condition LFCs vs control LFCs with negative and positive interactions
│ │ ├── <screen-batch-1>_<condition-screen-1>_vs_<screen-batch-1>_<control-screen-1>_scatter.png
│ │ ├── <screen-batch-1>_<condition-screen-2>_vs_<screen-batch-1>_<control-screen-2>_scatter.png
│ │ ├── <screen-batch-2>_<condition-screen-1>_vs_<screen-batch-2>_<control-screen-1>_scatter.png
│ │ ├── ...
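A small check against the listing above can confirm that a global-normalization run wrote the files named there. This is a hedged sketch rather than an Orobas utility: `missing_outputs` is a hypothetical helper, the relative paths are copied from the tree, and it is an assumption that every listed file is produced on every run. The plot and `LDA_evaluation_plots` files are left out because their names depend on the screens and component numbers.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_FILES = [
    "global_normalized_dLFC_scores.tsv",
    "fdr_scores_all.tsv",
    "scores_all.csv",
    "wbc_scores.csv",
    "sd_scale_table.tsv",
    "control/control/control_effect_scores.tsv",
    "control/control_control_map_table.tsv",
    "control/control_replicates_map_table.tsv",
    "control/replicate_cor.tsv",
]


def missing_outputs(global_norm_dir):
    """Return the expected files that are absent from a global_normalization directory."""
    root = Path(global_norm_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_outputs("path/to/global_normalization")` indicates that all of the listed files are present.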
If you use Orobas in your work, please cite:
[Publication placeholder — coming soon]
This project is distributed under the MIT License.