RiboScanner is a deep learning model that, given a 5'UTR sequence, predicts leaky scanning as measured by the RiboScan reporter. The RiboScan reporter is a highly sensitive reporter system that quantitatively measures start codon recognition with substantially greater dynamic range than conventional fluorescence reporters.
The input sequence should include the putative Translation Initiation Site (TIS) and the surrounding sequence. The model was trained on HEK293 cell data, using sequences between 30 bp and 130 bp long, so we recommend staying within this length range. Since most of the training sequences contain only one AUG, we also suggest including only one AUG per input sequence. Additionally, most sequences seen by the model contain 17 nucleotides downstream of the AUG.
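The recommendations above can be checked before running the model. The following is a minimal sketch, not part of RiboScanner itself: the function name `check_utr` is hypothetical, and it assumes that the 17 downstream nucleotides are counted starting right after the G of the AUG.

```python
def check_utr(seq, min_len=30, max_len=130, downstream=17):
    """Return a list of warnings for one candidate input sequence.

    Hypothetical helper, not part of the RiboScanner package.
    """
    # Treat RNA (U) and DNA (T) spellings alike.
    s = seq.upper().replace("U", "T")
    warnings = []

    # Length range of the training sequences.
    if not min_len <= len(s) <= max_len:
        warnings.append(f"length {len(s)} is outside {min_len}-{max_len}")

    # Positions of every ATG in the sequence, in any reading frame.
    starts = [i for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]
    if len(starts) != 1:
        warnings.append(f"{len(starts)} AUG codons found, expected 1")
    else:
        # Assumption: downstream length is counted after the G of the AUG.
        n_down = len(s) - (starts[0] + 3)
        if n_down != downstream:
            warnings.append(
                f"{n_down} nt downstream of the AUG, "
                f"most training sequences have {downstream}"
            )
    return warnings
```

An empty list means the sequence matches all three recommendations. A non-empty list does not make the sequence invalid; it only flags that the input differs from most of the training data.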
Optionally, create a new environment for RiboScanner:
conda create -n RiboScanner python=3.9
conda activate RiboScanner
Then, you can install directly from GitHub:
pip install git+https://github.com/luciabarb/RiboScanner.git
To verify that RiboScanner was installed correctly, run the following command. It should display the help message without errors:
RiboScanner --help
To predict GFP levels associated with leaky scanning for each sequence in a tab-separated dataframe, run:
RiboScanner predict \
--input ./example_data/input.txt \
--column_sequence sequence \
--output ./output.txt
To use a FASTA file instead, simply provide it as the argument to --input.
The argument --column_sequence should be the name of the column in your dataframe that contains the sequences to predict. Note that you should replace ./example_data/input.txt with the actual path to the file available on this page.
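If your sequences are not yet in a tab-separated file, one way to produce one is sketched below using only the Python standard library. The helper `write_input_table` is hypothetical, and the `id` and `sequence` column names are assumptions that mirror the example table; any other columns should work as long as the sequence column name is passed to --column_sequence.

```python
import csv


def write_input_table(records, path, column_sequence="sequence"):
    """Write (id, sequence) pairs to a tab-separated file with a header row.

    Hypothetical helper, not part of the RiboScanner package.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Header row: the second name is what --column_sequence should receive.
        writer.writerow(["id", column_sequence])
        for seq_id, seq in records:
            writer.writerow([seq_id, seq])
```

The resulting file can then be passed to --input in the command shown above.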
The output is a tab-separated file.
The first columns are identical to those provided in the input dataframe, followed by the sequence length (length_sequence) and the predicted GFP levels (predictions_GFP).
For the command line above, you should expect the following result:
| id | sequence | length_sequence | predictions_GFP |
|---|---|---|---|
| 1 | ATGGAAAG... | 44 | 11.863073 |
| 2 | ATAAAATA... | 73 | -0.10921967 |
| 3 | AGAAGCC... | 73 | 6.5084333 |
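For downstream analysis, the output file can be loaded with the standard library. This is a minimal sketch, assuming the output has a header row with the `length_sequence` and `predictions_GFP` columns described above; the function name `load_predictions` is hypothetical.

```python
import csv


def load_predictions(path):
    """Read a RiboScanner output table and sort rows by predicted GFP, highest first.

    Hypothetical helper, not part of the RiboScanner package.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    for row in rows:
        # csv reads every field as text; convert the two numeric columns.
        row["length_sequence"] = int(row["length_sequence"])
        row["predictions_GFP"] = float(row["predictions_GFP"])
    return sorted(rows, key=lambda r: r["predictions_GFP"], reverse=True)
```

Each returned row is a dictionary keyed by column name, so any extra columns from the input file are preserved as strings.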
If you make use of the RiboScanner model and/or this pipeline, please cite:
Bram M. P. Verhagen, David Liedtke, Lucía Barbadilla-Martínez, Carlos Alvarado, Valentyn Petrychenko, Michał Świrski, Micha D. Müller, Eivind Valen, Joseph D. Puglisi, Jeroen de Ridder, Niels Fischer and Marvin E. Tanenbaum. "Decoding the sequence requirements for translation initiation." (2026)