Introduction

RiboScanner is a deep learning model that, given a 5' UTR sequence, predicts leaky scanning as measured by the RiboScan reporter. The RiboScan reporter is a highly sensitive reporter system that quantitatively measures start codon recognition with a substantially greater dynamic range than conventional fluorescence reporters.

The input sequence should include the putative Translation Initiation Site (TIS) and the surrounding sequence. The model was trained on HEK293 cell data, using sequences between 30 bp and 130 bp long, so we recommend keeping input sequences within this length range. Since most of the training sequences contain only one AUG, we also suggest including only one AUG per input sequence. Additionally, most sequences seen by the model contain 17 nucleotides downstream of the AUG.
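The recommendations above can be checked before running the model. The following is a minimal sketch, not part of RiboScanner itself: the function name `check_input_sequence` is hypothetical, it assumes that ATG and AUG are equivalent for this purpose, and it assumes that "17 nucleotides downstream" is counted from the first base after the AUG.

```python
def check_input_sequence(seq, min_len=30, max_len=130, downstream_nt=17):
    """Return a list of warnings for a candidate input sequence (hypothetical helper)."""
    # Normalise to the RNA alphabet so that ATG and AUG are treated alike (assumption).
    rna = seq.upper().replace("T", "U")
    warnings = []

    # Length range the model was trained on.
    if not min_len <= len(rna) <= max_len:
        warnings.append(f"length {len(rna)} is outside {min_len}-{max_len}")

    # Locate every AUG in any frame.
    positions = [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]
    if len(positions) != 1:
        warnings.append(f"found {len(positions)} AUG codons, expected exactly 1")
    else:
        # Nucleotides remaining after the AUG (assumed counting convention).
        downstream = len(rna) - (positions[0] + 3)
        if downstream != downstream_nt:
            warnings.append(
                f"{downstream} nt downstream of the AUG; "
                f"most training sequences had {downstream_nt}"
            )
    return warnings


# A made-up 40 nt sequence with one AUG and 17 nt downstream: no warnings.
print(check_input_sequence("C" * 20 + "AUG" + "C" * 17))
```

A warning here does not mean the model will fail, only that the sequence falls outside what most of the training data looked like.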

Installation

Optionally, create a new environment for RiboScanner:

conda create -n RiboScanner python=3.9
conda activate RiboScanner

Then, you can install directly from GitHub:

pip install git+https://github.com/luciabarb/RiboScanner.git

To verify that RiboScanner was installed correctly, run the following command. It should display the help message without errors:

RiboScanner --help

Usage examples

Predicting leaky scanning

To predict the GFP levels associated with leaky scanning for each sequence in a tab-separated file, run:

RiboScanner predict \
  --input ./example_data/input.txt \
  --column_sequence sequence \
  --output ./output.txt

To use a FASTA file instead, simply provide it as the argument to --input.

The argument --column_sequence should be the name of the column in your input file that contains the sequences to predict. Note that you should replace ./example_data/input.txt with the actual path to your copy of the example file, which is available in the example_data folder of this repository.
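If your sequences are not yet in a file, a tab-separated input table can be written with the Python standard library. This is a minimal sketch under assumptions: the records, the file name my_input.txt, and the helper `write_input_table` are hypothetical, and the `id` and `sequence` column names simply mirror the example output in this README.

```python
import csv

# Hypothetical records; replace with your own identifiers and sequences.
records = [
    ("seq_1", "C" * 20 + "ATG" + "C" * 17),
    ("seq_2", "G" * 30 + "ATG" + "A" * 17),
]


def write_input_table(rows, path, column_sequence="sequence"):
    """Write (id, sequence) pairs to a tab-separated file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Header: an identifier column plus the column later named by --column_sequence.
        writer.writerow(["id", column_sequence])
        writer.writerows(rows)


write_input_table(records, "my_input.txt")
```

The resulting file can then be passed to --input, with --column_sequence set to the header name used for the sequence column.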

The output is a tab-separated file. The first columns are identical to those provided in the input dataframe, followed by the sequence length (length_sequence) and the predicted GFP levels (predictions_GFP).

For the command line above, you should expect the following result:

id	sequence	length_sequence	predictions_GFP
1	ATGGAAAG...	44	11.863073
2	ATAAAATA...	73	-0.10921967
3	AGAAGCC...	73	6.5084333
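For downstream analysis, the output table can be parsed with the standard library. The sketch below is illustrative only: the helper `load_predictions` is hypothetical, and it reads the example rows shown above from an in-memory string (with the sequences truncated exactly as they appear in this README) rather than from a real output file.

```python
import csv
import io

# The example output from this README, reproduced as-is (sequences are truncated here).
example_output = (
    "id\tsequence\tlength_sequence\tpredictions_GFP\n"
    "1\tATGGAAAG...\t44\t11.863073\n"
    "2\tATAAAATA...\t73\t-0.10921967\n"
    "3\tAGAAGCC...\t73\t6.5084333\n"
)


def load_predictions(handle):
    """Parse a tab-separated output table into a list of dicts with typed values."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["length_sequence"] = int(row["length_sequence"])
        row["predictions_GFP"] = float(row["predictions_GFP"])
        rows.append(row)
    return rows


predictions = load_predictions(io.StringIO(example_output))

# Rank sequences from highest to lowest predicted GFP level.
ranked = sorted(predictions, key=lambda r: r["predictions_GFP"], reverse=True)
for row in ranked:
    print(row["id"], row["length_sequence"], row["predictions_GFP"])
```

To read a real result, open the file given to --output (for example with `open("./output.txt", newline="")`) and pass the handle to the same function.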

Citation

If you make use of the RiboScanner model and/or this pipeline, please cite:

Bram M. P. Verhagen, David Liedtke, Lucía Barbadilla-Martínez, Carlos Alvarado, Valentyn Petrychenko, Michał Świrski, Micha D. Müller, Eivind Valen, Joseph D. Puglisi, Jeroen de Ridder, Niels Fischer and Marvin E. Tanenbaum. "Decoding the sequence requirements for translation initiation." (2026)
