StaphBrowse

A versatile and user-friendly genome browser for Staphylococcus aureus

Hackathon Team: Stuart Brown, Anbo Zhou, Richard Kopin and Jeffrey Vedanayagam

Staphylococcus aureus is the most common cause of human bacterial infections, including the majority of hospital-acquired infections. S. aureus has a highly variable genome, with differences between isolates that include substantial insertions and deletions of mobile elements. Disease surveillance research has led to the genome sequencing of many thousands of isolates. However, the annotation of these genome sequences does not provide researchers with a complete set of orthologs with informative names. Here, we present a computational pipeline that compares de novo sequence contigs to the set of complete RefSeq genomes for i) determining an appropriate reference genome for whole-genome alignment, ii) annotation, ortholog prediction, and comparative genomics, and iii) front-end visualization of genome annotation using a versatile, user-friendly web-based genome browser. We demonstrate our pipeline using data from S. aureus as a paradigm, because its high sequence variability means that its genomic sequences in public databases are less well curated.

Dependencies

pyfasta

BLAST+

MAUVE

GLIMMER

JBrowse

StaphBrowse workflow

Workflow methods

Genome sequence data

Staphylococcus aureus genome sequence data was obtained from the NCBI Genomes portal using the search term "staphylococcus aureus[orgn]". NCBI lists 7968 sequences associated with S. aureus, but only 66 of them are complete whole-genome sequences. We downloaded these 66 complete genomes from NCBI.

Obtain near-complete whole-genome sequences from NCBI

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/717/725/

Determining the best reference genome for alignment

As a proof of principle, we sampled 10 random genomes from the whole-genome set and subdivided each into 1 kb chunks using the pyfasta tool. The 1 kb sequences were then compared with BLAST against a database of the remaining genomes using a custom shell script. For each query genome, the reference that was most frequently the best hit across its chunks was identified as the closest reference.
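The chunk-and-vote idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the custom shell script used in the pipeline: the function names `chunk_sequence` and `best_reference` are hypothetical, and the input is assumed to be BLAST tabular output (`-outfmt 6`), where column 1 is the query id, column 2 the subject id, and column 12 the bit score.

```python
from collections import Counter


def chunk_sequence(seq, size=1000):
    """Split a sequence into consecutive, non-overlapping chunks of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def best_reference(blast_lines):
    """Return the most frequent best-hit subject and the per-subject counts.

    Assumes tab-separated BLAST output with the 12 default columns of -outfmt 6.
    For each query chunk only the hit with the highest bit score is kept.
    """
    best = {}  # chunk id -> (bit score, subject id)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    counts = Counter(subject for _, subject in best.values())
    winner = counts.most_common(1)[0][0] if counts else None
    return winner, counts
```

In this sketch every chunk casts one vote for the reference holding its top-scoring hit, and the reference with the most votes is reported. Ties are resolved arbitrarily, which a real pipeline may want to handle explicitly.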

A pairwise alignment between the query genome and the closest Reference genome is then constructed with MAUVE Contig Mover and exported as a JPEG image. A table of gaps in the alignment is also provided as a CSV file to aid in the identification of strain-specific insertions of mobile elements, which often carry drug resistance and virulence genes.
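To illustrate what a gap table contains, the sketch below derives uncovered stretches of a query genome from a list of aligned block coordinates and writes them as CSV. It does not read MAUVE output: the input format (1-based, inclusive `(start, end)` pairs), the column names, and the function names `alignment_gaps` and `gaps_to_csv` are all assumptions made for the example.

```python
import csv
import io


def alignment_gaps(aligned_blocks, genome_length, min_gap=1):
    """Return (start, end, length) for stretches not covered by any aligned block.

    Coordinates are 1-based and inclusive. Overlapping blocks are tolerated.
    """
    gaps = []
    position = 1  # first base not yet covered by a block
    for start, end in sorted(aligned_blocks):
        if start - position >= min_gap:
            gaps.append((position, start - 1, start - position))
        position = max(position, end + 1)
    tail = genome_length - position + 1  # uncovered bases after the last block
    if tail >= min_gap:
        gaps.append((position, genome_length, tail))
    return gaps


def gaps_to_csv(gaps):
    """Serialize the gap list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "length"])
    writer.writerows(gaps)
    return buffer.getvalue()
```

Raising `min_gap` would restrict the table to longer gaps, which are the more likely candidates for strain-specific mobile element insertions.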

Run the staph_pipeline.sh script

staph_pipeline.sh

The best-alignment output is printed to the command line. In StaphBrowse this step runs behind the web interface, so the raw output is not shown in the browser and is mainly of interest to developers.

Ortholog identification

For 50 S. aureus genomes, we extracted the protein-coding sequences (CDS) from the General Feature Format (GFF) file. The CDS FASTA files were then converted into individual BLAST databases for a reciprocal best BLAST search. Using the ortholog_identification.R script, we required a best reciprocal BLAST hit to have a minimum alignment length of 50 nucleotides and an e-value < 0.001.
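The reciprocal best hit criterion can be sketched as follows. This is a Python illustration of the general technique, not a port of the R script: the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and the input is assumed to be BLAST tabular output (`-outfmt 6`) with alignment length in column 4, e-value in column 11, and bit score in column 12. The thresholds mirror the ones stated above.

```python
def best_hits(blast_lines, min_length=50, max_evalue=1e-3):
    """Map each query to its highest-scoring subject among hits that pass the filters."""
    best = {}  # query id -> (bit score, subject id)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        length = int(fields[3])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        # Keep hits with alignment length >= min_length and e-value < max_evalue.
        if length < min_length or evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **filters):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, **filters)
    reverse = best_hits(b_vs_a, **filters)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A pair is reported only when the best hit of a gene from genome A in genome B points back to the same gene, which is the usual working definition of a putative ortholog in this kind of analysis.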

JBrowse visualization of Staphylococcus aureus genome

The welcome page of the browser has options to load a de novo genome of interest and find the best reference genome for alignment, visualize gene annotations using JBrowse, and view ortholog predictions in a table embedded within the StaphBrowse genome browser.

Example screenshots for StaphBrowse

User page for loading the genome

Based on the best alignment for a reference, users can choose a genome for visualization of gene models

Results from ortholog predictions can be loaded as a table and browsed through StaphBrowse
