Paarth Parekh, Danielle Temples, Kara Keun Lee, Shuting Lin
Conda is recommended way of installing tools
Python and Perl is needed to run this pipeline
- GeneMarkS-2 (Locally in the path)
- Prodigal
- Blast+
- bedtools
- samtools
- transeq
- aragon
- Infernal
- rnammer
Please Run the backbone script. It takes in the following inputs:
- INPUT_OPTION: Default Option is 1, options available 1 Take input from the genome assembly results 2 Input your own assembly files
- NAME_CONTIGS: Name of the contig files,called when option 2 for input-option is selected, default considered is contigs.fasta
- INPUT_ASSEMBLY: Path to the directory that contains input file manually,called when option 2 for input-option is selected
- INPUT_FILES77: Path to the directory that contains input file for genome assembly output of 21,33,55,77, called when default option for input-option is selected
- INPUT_FILES99: Path to the directory that contains input file for genome assembly output of 21,33,55,77,99,127,called when default option for input-option is selected
- GENE_OUTPUTS: Path to a directory that will store the output gff files, fna files and faa files.
- CODING_TOOLS: Default Option is 3, options available to run are 1 Only GeneMarkS-2 2 Only Prodigal 3 Both and getting a union of the genes
- TYPE_SPECIES: if running Gene_MarkS-2, mention species to be either bacteria or auto
- NONCODING_TOOLS: Default Option is 2, options available to run are 1 Only Infernal 2 Aragon, Infernal and Rnammer (and merge results
This Gene Prediction Pipeline runs GeneMarkS-2 and/or Prodigal to predict the coding genes in the assembled fasta files and blasts it against the reference database to validate the results and gives output as Known gene or Unknown Gene. If both GeneMarkS-2 and Prodigal are called then it merges the results and then blasts the results to give renamed .fna, .faa and .gff files.
For Noncoding Prediction it uses Infernal and/or Infernal,Aragon and Rnammer to predict the ncRNA. If all three tools are called it merges the results and gives the merged gff files.