How different is the Spike protein encoded by the first SARS-CoV-2 reference genome sequence from the Spike protein of SARS-CoV-2 found in bats, and from that of the most recently sequenced SARS-CoV-2 in humans? The following files are needed to run the code:
- The Jupyter Notebook
- Reference SARS-CoV-2 2019 Spike Gene DNA Sequence
- Bat SARS-CoV-2 Spike Gene DNA Sequence
- Human SARS-CoV-2 2021 Spike Gene DNA Sequence
- Reference SARS-CoV-2 2019 Spike Protein Sequence
- Bat SARS-CoV-2 Spike Protein Sequence
- Human SARS-CoV-2 2021 Spike Protein Sequence
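Each protein file above should correspond to the translation of the matching DNA file, because the Spike protein is translated from the Spike gene codon by codon. The sketch below illustrates that relationship using the standard genetic code (NCBI translation table 1). It is not taken from the notebook; it assumes each DNA file holds a complete Spike coding sequence that starts in frame at its first base, and the names `translate` and `CODON_TABLE` are hypothetical.

```python
# Minimal sketch (not the notebook's code): translate a coding sequence written in
# the DNA alphabet into protein using the standard genetic code (NCBI table 1).
BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence; stop at the first stop codon if to_stop."""
    dna = dna.upper().replace("U", "T")  # also accept RNA-alphabet input
    protein = []
    # Step through complete codons only; trailing bases that do not form a codon are ignored.
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        # Codons containing ambiguous bases (e.g. 'N') are not in the table and become 'X'.
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Short illustrative fragment (hypothetical input, not read from the data files).
print(translate("ATGTTTGTTTTTCTTGTTTTATTGCCA"))  # prints MFVFLVLLP
```

Protein FASTA records from NCBI normally omit the terminal stop, which is why the sketch stops translating at the first stop codon by default.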
Where the FASTA files come from: many strains of SARS-CoV-2 have been sequenced, and the NCBI database stores their raw genome sequences along with annotations of the coding sequences and of proteins of particular importance in the genome. The database of all sequenced SARS-CoV-2 genomes can be found here: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Wuhan%20seafood%20market%20pneumonia%20virus,%20taxid:2697049.
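Sequences downloaded from NCBI come as FASTA text: a header line starting with `>` followed by one or more sequence lines. The following is a minimal standard-library sketch of reading such a file; the notebook may well use a library parser instead, and `parse_fasta` / `read_fasta` are hypothetical helper names.

```python
from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may be wrapped; join them
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


def read_fasta(path: str) -> dict[str, str]:
    """Read a FASTA file from disk (the path is whatever file you downloaded)."""
    return parse_fasta(Path(path).read_text())


# Hypothetical record, used only to show the output shape.
example = """>example_spike hypothetical record
MFVFLVLLP
LVSSQ
"""
print(parse_fasta(example))  # {'example_spike hypothetical record': 'MFVFLVLLPLVSSQ'}
```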
The notebook outputs a sequence alignment and a bar chart that plots the three sequences together.
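To make "how different" concrete, two sequences can be globally aligned and their percent identity computed. The sketch below uses the Needleman-Wunsch algorithm with a linear gap penalty; this is a swapped-in technique, not necessarily what the notebook does, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions rather than a substitution matrix suited to proteins. The function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap positions divided by alignment length (gaps count as differences)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Hypothetical short fragments, not real Spike data.
aln_ref, aln_var, s = needleman_wunsch("MFVFLVLLP", "MFVFLLLP")
print(aln_ref)  # MFVFLVLLP
print(aln_var)  # MFVFL-LLP
print(f"score={s}, identity={percent_identity(aln_ref, aln_var):.1f}%")  # score=7, identity=88.9%
```

This pure-Python version fills an n-by-m table, which is workable for full-length Spike proteins (roughly 1,270 residues) but slow; libraries such as Biopython's `Bio.Align.PairwiseAligner` provide faster, configurable implementations.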