Skip to content

uri0l/bioinformatics-dna-sequence-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bioinformatics-dna-sequence-analysis

Comprehensive DNA Sequence Analysis and Variant Detection

Note: This project is still in development. Features are being added and bugs may be present. Please use at your own discretion, and feel free to contribute!

Overview

This project presents a collection of bioinformatics tools designed for analyzing DNA sequences, detecting genetic variants, and constructing phylogenetic trees to explore evolutionary relationships between species. It reflects both academic and personal learning experiences gained during the early stages of a bachelor's degree in Bioinformatics.

The project includes key bioinformatics tasks, including DNA sequence manipulation, genetic code translation, sequence alignment, variant calling (SNPs, indels), and phylogenetic tree construction. By integrating these components, the project not only serves as a technical portfolio piece but also demonstrates a comprehensive understanding of bioinformatics concepts through the implementation of core algorithms and methods for real-world biological data analysis.

Objectives

The primary objectives of this project are:

  1. DNA Sequence Validation & Manipulation: Implement methods to manipulate and validate DNA sequences, including operations such as reverse, complement, and reverse-complement.
  2. Genetic Code Translation: Translate DNA sequences into amino acid sequences, accounting for reading frames and possible frame shifts.
  3. Sequence Alignment: Implement local and global sequence alignment algorithms (Smith-Waterman, Needleman-Wunsch) to find similarities and differences between DNA sequences.
  4. Variant Calling: Detect genetic variants (SNPs, insertions, deletions) by comparing a sample sequence with a reference genome.
  5. Phylogenetic Tree Construction: Build and visualize phylogenetic trees to explore the evolutionary relationships between different DNA sequences.
  6. Data Visualization: Use plots and graphs to visualize GC content, codon usage, and sequence similarities.

Tools and Libraries Used

  • Python: Primary programming language used to implement bioinformatics algorithms.
  • Jupyter Notebook: For interactive code and results visualization
  • Biopython: A key library for handling biological data, including sequence manipulation, alignment, and phylogenetic tree construction.
  • Matplotlib: Libraries for data visualization and plotting.
  • NumPy: Used for efficient handling of sequences and mathematical operations.

Future Improvements

  • Improved Variant Calling: Integrate with existing bioinformatics tools like GATK (Genome Analysis Toolkit) for more advanced variant detection.
  • Optimization: Refactor code for faster processing of large genomic datasets using parallel computing techniques.

Installation

To run this project locally, follow the steps below:

  1. Clone the repository:
    git clone https://github.com/yourusername/dna-sequence-analysis.git
    
  2. Install dependencies
    pip install -r requirements.txt
    
  3. Launch jupyter notebook
    jupyter notebook
    

About

Bioinformatics personal project to develop previously learned skills, push personal limits and gain experience on project-based working.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors