This software models mitochondrial DNA using a modular, extensible object-oriented design. Sequences are loaded from FASTA files by the Parser class and represented as MitochondrialDNA objects. Users can compute statistics (e.g., GC content), search for motifs, and perform pairwise sequence alignments. Parser, SequenceAligner, and MotifFinder share a common interface (run and report) inherited from the abstract Tool class. Each component has a single, clearly separated responsibility, so it can be reused or extended independently. Two classes extend the analysis beyond individual sequences: SequenceComparer aligns every pair of sequences, or each sequence against a reference, and summarizes the results, which helps assess sequence divergence and evolutionary relationships; MotifFinder identifies motifs shared across multiple MitochondrialDNA objects, which can point to conserved and potentially functional regions.
- CRC Cards
- UML diagram
- Method Documentation
- Web Interface Templates
- Example Input/Output
- Module Dependency Diagram
- Object-Oriented Design Principles
Class: Sequence
Superclass:
Subclass: MitochondrialDNA
Responsibilities:
- Serve as abstract base class for biological sequences
- Enforce implementation of sequence and length properties
- Store raw sequence metadata (__info) from a DataFrame row
Collaborators:
- MitochondrialDNA (subclass)
Class: MitochondrialDNA
Superclass: Sequence
Subclass:
Responsibilities:
- Represent a single mitochondrial DNA sequence
- Implement the abstract interface from Sequence
- Store sequence information (ID, name, description, sequence data)
- Calculate GC content
- Extract subsequences
- Identify irregular bases
Collaborators:
- Sequence (superclass)
- Parser (instantiated from data parsed by)
- FastaManager (manages collections of)
- MotifFinder, SequenceAligner, SequenceComparer (used by)
Class: Tool
Superclass:
Subclasses: Parser, SequenceAligner, MotifFinder
Responsibilities:
- Abstract superclass for low-level sequence manipulation tools
- Define the abstract methods (run, report) that every subclass must implement
Collaborators:
- Parser (subclass)
- SequenceAligner (subclass)
- MotifFinder (subclass)
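
A minimal sketch of the Tool contract, assuming it is built on Python's abc module. The toy subclasses GCCounter and LengthTool and the run_pipeline() helper are hypothetical; they only show that code written against run() and report() works with any subclass. Unlike the project's tools, report() here returns a string instead of printing, which keeps the example easy to check.

```python
from abc import ABC, abstractmethod


class Tool(ABC):
    """Abstract base class: every concrete tool must implement run() and report()."""

    @abstractmethod
    def run(self, *args, **kwargs):
        """Perform the tool's main operation and store the result."""

    @abstractmethod
    def report(self) -> str:
        """Summarize the most recent result."""


class GCCounter(Tool):
    """Hypothetical toy tool: counts G and C bases."""

    def __init__(self) -> None:
        self._result = None

    def run(self, seq: str) -> int:
        self._result = sum(base in "GC" for base in seq.upper())
        return self._result

    def report(self) -> str:
        return f"GC bases: {self._result}"


class LengthTool(Tool):
    """Hypothetical toy tool: measures sequence length."""

    def __init__(self) -> None:
        self._result = None

    def run(self, seq: str) -> int:
        self._result = len(seq)
        return self._result

    def report(self) -> str:
        return f"Length: {self._result}"


def run_pipeline(tools, seq: str) -> list:
    """Polymorphism: this loop relies only on the shared Tool interface."""
    reports = []
    for tool in tools:
        tool.run(seq)
        reports.append(tool.report())
    return reports


print(run_pipeline([GCCounter(), LengthTool()], "GATCACAGG"))  # ['GC bases: 5', 'Length: 9']
```

Because Tool declares both methods abstract, instantiating Tool itself (or a subclass that forgets one of them) raises TypeError, which is how the base class enforces the interface.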
Class: Parser
Superclass: Tool
Subclass:
Responsibilities:
- Parse any file format supported by Biopython's SeqIO module (FASTA by default)
- Convert parsed data into pandas DataFrame
- Optionally return MitochondrialDNA objects
- Export data to CSV
- Generate sequence summary reports
Collaborators:
- FastaManager (calls Parser)
- MitochondrialDNA (instantiated from parsed records)
- Tool (superclass)
Class: SequenceAligner
Superclass: Tool
Subclass:
Responsibilities:
- Align two biological sequences using Needleman-Wunsch (global) or Smith-Waterman (local) algorithms
- Calculate the alignment score
- Calculate the number of matches/mismatches/gaps
- Optionally print the alignment score matrix
- Provide graphical representation of the alignment
Collaborators:
- MitochondrialDNA (input type)
- Tool (superclass)
- SequenceAlignWrapper (uses SequenceAligner internally)
- SequenceComparer (uses SequenceAligner through SequenceAlignWrapper)
Class: MotifFinder
Superclass: Tool
Subclass:
Responsibilities:
- Search for a given motif in a DNA sequence and return its positions
- Discover k-mers (candidate motifs) that occur in at least a threshold number of sequences
- Return and report the last search result (motif matches or discovered motifs)
Collaborators:
- Tool (superclass)
- MitochondrialDNA (provides input sequences)
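
A sketch of the motif search responsibility, assuming 0-based positions, case-insensitive matching, and that overlapping occurrences count. The function name find_motif is hypothetical, and the project may report positions differently (for example, 1-based).

```python
from typing import List


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return all 0-based start positions of motif in sequence, overlaps included."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    seq, pattern = sequence.upper(), motif.upper()  # case-insensitive comparison
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping hits (e.g. ATA in ATATA) are kept
        start = seq.find(pattern, start + 1)
    return positions


print(find_motif("ATATAT", "ATA"))  # [0, 2]
```

Advancing by one base rather than by len(motif) is what makes overlapping matches visible, which matters for short or repetitive motifs.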
Class: FastaManager
Superclass:
Subclass:
Responsibilities:
- Load and manage multiple FASTA datasets, each containing MitochondrialDNA objects
- Set the currently active dataset
- Provide statistics (GC content, length range, names, count) for the current dataset
Collaborators:
- Parser (used to load sequences)
- MitochondrialDNA (objects managed)
Class: SequenceAlignWrapper
Superclass:
Subclass:
Responsibilities:
- Wrap SequenceAligner to simplify usage and expose a cleaner interface
- Perform alignment and return alignment data
Collaborators:
- SequenceAligner (used internally)
- AlignmentVisualizer, SequenceComparer (use this wrapper)
Class: AlignmentVisualizer
Superclass:
Subclass:
Responsibilities:
- Display pairwise alignment between sequences in a formatted, readable layout
- Show alignment symbols, score, matches/mismatches/gaps
Collaborators:
- SequenceAlignWrapper (used to perform alignment)
- MitochondrialDNA (input data)
Class: SequenceComparer
Superclass:
Subclass:
Responsibilities:
- Compare all sequence pairs (or each to a reference) using alignments
- Store and return summary statistics for each comparison
Collaborators:
- MitochondrialDNA (input data)
- SequenceAlignWrapper (used to perform alignment)

Class: Sequence

| Method/Property | Input | Output |
|---|---|---|
| `__init__(df)` | df: pandas DataFrame row | Initializes the abstract base class |
| `sequence` (abstract property) | None | The biological sequence: str |
| `length` (abstract property) | None | Length of the sequence: int |

Class: MitochondrialDNA

| Method/Property | Input | Output |
|---|---|---|
| `__init__(df)` | df: DataFrame row | A MitochondrialDNA object |
| `sequence` | None | The DNA sequence: str |
| `length` | None | Length of the sequence: int |
| `gc_content` | None | GC content as a percentage: float |
| `get_subsequence(start, end)` | start: int, end: int | Subsequence: str |
| `find_irregular_bases()` | None | List of non-standard bases: list[str] |
| `name` | None | The name of the sequence: str |
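
One possible shape for the Sequence/MitochondrialDNA split, written as a sketch. It assumes the metadata row exposes "name" and "sequence" columns (hypothetical names), computes GC content over the full sequence length (so ambiguous bases such as N lower the percentage), and returns irregular bases as a sorted list of unique symbols. A plain dict stands in for the DataFrame row; a pd.Series supports the same row["column"] lookups.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Mapping


class Sequence(ABC):
    """Abstract base: stores the raw metadata row and requires sequence/length."""

    def __init__(self, df: Mapping[str, Any]) -> None:
        self.__info = df  # name-mangled to _Sequence__info, mirroring the CRC card

    @property
    def _info(self) -> Mapping[str, Any]:
        """Read-only access to the stored row for subclasses."""
        return self.__info

    @property
    @abstractmethod
    def sequence(self) -> str: ...

    @property
    @abstractmethod
    def length(self) -> int: ...


class MitochondrialDNA(Sequence):
    STANDARD_BASES = frozenset("ATGC")

    @property
    def sequence(self) -> str:
        return str(self._info["sequence"])  # hypothetical column name

    @property
    def name(self) -> str:
        return str(self._info["name"])  # hypothetical column name

    @property
    def length(self) -> int:
        return len(self.sequence)

    @property
    def gc_content(self) -> float:
        """Percentage of G/C over the full length (ambiguous bases count in the denominator)."""
        if self.length == 0:
            return 0.0
        gc = sum(base in "GC" for base in self.sequence.upper())
        return 100.0 * gc / self.length

    def get_subsequence(self, start: int, end: int) -> str:
        """Slice [start, end); raise ValueError instead of silently truncating."""
        if not 0 <= start <= end <= self.length:
            raise ValueError(f"invalid range [{start}, {end}) for length {self.length}")
        return self.sequence[start:end]

    def find_irregular_bases(self) -> List[str]:
        """Unique symbols other than A, T, G, C (case-insensitive), sorted."""
        return sorted(set(self.sequence.upper()) - self.STANDARD_BASES)


dna = MitochondrialDNA({"name": "Homo sapiens", "sequence": "GATCACNGG"})
print(dna.length, round(dna.gc_content, 2), dna.find_irregular_bases())  # 9 55.56 ['N']
```

Stacking @property on top of @abstractmethod is what lets the base class require properties rather than plain methods.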

Class: MotifFinder

| Method/Property | Input | Output | Description |
|---|---|---|---|
| `run()` | sequences: List[MitochondrialDNA]; motif: str, or k: int and threshold: int | dict or list | Searches for a specific motif or discovers k-mers |
| `get_result()` | None | dict or list | Returns the last result |
| `report()` | None | Console output | Prints a motif search summary |

Class: SequenceAlignWrapper

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__()` | None | Instance | Initializes with a SequenceAligner |
| `align()` | seq1: str, seq2: str, method: str = 'global' | dict | Aligns two sequences and returns the result |
| `report()` | None | Console output | Prints an alignment summary |

Class: AlignmentVisualizer

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__()` | SequenceAlignWrapper | Instance | Initializes with an alignment wrapper |
| `display()` | idx1, idx2: int; sequences: List[MitochondrialDNA]; method: str = 'global'; width: int = 80 | Console output | Shows the formatted alignment |
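
A hypothetical helper for the formatting step behind display(): it takes two already-aligned strings and lays them out in blocks of width columns with a symbol line between them. Using "|" for identical columns and a blank for gaps and mismatches is an assumption based on the example alignment output in this document; format_alignment is not a project function.

```python
def format_alignment(aligned1: str, aligned2: str, width: int = 80) -> str:
    """Lay out two aligned strings in blocks of `width` columns with a symbol line."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    if width <= 0:
        raise ValueError("width must be positive")
    # '|' marks identical non-gap columns; mismatches and gaps get a blank
    symbols = "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(aligned1, aligned2)
    )
    blocks = []
    for i in range(0, len(aligned1), width):
        block = (aligned1[i:i + width], symbols[i:i + width], aligned2[i:i + width])
        blocks.append("\n".join(block))
    return "\n\n".join(blocks)  # blank line between consecutive blocks


print(format_alignment("A-TGCGATACCTGGT", "AATG-GATAC-TGGT"))
```

Wrapping at a fixed width keeps long mitochondrial alignments (around 16.5 kb) readable in a terminal or an HTML pre block.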

Class: SequenceComparer

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__()` | sequences: List[MitochondrialDNA] | Instance | Initializes with the sequence dataset |
| `compare_pair()` | idx1, idx2: int | dict | Aligns two sequences and returns statistics |
| `compare_all()` | None | List[dict] | Performs pairwise comparisons for all sequences |
| `compare_to_reference()` | ref_index: int = 0 | List[dict] | Compares each sequence to the reference sequence |

Class: Parser

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__(format='fasta')` | format: str | Instance | Initializes the parser with a file format |
| `run()` | file_path: str, return_objects: bool = False | DataFrame or List[MitochondrialDNA] | Parses the file and returns the data |
| `save_to_csv()` | output_path: str (optional) | None | Saves the parsed data to CSV |
| `report()` | print_header: bool = True | Console output | Prints a parsing summary |
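
Parser delegates reading to Biopython's SeqIO. To keep the sketch dependency-free, the hypothetical parse_fasta() below is a plain-Python stand-in for SeqIO.parse(handle, "fasta"), not the project's code; the record keys (id, description, sequence) are assumptions about the DataFrame columns.

```python
from typing import Dict, Iterable, Iterator, List


def _make_record(header: str, chunks: List[str]) -> Dict[str, str]:
    # First whitespace-separated token is the ID; the rest is free-text description
    record_id, _, description = header.partition(" ")
    return {"id": record_id, "description": description, "sequence": "".join(chunks)}


def parse_fasta(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per FASTA record (a dependency-free stand-in for SeqIO.parse)."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence data may be wrapped across many lines
    if header is not None:
        yield _make_record(header, chunks)


fasta = [">NC_012920.1 Homo sapiens mitochondrion", "GATCACAGG", "TCTATCACC"]
print(list(parse_fasta(fasta)))
```

Any iterable of lines works, including an open file handle, and pd.DataFrame(list(parse_fasta(fh))) would give one row per sequence. With Biopython, the corresponding fields are record.id, record.description, and str(record.seq); note that Biopython's FASTA description keeps the ID as its first word, whereas this sketch strips it.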

Class: Tool

| Method | Input | Output | Description |
|---|---|---|---|
| `run()` | None | Abstract | Must be implemented by every subclass |
| `report()` | None | Abstract | Must be implemented by every subclass |

Class: SequenceAligner

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__()` | match: int = 2, mismatch: int = -1, gap: int = -2, show_matrix: bool = False | Instance | Initializes the aligner settings |
| `run()` | seq1, seq2: str, method: str = 'global' | None | Runs the selected alignment algorithm |
| `get_alignment_data()` | None | dict | Returns a dictionary with the alignment results |
| `report()` | width: int = 50, print_alignment: bool = True | Console output | Displays the alignment result |
Note: `_global_align()`, `_local_align()`, and `_traceback()` are internal helper methods and are not part of the public interface.
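
A compact Needleman-Wunsch sketch using the documented default scores (match 2, mismatch -1, gap -2) with a linear gap penalty. It is a standalone illustration of the work that global alignment plus traceback involves, not the project's _global_align() or _traceback(); when several moves tie during traceback it prefers the diagonal, then a gap in seq2, then a gap in seq1, which is one common but arbitrary choice.

```python
from typing import Dict, Union


def needleman_wunsch(seq1: str, seq2: str, match: int = 2, mismatch: int = -1,
                     gap: int = -2) -> Dict[str, Union[str, int]]:
    """Global alignment with a linear gap penalty (defaults mirror SequenceAligner's)."""

    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(seq1[i - 1], seq2[j - 1]),
                              score[i - 1][j] + gap,   # gap in seq2
                              score[i][j - 1] + gap)   # gap in seq1

    # Traceback from the bottom-right cell; ties prefer diagonal, then up, then left
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(seq1[i - 1], seq2[j - 1]):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    aligned1, aligned2 = "".join(reversed(out1)), "".join(reversed(out2))
    columns = list(zip(aligned1, aligned2))
    gaps = sum("-" in col for col in columns)
    matches = sum(a == b for a, b in columns)
    return {"aligned1": aligned1, "aligned2": aligned2, "score": score[n][m],
            "matches": matches, "mismatches": len(columns) - matches - gaps, "gaps": gaps}


print(needleman_wunsch("ACGT", "AGT"))
# {'aligned1': 'ACGT', 'aligned2': 'A-GT', 'score': 4, 'matches': 3, 'mismatches': 0, 'gaps': 1}
```

The table uses O(len(seq1) * len(seq2)) memory, so aligning two full mitochondrial genomes this way builds roughly 16,500 x 16,500 cells; that is workable for demonstrations but slow in pure Python.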

Class: FastaManager

| Method | Input | Output | Description |
|---|---|---|---|
| `__init__()` | None | Instance | Initializes an empty sequence manager |
| `parse()` | filepath: str | None | Loads and parses a FASTA file |
| `get_stats()` | None | dict | Returns basic statistics such as count and min/max/mean length |
| `get_gc_contents()` | None | List[float] | Returns the GC content of every sequence |
| `get_names()` | None | List[str] | Returns the sequence names |
| `get_sequences()` | None | List[MitochondrialDNA] | Returns all sequence objects |
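
A simplified sketch of the FastaManager bookkeeping, assuming datasets are stored under string labels and that plain sequence strings stand in for MitochondrialDNA objects. The class name FastaManagerSketch, the add_dataset() and set_current() methods, and the statistic keys are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Union


class FastaManagerSketch:
    """Hypothetical simplified FastaManager: named datasets plus one active dataset."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[str]] = {}
        self._current: Optional[str] = None

    def add_dataset(self, label: str, sequences: List[str]) -> None:
        self._datasets[label] = sequences
        self._current = label  # assumption: the newest dataset becomes active

    def set_current(self, label: str) -> None:
        if label not in self._datasets:
            raise KeyError(f"unknown dataset: {label}")
        self._current = label

    def get_stats(self) -> Dict[str, Union[int, float]]:
        """Count and min/max/mean length of the active dataset."""
        if self._current is None:
            raise RuntimeError("no dataset loaded")
        lengths = [len(seq) for seq in self._datasets[self._current]]
        if not lengths:
            return {"count": 0}
        return {"count": len(lengths), "min_length": min(lengths),
                "max_length": max(lengths), "mean_length": mean(lengths)}


manager = FastaManagerSketch()
manager.add_dataset("primates", ["ACGT", "ACGTACGT", "AC"])
manager.add_dataset("rodents", ["A"])
manager.set_current("primates")
print(manager.get_stats())
```

Keeping the active label separate from the stored datasets means switching datasets in the web interface is a cheap lookup rather than a re-parse.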
These Jinja2 HTML templates are used to render the front-end of the Flask application.

| Template File | Purpose |
|---|---|
| `index.html` | Page to upload a FASTA file and trigger parsing |
| `summary.html` | Shows GC content statistics and related visualizations |
| `motif.html` | Lets users search for or discover motifs in sequences |
| `align.html` | Interface for selecting and aligning two sequences |
| `compare_reference.html` | Compares all sequences to a selected reference and shows alignment statistics |
```
>NC_012920.1 Homo sapiens mitochondrion, complete genome
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTT...
>NC_002008.4 Pan troglodytes mitochondrion, complete genome
GAGCCCGTCTAAACTCCTCTATGTGTCTATGTCCTTGCTTTGGCGGTTTAG...
```
- Motif searched: ATGCG
- Discovered in: 3 sequences
- Positions found:
  - Homo sapiens: 102, 350
  - Pan troglodytes: 140
```
Sequences compared: Homo sapiens vs Pan troglodytes
Method: Global (Needleman-Wunsch)
Score: 186
Matches: 91
Mismatches: 22
Gaps: 6

A-TGCGATACCTGGT
| || ||||| ||||
AATG-GATAC-TGGT
```
| Species | GC Content (%) |
|---|---|
| Homo sapiens | 43.21 |
| Pan troglodytes | 44.02 |
`MitochondrialDNA.__init__()`
Description: Initialize from a pandas DataFrame row.
Parameters:
- df (pd.Series): Row containing sequence metadata
Returns:
- MitochondrialDNA: A new instance
`MitochondrialDNA.sequence` (property)
Description: Returns the raw DNA sequence.
Returns:
- str: The sequence
`MitochondrialDNA.length` (property)
Description: Returns the sequence length.
Returns:
- int: Length of the DNA sequence
`MitochondrialDNA.gc_content`
Description: Calculates the GC content as a percentage.
Returns:
- float: Percentage of G and C bases
`MitochondrialDNA.get_subsequence()`
Description: Extracts the subsequence between the start and end indices.
Parameters:
- start (int): Start index (inclusive)
- end (int): End index (exclusive)
Returns:
- str: The extracted subsequence
Raises:
- ValueError: If the indices are out of range
Example: `sub = dna.get_subsequence(10, 50)`

`MitochondrialDNA.find_irregular_bases()`
Description: Returns a list of non-standard bases.
Returns:
- list[str]: Bases other than A, T, G, and C
`MitochondrialDNA.name`
Description: Returns the sequence name.
Returns:
- str: Name of the DNA record
`SequenceAligner.__init__()`
Description: Initializes the alignment scoring system.
Parameters:
- match (int): Score for a match (default 2)
- mismatch (int): Score for a mismatch (default -1)
- gap (int): Score for a gap (default -2)
- show_matrix (bool): Whether to print the score matrix (default False)
Returns:
- SequenceAligner: Instance
`SequenceAligner.run()`
Description: Runs the alignment using the specified method.
Parameters:
- seq1 (str): First sequence
- seq2 (str): Second sequence
- method (str): 'global' (Needleman-Wunsch) or 'local' (Smith-Waterman)
Returns:
- None: Populates the internal result dictionary
Raises:
- ValueError: If the method is invalid
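
For method='local', Smith-Waterman differs from Needleman-Wunsch in two ways: cell scores are floored at zero, and the best alignment ends at the highest-scoring cell rather than at the bottom-right corner. The hypothetical smith_waterman_score() below shows only the scoring pass (no traceback), assuming the same documented default scores; it is not the project's _local_align().

```python
from typing import Tuple


def smith_waterman_score(seq1: str, seq2: str, match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> Tuple[int, Tuple[int, int]]:
    """Best local alignment score and the (i, j) cell where that alignment ends."""
    h = [[0] * (len(seq2) + 1) for _ in range(len(seq1) + 1)]  # borders stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, len(seq1) + 1):
        for j in range(1, len(seq2) + 1):
            diag = h[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at any cell
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_cell = h[i][j], (i, j)
    return best, best_cell


print(smith_waterman_score("TTTACGTTT", "GGACGTGG"))  # (8, (7, 6))
```

Here the shared block ACGT scores 4 x 2 = 8 and ends at seq1[6] and seq2[5] (0-based), i.e. cell (7, 6) in the 1-based DP table; a traceback would walk back from that cell until it reaches a zero.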
`SequenceAligner.get_alignment_data()`
Description: Returns the full alignment results.
Returns:
- dict: Contains the aligned sequences, score, and matches
`SequenceAligner.report()`
Description: Prints an alignment summary and, optionally, the aligned sequences.
Parameters:
- width (int): Line width used when printing
- print_alignment (bool): Whether to print the alignment
Returns:
- None: Outputs to the console
`MotifFinder.__init__()`
Description: Initializes internal storage for motif results.
Returns:
- MotifFinder: New instance
`MotifFinder.run()`
Description: Searches for a specific motif or discovers conserved motifs.
Parameters:
- sequences (List[MitochondrialDNA]): Input sequences
- motif (str, optional): Motif to search for
- k (int): Length of the k-mers (when discovering motifs)
- threshold (int): Minimum number of sequences a motif must appear in (when discovering motifs)
Returns:
- dict or list: Match results or discovered motifs
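
A sketch of the discovery mode, assuming threshold counts distinct sequences (as the parameter description states) and that positions are 0-based. Sequences are passed as hypothetical (name, sequence) pairs instead of MitochondrialDNA objects, and discover_motifs is not a project function; its result follows the shape of the example motif result shown in this document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def discover_motifs(records: List[Tuple[str, str]], k: int, threshold: int) -> List[dict]:
    """Return k-mers found in at least `threshold` distinct sequences, with positions."""
    if k <= 0:
        raise ValueError("k must be positive")
    # k-mer -> {sequence index -> 0-based start positions}
    hits: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for idx, (_, seq) in enumerate(records):
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            hits[seq[pos:pos + k]][idx].append(pos)

    results = []
    for kmer in sorted(hits):  # sorted for deterministic output
        per_seq = hits[kmer]
        if len(per_seq) >= threshold:  # threshold counts sequences, not occurrences
            results.append({
                "motif": kmer,
                "sequences": [
                    {"sequence_index": i, "sequence_name": records[i][0], "positions": positions}
                    for i, positions in sorted(per_seq.items())
                ],
            })
    return results


records = [("Homo sapiens", "ATGCATG"), ("Pan troglodytes", "CCATGA"), ("Mus musculus", "GGGG")]
for hit in discover_motifs(records, k=3, threshold=2):
    print(hit["motif"], [s["sequence_name"] for s in hit["sequences"]])
```

A single pass over every sequence indexes all k-mers, so the cost is roughly proportional to the total sequence length rather than to the number of candidate motifs.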
`MotifFinder.get_result()`
Description: Returns the result of the last motif search.
Returns:
- dict or list: Cached results
`MotifFinder.report()`
Description: Prints a summary of the motif analysis.
Returns:
- None: Console output
`Parser.__init__()`
Description: Initializes the parser for a supported format.
Parameters:
- format (str): Biopython-supported format (default: 'fasta')
Returns:
- Parser: Instance
`Parser.run()`
Description: Parses a sequence file and returns either a DataFrame or a list of sequence objects.
Parameters:
- file_path (str): Path to the file
- return_objects (bool): If True, return a list of MitochondrialDNA objects
Returns:
- pd.DataFrame or list: Parsed data
Raises:
- FileNotFoundError: If the file does not exist
`Parser.save_to_csv()`
Description: Saves the parsed data to CSV.
Parameters:
- output_path (str, optional): Path to save the output
Returns:
- None: Writes the file, or prints an error
`Parser.report()`
Description: Prints a summary of the parsed records.
Parameters:
- print_header (bool): Whether to print the head of the DataFrame
Returns:
- None: Console output
Example motif result:

```json
[
  {
    "motif": "ATG",
    "sequences": [
      {
        "sequence_index": 0,
        "sequence_name": "Homo sapiens",
        "positions": [102, 250]
      },
      {
        "sequence_index": 2,
        "sequence_name": "Mus musculus",
        "positions": [74, 198]
      }
    ]
  }
]
```

`SequenceComparer.__init__()`
Description: Initializes the comparer with a list of sequences.
Parameters:
- sequences (List[MitochondrialDNA]): The sequences to compare
Returns:
- SequenceComparer: Instance
`SequenceComparer.compare_pair()`
Description: Compares two sequences using alignment.
Parameters:
- idx1 (int): Index of the first sequence
- idx2 (int): Index of the second sequence
- method (str): 'global' or 'local'
Returns:
- dict: Alignment statistics
`SequenceComparer.compare_all()`
Description: Compares all unique sequence pairs.
Returns:
- List[dict]: Statistics for every pairwise alignment
`SequenceComparer.compare_to_reference()`
Description: Compares each sequence to a reference sequence.
Parameters:
- ref_index (int): Index of the reference sequence
- method (str): Alignment method
Returns:
- List[dict]: Statistics for each comparison
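
A sketch of how compare_all() and compare_to_reference() could enumerate their pairs, with the aligner injected as a plain function so the example runs without the rest of the project. ComparerSketch, the toy identity() scorer, and the idx1/idx2 keys added to each result are hypothetical; in the project, alignment goes through SequenceAlignWrapper.

```python
from itertools import combinations
from typing import Callable, Dict, List

AlignFn = Callable[[str, str], Dict]


class ComparerSketch:
    """Hypothetical stand-in for SequenceComparer with an injected align function."""

    def __init__(self, sequences: List[str], align: AlignFn) -> None:
        self.sequences = sequences
        self.align = align  # assumption: e.g. a bound SequenceAlignWrapper.align method

    def compare_pair(self, idx1: int, idx2: int) -> Dict:
        stats = dict(self.align(self.sequences[idx1], self.sequences[idx2]))
        stats.update(idx1=idx1, idx2=idx2)  # remember which pair produced the stats
        return stats

    def compare_all(self) -> List[Dict]:
        # Each unordered pair exactly once: n * (n - 1) / 2 alignments
        indices = range(len(self.sequences))
        return [self.compare_pair(i, j) for i, j in combinations(indices, 2)]

    def compare_to_reference(self, ref_index: int = 0) -> List[Dict]:
        # n - 1 alignments: the reference against every other sequence
        return [self.compare_pair(ref_index, i)
                for i in range(len(self.sequences)) if i != ref_index]


def identity(a: str, b: str) -> Dict:
    """Toy scorer (hypothetical): count identical positions without aligning."""
    return {"identical": sum(x == y for x, y in zip(a, b))}


comparer = ComparerSketch(["ACGT", "ACGA", "TCGT"], identity)
print([(r["idx1"], r["idx2"], r["identical"]) for r in comparer.compare_all()])
# [(0, 1, 3), (0, 2, 3), (1, 2, 2)]
```

The pair count explains the cost difference: compare_all() grows quadratically with the number of sequences, while compare_to_reference() grows linearly, which matters when each alignment covers a full mitochondrial genome.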
```mermaid
graph TD
    app["app.py"] --> comparer["comparer.py"]
    app --> sequence["sequence.py"]
    app --> tools["tools.py"]
    comparer --> sequence
    comparer --> tools
    tools --> sequence
```
This project is structured around four core OOP principles:
- Encapsulation: Classes such as MitochondrialDNA and SequenceAligner encapsulate their internal data (e.g., sequences, alignment results) and expose functionality through clean public interfaces.
- Abstraction: Sequence and Tool are abstract base classes. Sequence requires the sequence and length properties, and Tool requires run() and report(), so every subclass provides a consistent interface.
- Inheritance: MitochondrialDNA inherits from Sequence, while MotifFinder, SequenceAligner, and Parser all inherit from Tool. This promotes reuse and a shared structure across tools.
- Polymorphism: Because they share the generic run() and report() interface, tools such as Parser, MotifFinder, and SequenceAligner can be used interchangeably in pipelines and in the front end.
