A computational framework for deriving and calculating statistical potentials to analyze the interactions between proteins and nucleic acids (DNA/RNA).
This repository contains source code and datasets for generating knowledge-based potentials (or statistical potentials) specific to protein-nucleic acid complexes. These potentials are derived from known structures in the Protein Data Bank (PDB) and can be used for:
- Scoring protein-DNA/RNA docking poses.
- Evaluating the stability of protein-nucleic acid interfaces.
- Predicting binding affinities.
The potentials in this project are derived using the inverse Boltzmann law, converting the observed frequency of contacts between amino acids and nucleotides into energy scores.
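The conversion from contact frequencies to energies can be sketched as follows. This is a minimal illustration, not the repository's actual implementation. It uses the textbook form E(a, n) = -kT ln(f_obs / f_ref). The function name `inverse_boltzmann`, the pseudocount, the value of kT, and the reference state (product of marginal frequencies) are all assumptions chosen for the example.

```python
import math
from collections import Counter

# kT in kcal/mol at roughly 298 K; an assumed temperature, not taken from the repository.
KT = 0.592


def inverse_boltzmann(contact_counts, pseudocount=1.0, kt=KT):
    """Convert observed contact counts into pairwise energy scores.

    contact_counts maps (amino_acid, nucleotide) -> number of observed contacts.
    Reference state (an assumption): contacts occur in proportion to the
    product of the marginal frequencies of each amino acid and nucleotide.
    """
    aa_types = sorted({aa for aa, _ in contact_counts})
    nt_types = sorted({nt for _, nt in contact_counts})

    # Add a pseudocount to every possible pair so unseen pairs do not give log(0).
    smoothed = {
        (aa, nt): contact_counts.get((aa, nt), 0) + pseudocount
        for aa in aa_types
        for nt in nt_types
    }
    total = sum(smoothed.values())

    # Marginal totals per amino acid type and per nucleotide type.
    aa_marginal = Counter()
    nt_marginal = Counter()
    for (aa, nt), n in smoothed.items():
        aa_marginal[aa] += n
        nt_marginal[nt] += n

    energies = {}
    for (aa, nt), n in smoothed.items():
        observed = n / total
        expected = (aa_marginal[aa] / total) * (nt_marginal[nt] / total)
        # Pairs seen more often than expected get negative (favorable) energies.
        energies[(aa, nt)] = -kt * math.log(observed / expected)
    return energies


if __name__ == "__main__":
    # Toy counts invented for illustration only.
    toy_counts = {
        ("ARG", "DG"): 90,
        ("ARG", "DT"): 30,
        ("LEU", "DG"): 10,
        ("LEU", "DT"): 30,
    }
    for pair, energy in sorted(inverse_boltzmann(toy_counts).items()):
        print(pair, round(energy, 3))
```

With these toy counts, the over-represented ARG/DG pair receives a negative score and the under-represented LEU/DG pair a positive one. A different reference state (for example, distance-dependent or composition-corrected) would change the numbers but not the overall pattern of the calculation.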
To run the scripts in this repository, you will need the following installed:
- Language: Python 3.8+
- Libraries:
  - numpy
  - scipy
  - biopython (for PDB parsing)
  - matplotlib (for plotting potentials)
- Clone the repository:
    git clone https://github.com/Arkoparno/Protein-nucleic-acid-potentials.git
    cd Protein-nucleic-acid-potentials
- Install dependencies:
    pip install -r requirements.txt
  If you do not have a requirements file, install the libraries manually:
    pip install numpy scipy biopython matplotlib
To generate new potential files based on a dataset of PDB structures:
    python train_potentials.py --input_dir /path/to/pdb_files/ --output potentials.json
To score a specific Protein-DNA complex using the derived potentials:
    python score_complex.py --pdb 1A2B.pdb --potentials potentials.json
Output:
    Interaction Energy: -45.23 kcal/mol
    Interface Residues: ARG-12, LYS-15, DT-4...
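As a rough illustration of the scoring step, the sketch below sums pair energies over a list of interface contacts. It is not the repository's scorer. The JSON layout (`"ARG-DG": -0.5`), the function names `load_potentials` and `score_contacts`, and the example values are all hypothetical; the real `potentials.json` format may differ. Contact detection is left out here and the contacts are supplied directly.

```python
import json


def load_potentials(json_text):
    """Parse a potentials table stored as {"ARG-DG": -0.5, ...}.

    This key layout is a hypothetical one chosen for the example.
    """
    raw = json.loads(json_text)
    return {tuple(key.split("-")): float(value) for key, value in raw.items()}


def score_contacts(contacts, potentials, default=0.0):
    """Sum pair energies over (amino_acid, nucleotide) contacts.

    Pairs missing from the table contribute `default`.
    """
    return sum(potentials.get(pair, default) for pair in contacts)


if __name__ == "__main__":
    # Values invented for illustration only.
    table = load_potentials('{"ARG-DG": -0.5, "LYS-DT": -0.25, "LEU-DG": 0.75}')
    contacts = [("ARG", "DG"), ("ARG", "DG"), ("LYS", "DT"), ("GLU", "DA")]
    print(f"Interaction Energy: {score_contacts(contacts, table):.2f} kcal/mol")
```

Treating unseen pairs as zero is one possible design choice; a stricter scorer could raise an error instead, so that gaps in the potential table are noticed.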
Protein-nucleic-acid-potentials/
├── data/ # Sample PDB files or raw datasets
├── src/ # Source code for calculations
│ ├── parser.py # PDB parsing logic
│ ├── potentials.py # Math and statistical logic
│ └── scorer.py # Scoring functions
├── results/ # Output graphs and potential matrices
├── requirements.txt # Python dependencies
└── README.md # Project documentation