Skip to content

OpenMM energy changes after Biotite standardizes atom order and converts back to OpenMM topology #892

@devinzxmeng

Description

@devinzxmeng

OpenMM energy changes after Biotite standardizes atom order and converts back to OpenMM topology

Summary

I am using Biotite to standardize atom order:

PDB -> OpenMM Topology -> Biotite AtomArray -> standardize_order()
    -> OpenMM Topology

The bond graph survives the round trip, but two OpenMM-relevant details change:

  1. OpenMM Atom.id is not preserved.
  2. ForceField.createSystem() generates different ordered PeriodicTorsionForce tuples from the round-tripped topology.

The unordered torsion atom sets are identical, but the ordered (a, b, c, d) tuples differ. This changes OpenMM torsion energy even when coordinates are reordered with standardized_idx.

Environment

Python 3.11.15
Linux x86_64
biotite 1.6.0
openmm 8.5.1
numpy 2.4.4

Minimal reproducer

This uses a complete OpenMM-compatible protein PDB with hydrogens.

from collections import Counter

import biotite.structure.info as info
import numpy as np
from biotite.interface import openmm as biotite_openmm
from openmm import PeriodicTorsionForce, Platform, VerletIntegrator, app, unit
from openmm.unit import kilojoule_per_mole

PDB_PATH = "complete_protein_with_hydrogens.pdb"


def build_system(topology):
    ff = app.ForceField("amber14-all.xml", "implicit/obc2.xml")
    return ff.createSystem(
        topology,
        constraints=app.HBonds,
        nonbondedMethod=app.NoCutoff,
        soluteDielectric=1.0,
        solventDielectric=78.5,
    )


def energy_kj(topology, coords_a3_nm):
    system = build_system(topology)
    simulation = app.Simulation(
        topology,
        system,
        VerletIntegrator(0.002 * unit.picoseconds),
        Platform.getPlatformByName("CPU"),
    )
    simulation.context.setPositions(
        np.asarray(coords_a3_nm, dtype=np.float64) * unit.nanometer
    )
    state = simulation.context.getState(getEnergy=True)
    return float(state.getPotentialEnergy().value_in_unit(kilojoule_per_mole))


def torsion_counts(topology, new_to_old=None, sort_atoms=False):
    system = build_system(topology)
    counts = Counter()
    for i in range(system.getNumForces()):
        force = system.getForce(i)
        if not isinstance(force, PeriodicTorsionForce):
            continue
        for j in range(force.getNumTorsions()):
            a, b, c, d, periodicity, phase, k = force.getTorsionParameters(j)
            atoms = (a, b, c, d)
            if new_to_old is not None:
                atoms = tuple(new_to_old[x] for x in atoms)
            if sort_atoms:
                atoms = tuple(sorted(atoms))
            counts[
                (
                    *atoms,
                    int(periodicity),
                    round(float(phase.value_in_unit(unit.radian)), 12),
                    round(float(k.value_in_unit(unit.kilojoule_per_mole)), 12),
                )
            ] += 1
    return counts


pdb = app.PDBFile(PDB_PATH)
orig_top = pdb.topology
atom_array = biotite_openmm.from_topology(orig_top)
standardized_idx = info.standardize_order(atom_array).astype(np.int64, copy=False)
std_top = biotite_openmm.to_topology(atom_array[standardized_idx])

coords = np.asarray(pdb.positions.value_in_unit(unit.nanometer), dtype=np.float64)
orig_energy = energy_kj(orig_top, coords)
std_energy = energy_kj(std_top, coords[standardized_idx])

orig_atoms = list(orig_top.atoms())
std_atoms = list(std_top.atoms())
atom_id_mismatches = [
    (new_i, old_i, orig_atoms[old_i].id, std_atoms[new_i].id)
    for new_i, old_i in enumerate(standardized_idx.tolist())
    if orig_atoms[old_i].id != std_atoms[new_i].id
]

new_to_old = {
    new_i: int(old_i) for new_i, old_i in enumerate(standardized_idx.tolist())
}
orig_ordered = torsion_counts(orig_top)
std_ordered = torsion_counts(std_top, new_to_old=new_to_old)
orig_unordered = torsion_counts(orig_top, sort_atoms=True)
std_unordered = torsion_counts(std_top, new_to_old=new_to_old, sort_atoms=True)

print("counts original:", orig_top.getNumAtoms(), orig_top.getNumResidues(), orig_top.getNumChains(), orig_top.getNumBonds())
print("counts standardized:", std_top.getNumAtoms(), std_top.getNumResidues(), std_top.getNumChains(), std_top.getNumBonds())
print("Atom.id mismatches:", len(atom_id_mismatches))
print("energy delta kJ/mol:", std_energy - orig_energy)
print("torsion counts:", sum(orig_ordered.values()), sum(std_ordered.values()))
print("ordered torsion differences:", sum((orig_ordered - std_ordered).values()), sum((std_ordered - orig_ordered).values()))
print("unordered torsion differences:", sum((orig_unordered - std_unordered).values()), sum((std_unordered - orig_unordered).values()))

Observed on my structure:

counts original:      890 atoms, 57 residues, 1 chain, 899 bonds
counts standardized:  890 atoms, 57 residues, 1 chain, 899 bonds

Atom.id mismatches: 889

torsion counts: original=2959, standardized=2959
ordered torsion tuple differences: missing=670, extra=670
unordered torsion atom-set differences: missing=0, extra=0

Energy check with consistently reordered coordinates:

total energy delta:   about -0.208 kJ/mol
torsion energy delta: about -0.207 kJ/mol

Bond and angle energies match to numerical precision.

Expected behavior

OpenMM force-field energy should be invariant under atom-order standardization
when the coordinates are reordered by the same index.

In other words, to_topology(from_topology(topology)[standardized_idx]) should
preserve enough OpenMM topology information that ForceField.createSystem()
generates an equivalent system after mapping standardized atom indices back to
the original atom indices.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions