Skip to content

Add PyRosetta Interface Metrics#15

Open
JoshuaMHardy wants to merge 6 commits intoMSDLLCpapers:developfrom
JoshuaMHardy:pyrosettametrics
Open

Add PyRosetta Interface Metrics#15
JoshuaMHardy wants to merge 6 commits intoMSDLLCpapers:developfrom
JoshuaMHardy:pyrosettametrics

Conversation

@JoshuaMHardy
Copy link
Copy Markdown

@JoshuaMHardy JoshuaMHardy commented Jan 22, 2026

Adds hydrogen bond counts to PyRosetta interface calculations

Issue: #10 Integrate additional PyRosetta interface scoring metrics


This pull request adds three metrics to the Ovo pipeline for RFdiffusion:

  • Interface Shape Complementarity (interface_sc). This was being calculated but was not in the descriptors.
  • Interace H-bonds (hbonds_int) - Hydrogen bonds at interface calculated by InterfaceAnalyzerMetrics mover (as done in BindCraft pipeline). This mover also calculates other metrics (listed in /ovo/pipelines/pyrosetta-interface-metrics/bin/interface_metrics_rfdesign.xml), many of which are redundant with current metrics but can be externalised if desired.
  • Buried Unsat H-bonds (buns_heavy_ball_1.1D) - Unsaturated hydrogen bonds at the interface. This can be calculated with or without an external program 'DAlphaBall'. Although it improves accuracy of the calculation, integrating DAlphaBall adds complexity. I've describe my implementation below.
  1. The use of DAlphaBall in the calculation is controlled by 'dalphaball_sasa' in the Rosetta XML script. By default, this is false and the metric will be calculated without it. If a compiled DAlphaBall binary is provided, this will be switched to true by rewriting the XML file.
  2. DAlphaBall is not portable requiring compilation with host libraries. My attempts at a statically linked binary failed. I have made a bash script (scripts/build_dalphaball.sh) called by a new command 'ovo init dalphaball' that users can use to compile it to $OVO_HOME/bin, where it will be found by the pyrosetta python script. For containers it would be best to compile DAlphaBall inside the container, as I found binding the path resulted in incompatibilities with container libraries. My documentation updates assume container integration. I'm not sure how the containers are updated but I would suggest something like the following:
# Dependencies (including gfortran/libgmp for DAlphaBall)
RUN apt-get update \
    && apt-get install -y bzip2 gfortran libgmp-dev \
    && apt-get clean && rm -rf /var/cache/apt/lists /var/lib/apt/lists

# Install DAlphaBall (for improved BuriedUnsatHbonds SASA calculations)
RUN cd /tmp \
    && git clone --depth 1 https://github.com/outpace-bio/DAlphaBall.git \
    && cd DAlphaBall/src \
    && FC=gfortran CC=gcc make \
    && mkdir -p /opt/dalphaball \
    && cp DAlphaBall.gcc /opt/dalphaball/ \
    && cd / && rm -rf /tmp/DAlphaBall
ENV DALPHABALL_PATH=/opt/dalphaball/DAlphaBall.gcc

I am interested to hear your thoughts on the implementation. Another PyRosetta metric listed in a TODO comment requires PSIPRED and I think figuring out how to integrate PyRosetta dependencies will be useful moving forwards (although hopefully they are easier than DAlphaBall).

Checklist

  • [ X ] Formatting is applied (just format)
  • [ X ] Linked to an issue above

🧩 Screenshots / Logs (if applicable)

Attach images, logs, or console outputs that help reviewers.

Copilot AI review requested due to automatic review settings January 22, 2026 02:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds three new PyRosetta interface metrics to the OVO pipeline for RFdiffusion binder designs: Interface Shape Complementarity, Interface H-bonds count, and Buried Unsatisfied H-bonds. The PR also integrates optional DAlphaBall support for improved SASA calculations.

Changes:

  • Added automated detection and configuration of DAlphaBall for rotation-invariant SASA calculations in buried unsatisfied H-bond metrics
  • Added three new descriptors (interface_sc, hbonds_int, buns_heavy_ball_1.1D) with appropriate metadata
  • Implemented a CLI command (ovo init dalphaball) for conda users to build DAlphaBall locally

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
scripts/build_dalphaball.sh Bash script to clone, compile, and install DAlphaBall for local conda users
ovo/pipelines/pyrosetta-interface-metrics/bin/pyrosetta_interface_metrics.py Updated to auto-detect DAlphaBall and dynamically configure XML based on availability
ovo/pipelines/pyrosetta-interface-metrics/bin/interface_metrics_rfdesign.xml Enabled BuriedUnsatHbonds filter and added InterfaceAnalyzerMover to calculate new metrics
ovo/core/database/descriptors_rfdiffusion.py Added three new descriptor definitions for the PyRosetta metrics
ovo/cli/init.py Added new dalphaball subcommand to build DAlphaBall locally
docs/source/user_guide/installation.md Added documentation for optional DAlphaBall installation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ovo/cli/init.py Outdated
Comment thread ovo/core/database/descriptors_rfdiffusion.py Outdated
Comment thread ovo/cli/init.py
install_dir = os.path.abspath(os.path.expanduser(install_dir))

# Get path to build script
build_script = os.path.join(os.path.dirname(__file__), "..", "..", "scripts", "build_dalphaball.sh")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will only work when installed with pip install -e because the scripts folder is not packaged. You can either put the file into ovo/resources/ or even as an inline variable here in the init script.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread ovo/cli/init.py
console.print("")

# Set default install directory to $OVO_HOME/bin
if install_dir is None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory might not be mounted, but I think an elegant solution would be to store this a new bin subfolder in config.reference_files_dir (that field already contains an absolute path).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return env_path

# 2. Check OVO_HOME/bin (where ovo init dalphaball installs it)
ovo_home = os.environ.get("OVO_HOME")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is tricky because the scheduler job might not see the same env vars. Instead, you can treat this file same as we treat rfdiffusion models and add a dedicated param dalphaball_path: the pipeline's nextflow config file can use params.reference_files_dir to define a default value for it:
https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/rfdiffusion-backbone/nextflow.config#L11-L13

Copy link
Copy Markdown
Collaborator

@prihoda prihoda Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe one downside is that typically these files are required since they are declared as path... maybe if it was a val, it should handle the case when it's missing (the reference files dir will always be mounted so it should work with val as long as it's an absolute path)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of docker, it makes sense to keep using the DALPHABALL_PATH in the block above, so the param would not be used in that case

Copy link
Copy Markdown
Collaborator

@prihoda prihoda Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we included the dalphaball build dependencies to the conda env, and then do it as part of the main.nf script itself when not already present? Like we do for boltz mols folder: https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/boltz/main.nf#L36-L47

we already have similar logic for conda in the fastrelax script: https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/proteinmpnn-fastrelax/main.nf#L25-L34

it adds extra stuff to ./site-packages/ of that env, we could actually do the same with dalphaball.gcc, perhaps just to another subfolder... so then it wouldn't have to be a nextflow param :)

that would mean dalphaball would be required which would simplify a lot of the logic (at the cost of debugging any issues people get with installing dalphaball on the fly, but it seems like they should be rare)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is the best solution - I did not see those checks in main.nf for missing files. I have not encountered any issues with dalphaball on the fly, as long as it was compiled correctly.

# Adjust dalphaball_sasa based on availability
if has_dalphaball:
# Change dalphaball_sasa="false" to dalphaball_sasa="true" to enable rotation-invariant SASA
xml_content = xml_content.replace('dalphaball_sasa="false"', 'dalphaball_sasa="true"')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth adding an assert here that makes sure that the string is found, to avoid issues if someone changes the xml in the future

PYROSETTA_INTERFACE_DESCRIPTORS = [PYROSETTA_DDG, PYROSETTA_CMS, PYROSETTA_SAP_SCORE]
PYROSETTA_BUNS = NumericGlobalDescriptor(
name="Buried Unsat H-bonds",
description="Buried unsatisfied hydrogen bonds at the interface (ddG-style). Uses DAlphaBall rotation-invariant SASA if available (via 'ovo init dalphaball'), otherwise standard SASA. Calculated as bound - unbound buried unsats. Lower is better.",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a cherry on top would be to use slightly different fields when dalphaball was or wasn't being used and then have two versions of the descriptor object, because without that, the description might be incorrect

one downside is that the UI would also need to handle these two versions, but perhaps the dalphaball version could be considered the primary supported one

@prihoda
Copy link
Copy Markdown
Collaborator

prihoda commented Jan 24, 2026

Thanks for the thorough implementation, I think this approach will work great! I added a few comments, please let me know your thoughts.


set -euo pipefail

INSTALL_DIR="${1:-${OVO_HOME:-$HOME/ovo}/bin}"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this a required arg to avoid duplicate logic for setting the default path to bin dir

Minor fixes to pull request

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants