Add PyRosetta Interface Metrics#15
Add PyRosetta Interface Metrics#15JoshuaMHardy wants to merge 6 commits intoMSDLLCpapers:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds three new PyRosetta interface metrics to the OVO pipeline for RFdiffusion binder designs: Interface Shape Complementarity, Interface H-bonds count, and Buried Unsatisfied H-bonds. The PR also integrates optional DAlphaBall support for improved SASA calculations.
Changes:
- Added automated detection and configuration of DAlphaBall for rotation-invariant SASA calculations in buried unsatisfied H-bond metrics
- Added three new descriptors (interface_sc, hbonds_int, buns_heavy_ball_1.1D) with appropriate metadata
- Implemented a CLI command (
ovo init dalphaball) for conda users to build DAlphaBall locally
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/build_dalphaball.sh | Bash script to clone, compile, and install DAlphaBall for local conda users |
| ovo/pipelines/pyrosetta-interface-metrics/bin/pyrosetta_interface_metrics.py | Updated to auto-detect DAlphaBall and dynamically configure XML based on availability |
| ovo/pipelines/pyrosetta-interface-metrics/bin/interface_metrics_rfdesign.xml | Enabled BuriedUnsatHbonds filter and added InterfaceAnalyzerMover to calculate new metrics |
| ovo/core/database/descriptors_rfdiffusion.py | Added three new descriptor definitions for the PyRosetta metrics |
| ovo/cli/init.py | Added new dalphaball subcommand to build DAlphaBall locally |
| docs/source/user_guide/installation.md | Added documentation for optional DAlphaBall installation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| install_dir = os.path.abspath(os.path.expanduser(install_dir)) | ||
|
|
||
| # Get path to build script | ||
| build_script = os.path.join(os.path.dirname(__file__), "..", "..", "scripts", "build_dalphaball.sh") |
There was a problem hiding this comment.
I believe this will only work when installed with pip install -e because the scripts folder is not packaged. You can either put the file into ovo/resources/ or even as an inline variable here in the init script.
There was a problem hiding this comment.
| console.print("") | ||
|
|
||
| # Set default install directory to $OVO_HOME/bin | ||
| if install_dir is None: |
There was a problem hiding this comment.
This directory might not be mounted, but I think an elegant solution would be to store this a new bin subfolder in config.reference_files_dir (that field already contains an absolute path).
There was a problem hiding this comment.
| return env_path | ||
|
|
||
| # 2. Check OVO_HOME/bin (where ovo init dalphaball installs it) | ||
| ovo_home = os.environ.get("OVO_HOME") |
There was a problem hiding this comment.
this is tricky because the scheduler job might not see the same env vars. Instead, you can treat this file same as we treat rfdiffusion models and add a dedicated param dalphaball_path: the pipeline's nextflow config file can use params.reference_files_dir to define a default value for it:
https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/rfdiffusion-backbone/nextflow.config#L11-L13
There was a problem hiding this comment.
hmm, maybe one downside is that typically these files are required since they are declared as path... maybe if it was a val, it should handle the case when it's missing (the reference files dir will always be mounted so it should work with val as long as it's an absolute path)
There was a problem hiding this comment.
in case of docker, it makes sense to keep using the DALPHABALL_PATH in the block above, so the param would not be used in that case
There was a problem hiding this comment.
what if we included the dalphaball build dependencies to the conda env, and then do it as part of the main.nf script itself when not already present? Like we do for boltz mols folder: https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/boltz/main.nf#L36-L47
we already have similar logic for conda in the fastrelax script: https://github.com/MSDLLCpapers/ovo/blob/develop/ovo/pipelines/proteinmpnn-fastrelax/main.nf#L25-L34
it adds extra stuff to ./site-packages/ of that env, we could actually do the same with dalphaball.gcc, perhaps just to another subfolder... so then it wouldn't have to be a nextflow param :)
that would mean dalphaball would be required which would simplify a lot of the logic (at the cost of debugging any issues people get with installing dalphaball on the fly, but it seems like they should be rare)
There was a problem hiding this comment.
I agree this is the best solution - I did not see those checks in main.nf for missing files. I have not encountered any issues with dalphaball on the fly, as long as it was compiled correctly.
| # Adjust dalphaball_sasa based on availability | ||
| if has_dalphaball: | ||
| # Change dalphaball_sasa="false" to dalphaball_sasa="true" to enable rotation-invariant SASA | ||
| xml_content = xml_content.replace('dalphaball_sasa="false"', 'dalphaball_sasa="true"') |
There was a problem hiding this comment.
might be worth adding an assert here that makes sure that the string is found, to avoid issues if someone changes the xml in the future
| PYROSETTA_INTERFACE_DESCRIPTORS = [PYROSETTA_DDG, PYROSETTA_CMS, PYROSETTA_SAP_SCORE] | ||
| PYROSETTA_BUNS = NumericGlobalDescriptor( | ||
| name="Buried Unsat H-bonds", | ||
| description="Buried unsatisfied hydrogen bonds at the interface (ddG-style). Uses DAlphaBall rotation-invariant SASA if available (via 'ovo init dalphaball'), otherwise standard SASA. Calculated as bound - unbound buried unsats. Lower is better.", |
There was a problem hiding this comment.
a cherry on top would be to use slightly different fields when dalphaball was or wasn't being used and then have two versions of the descriptor object, because without that, the description might be incorrect
one downside is that the UI would also need to handle these two versions, but perhaps the dalphaball version could be considered the primary supported one
|
Thanks for the thorough implementation, I think this approach will work great! I added a few comments, please let me know your thoughts. |
|
|
||
| set -euo pipefail | ||
|
|
||
| INSTALL_DIR="${1:-${OVO_HOME:-$HOME/ovo}/bin}" |
There was a problem hiding this comment.
please make this a required arg to avoid duplicate logic for setting the default path to bin dir
Minor fixes to pull request Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Adds hydrogen bond counts to PyRosetta interface calculations
Issue: #10 Integrate additional PyRosetta interface scoring metrics
This pull request adds three metrics to the Ovo pipeline for RFdiffusion:
I am interested to hear your thoughts on the implementation. Another PyRosetta metric listed in a TODO comment requires PSIPRED and I think figuring out how to integrate PyRosetta dependencies will be useful moving forwards (although hopefully they are easier than DAlphaBall).
Checklist
just format)🧩 Screenshots / Logs (if applicable)
Attach images, logs, or console outputs that help reviewers.