feat: Inline ColabFold and AlphaFold2 into BioEmu by sarahnlewis · Pull Request #206 · microsoft/bioemu

sarahnlewis · 2026-03-22T21:11:29Z

Summary

Replace the subprocess-based ColabFold integration (separate venv, patched site-packages, setup.sh) with code inlined directly into the BioEmu package. Everything now runs in a single Python environment.

What changed

Removed

src/bioemu/colabfold_setup/ (setup.sh, batch.patch, modules.patch)
Subprocess calls to colabfold_batch
Separate ColabFold venv (~/.bioemu_colabfold) is no longer needed

Added: `src/bioemu/colabfold_inline/`

msa_client.py: MMseqs2 API client (from colabfold.colabfold, MIT)
input_parsing.py: FASTA/A3M parser (from colabfold.batch, MIT)
features.py: Monomer feature pipeline wrapping vendored alphafold
model_runner.py: AF2 forward pass orchestration, weight downloading
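input_parsing.py and features.py together turn FASTA/A3M text into the aligned sequences the AF2 monomer pipeline consumes. The sketch below shows that step under stated assumptions. `parse_fasta` and `a3m_to_msa` are hypothetical names, not the vendored functions. The A3M convention it relies on: lowercase letters are insertions relative to the query, and upper-case letters and `-` are alignment columns. This is a minimal illustration, not the code in the PR.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> Tuple[List[str], List[str]]:
    """Split FASTA/A3M text into (sequences, descriptions)."""
    sequences: List[str] = []
    descriptions: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' lines are comments/headers and carry no residues
        if line.startswith(">"):
            descriptions.append(line[1:])
            sequences.append("")
        elif sequences:
            sequences[-1] += line  # sequences may be wrapped over several lines
    return sequences, descriptions


def a3m_to_msa(sequences: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Drop A3M insertions (lowercase) and count deletions before each column."""
    msa: List[str] = []
    deletion_matrix: List[List[int]] = []
    for seq in sequences:
        deletions: List[int] = []
        count = 0
        for ch in seq:
            if ch.islower():
                count += 1  # insertion: not an alignment column
            else:
                deletions.append(count)  # upper-case residue or '-' gap is a column
                count = 0
        msa.append("".join(ch for ch in seq if not ch.islower()))
        deletion_matrix.append(deletions)
    return msa, deletion_matrix
```

For example, a hit line `MaaK-L` aligned to the query `MKVL` becomes `MK-L` with deletion counts `[0, 2, 0, 0]`. Every row of the MSA then has the query's length, which is what a per-column feature pipeline needs.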

Added: `src/_vendor/`

alphafold/: Vendored, patched subset of AlphaFold2 v2.3.2 (Apache 2.0). Patched modules.py to expose representations_evo. Removed ~13,000 lines of unused code (structure module, multimer, relax, templates, data tools).
openfold/: Moved from src/bioemu/openfold/ with the same sys.modules aliasing.
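The vendored packages are exposed through sys.modules aliasing rather than sys.path changes. The sketch below shows one way such aliasing can work. It assumes the vendored copy is importable under some package path (for example something like `_vendor.alphafold`) and registers it under a top-level alias. `alias_package` is a hypothetical helper, not the function used in BioEmu.

```python
import importlib
import sys
from types import ModuleType


def alias_package(alias: str, target: str) -> ModuleType:
    """Make the importable package `target` also importable as `alias`.

    Vendored code written as `from alphafold.model import ...` keeps working
    even though the files live under another package, and sys.path is untouched.
    """
    module = importlib.import_module(target)
    existing = sys.modules.get(alias)
    if existing is not None and existing is not module:
        # Refuse to shadow a different (e.g. separately installed) package.
        origin = getattr(existing, "__file__", "?")
        raise ImportError(f"{alias!r} is already imported from {origin}")
    sys.modules[alias] = module
    # Mirror submodules that are already loaded so both names share one object.
    prefix = target + "."
    for name, sub in list(sys.modules.items()):
        if name.startswith(prefix):
            sys.modules.setdefault(alias + name[len(target):], sub)
    return module
```

A caveat of this approach: a submodule first imported after aliasing is loaded only under the name it was imported by. Vendored code should therefore consistently use a single name, usually the alias.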

CI

Replaced conda with uv (astral-sh/setup-uv@v4)
Test matrix: Python 3.10, 3.11, 3.12, 3.13
JAX_PLATFORMS=cpu for CPU-only CI runners
GPU regression tests skipped in CI (require weights + GPU)
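The two CI rules above can be sketched as small helpers: force CPU-only JAX on runners without a GPU, and skip the GPU regression tests unless an accelerator and weights are available. Both function names and the idea of a local weights directory are assumptions for illustration. Only the `JAX_PLATFORMS` environment variable comes from the PR.

```python
import os
from pathlib import Path
from typing import Mapping, MutableMapping


def pin_jax_to_cpu(env: MutableMapping[str, str] = os.environ) -> None:
    """Default JAX to the CPU backend unless the caller already chose one.

    JAX picks up JAX_PLATFORMS from the environment, so this has to run
    before `import jax` (for example at the top of a conftest.py).
    """
    env.setdefault("JAX_PLATFORMS", "cpu")


def gpu_regression_enabled(env: Mapping[str, str], weights_dir: Path) -> bool:
    """Hypothetical gate for the GPU regression tests.

    Returns False when JAX is pinned to CPU only, or when the (assumed)
    AF2 weights directory is missing or empty.
    """
    platforms = {
        p.strip().lower()
        for p in env.get("JAX_PLATFORMS", "").split(",")
        if p.strip()
    }
    if platforms == {"cpu"}:
        return False
    return weights_dir.is_dir() and any(weights_dir.iterdir())
```

In pytest, a gate like this could back a `pytest.mark.skipif(not gpu_regression_enabled(...), reason=...)` marker on the regression tests.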

Dependencies

JAX, Haiku, ml-collections, TF now required deps (not optional)
[cuda] extra for GPU-specific packages (jax[cuda12], nvcc)
Python 3.12 upper bound removed

Tests

94 tests passing in CI (+ 2 GPU regression tests on GPU hosts)
GPU regression tests verify embeddings match main branch (correlation >0.9999)
Mocks target _run_model only; feature building runs for real
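The regression criterion compares embeddings from this branch against main. The thresholds come from the PR: correlation >0.9999, plus per-residue cosine similarity >0.999 as stated in the commit message. The sketch below shows one way to compute both metrics with NumPy. The function names, and the choice of Pearson correlation over all flattened entries, are assumptions rather than the PR's test code. Inputs are arrays whose last axis is the feature dimension.

```python
import numpy as np


def embedding_agreement(ref: np.ndarray, new: np.ndarray) -> tuple[float, float]:
    """Return (Pearson correlation over all entries, min cosine similarity per row)."""
    ref = np.asarray(ref, dtype=np.float64)
    new = np.asarray(new, dtype=np.float64)
    if ref.shape != new.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {new.shape}")
    corr = float(np.corrcoef(ref.reshape(-1), new.reshape(-1))[0, 1])
    # Cosine similarity along the feature axis (one value per residue or pair).
    num = np.sum(ref * new, axis=-1)
    den = np.linalg.norm(ref, axis=-1) * np.linalg.norm(new, axis=-1)
    cos = num / np.maximum(den, 1e-12)  # guard against all-zero rows
    return corr, float(cos.min())


def assert_embeddings_match(ref, new, min_corr=0.9999, min_cos=0.999) -> None:
    corr, cos_min = embedding_agreement(ref, new)
    assert corr > min_corr, f"correlation {corr:.6f} <= {min_corr}"
    assert cos_min > min_cos, f"min per-residue cosine {cos_min:.6f} <= {min_cos}"
```

The two checks are complementary. A global correlation can stay high while a few residues drift, and the per-row cosine minimum catches exactly that case.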

License compliance

Vendored AF2 files retain original DeepMind copyright + Apache 2.0 Section 4(b) modification notices
ColabFold-derived files carry MIT attribution headers
NOTICE.md and cgmanifest.json updated

E2E verified on GPU

python -m bioemu.sample --sequence GYDPETGTWG --num_samples 2

Produces correct PDB + XTC output with fresh cache (no precomputed embeds or MSAs).

Replace the subprocess-based ColabFold integration (separate venv, patched site-packages, setup.sh) with code inlined directly into the BioEmu package. Everything now runs in a single Python environment.

## What changed

### Removed
- src/bioemu/colabfold_setup/ (setup.sh, batch.patch, modules.patch)
- Subprocess calls to colabfold_batch
- Separate ColabFold venv (~/.bioemu_colabfold) is no longer needed

### Added: src/bioemu/colabfold_inline/
- msa_client.py: MMseqs2 API client (from colabfold.colabfold, MIT)
- input_parsing.py: FASTA/A3M parser (from colabfold.batch, MIT)
- features.py: Monomer feature pipeline wrapping vendored alphafold
- model_runner.py: AF2 forward pass orchestration, weight downloading
- LICENSES/: ColabFold MIT + AlphaFold2 Apache 2.0 license texts

### Added: src/_vendor/alphafold/
Vendored, patched subset of AlphaFold2 v2.3.2 (Apache 2.0):
- Evoformer and model runner (the forward pass)
- Patched modules.py to expose representations_evo
- Removed: structure module, multimer, relax, templates, data tools (~13,000 lines of unused code deleted)
- Registered via sys.modules aliasing (no sys.path manipulation)

### Modified
- get_embeds.py: Calls the inlined code directly instead of via subprocess
- pyproject.toml: JAX, Haiku, ml-collections, TF now required deps; [cuda] extra for GPU-specific packages
- README.md: Updated install instructions, removed Python 3.12 cap, clarified that conda is only needed for the optional hpacker
- NOTICE.md, cgmanifest.json: Added ColabFold + AlphaFold2 attribution

### Tests
- 96 tests passing (54 new tests for inlined code)
- GPU regression tests verify embeddings match main branch (correlation >0.9999, per-residue cosine similarity >0.999)
- Mocks target _run_model (JAX forward pass) only; feature building runs for real in unit tests

## License compliance
- Vendored AF2 files retain original DeepMind copyright headers
- Modified files carry Apache 2.0 Section 4(b) change notices
- ColabFold-derived files carry MIT attribution headers
- Full license texts in LICENSES/ and src/_vendor/alphafold/LICENSE

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
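The testing strategy mocks only the JAX forward pass (`_run_model`) and lets feature building run for real. The sketch below illustrates that pattern with `unittest.mock.patch.object`. `EmbeddingPipeline` and its methods are a hypothetical stand-in, not BioEmu's classes. In the real suite one would more likely patch `_run_model` by its dotted import path, wherever it is defined.

```python
from unittest import mock


class EmbeddingPipeline:
    """Hypothetical stand-in for the inlined ColabFold/AF2 embedding path."""

    def build_features(self, sequence: str) -> dict:
        # Cheap, deterministic feature building: exercised for real in tests.
        if not sequence.isalpha():
            raise ValueError("sequence must contain only residue letters")
        return {"sequence": sequence.upper(), "length": len(sequence)}

    def _run_model(self, features: dict) -> dict:
        # Stand-in for the expensive JAX forward pass (needs weights + accelerator).
        raise RuntimeError("AF2 forward pass is not available in unit tests")

    def embed(self, sequence: str) -> dict:
        return self._run_model(self.build_features(sequence))


def test_embed_only_mocks_forward_pass() -> None:
    fake_output = {"single": "fake", "pair": "fake"}
    # Replace only the forward pass; a plain MagicMock on the class is not a
    # descriptor, so it receives just the features argument.
    with mock.patch.object(EmbeddingPipeline, "_run_model", return_value=fake_output) as run:
        result = EmbeddingPipeline().embed("GYDPETGTWG")
    assert result == fake_output
    assert run.call_count == 1
    # The mock saw the *real* features, so feature building was tested too.
    assert run.call_args.args[-1] == {"sequence": "GYDPETGTWG", "length": 10}
```

Keeping the mock boundary this narrow means any regression in feature construction still fails the unit tests, even though no weights or GPU are available.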

github-actions · 2026-03-22T21:47:13Z

Summary


Generated on:	03/22/2026 - 21:47:12
Parser:	Cobertura
Assemblies:	4
Classes:	27
Files:	27
Line coverage:	85.5% (1836 of 2146)
Covered lines:	1836
Uncovered lines:	310
Coverable lines:	2146
Total lines:	6800
Covered branches:	0
Total branches:	0
Method coverage:	Feature is only available for sponsors

Coverage

src.bioemu - 88.8%

Name	Line	Branch
src.bioemu	88.8%
__init__.py	100%
chemgraph.py	100%
convert_chemgraph.py	97%
denoiser.py	98.1%
get_embeds.py	80.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	88.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	91.7%
steering.py	90.7%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_inline - 62%

Name	Line	Branch
src.bioemu.colabfold_inline	62%
__init__.py
features.py	100%
input_parsing.py	100%
model_runner.py	49%
msa_client.py	60.8%

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%
__init__.py
setup_hpacker.py	58.8%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%
foldedness.py	100%
loss.py	100%

github-actions · 2026-03-30T09:57:20Z

Summary


Generated on:	03/30/2026 - 09:57:19
Parser:	Cobertura
Assemblies:	4
Classes:	27
Files:	27
Line coverage:	85.5% (1836 of 2146)
Covered lines:	1836
Uncovered lines:	310
Coverable lines:	2146
Total lines:	6800
Covered branches:	0
Total branches:	0
Method coverage:	Feature is only available for sponsors

Coverage

src.bioemu - 88.8%

Name	Line	Branch
src.bioemu	88.8%
__init__.py	100%
chemgraph.py	100%
convert_chemgraph.py	97%
denoiser.py	98.1%
get_embeds.py	80.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	88.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	91.7%
steering.py	90.7%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_inline - 62%

Name	Line	Branch
src.bioemu.colabfold_inline	62%
__init__.py
features.py	100%
input_parsing.py	100%
model_runner.py	49%
msa_client.py	60.8%

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%
__init__.py
setup_hpacker.py	58.8%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%
foldedness.py	100%
loss.py	100%

sarahnlewis marked this pull request as draft March 22, 2026 21:14

sarahnlewis force-pushed the sarahlewis/inline-colabfold branch from f715a1c to 5f8a7ca March 22, 2026 21:19

sarahnlewis force-pushed the sarahlewis/inline-colabfold branch from 5f8a7ca to aefe217 March 22, 2026 21:31

sarahnlewis requested review from josejimenezluna, ludwigwinkler and nw13slx March 22, 2026 22:48

josejimenezluna marked this pull request as ready for review March 30, 2026 09:12

update version (6c4e2a4)

josejimenezluna approved these changes Mar 30, 2026


josejimenezluna merged commit 2aa054f into main Mar 30, 2026
7 checks passed

josejimenezluna deleted the sarahlewis/inline-colabfold branch March 30, 2026 14:36


josejimenezluna merged 2 commits into main from sarahlewis/inline-colabfold

sarahnlewis commented Mar 22, 2026 • edited by josejimenezluna



2 participants
