Replace the subprocess-based ColabFold integration (separate venv, patched
site-packages, setup.sh) with code inlined directly into the BioEmu package.
Everything now runs in a single Python environment.
## What changed
### Removed
- src/bioemu/colabfold_setup/ (setup.sh, batch.patch, modules.patch)
- Subprocess calls to colabfold_batch
- Separate ColabFold venv (~/.bioemu_colabfold); no longer needed
### Added: src/bioemu/colabfold_inline/
- msa_client.py: MMseqs2 API client (from colabfold.colabfold, MIT)
- input_parsing.py: FASTA/A3M parser (from colabfold.batch, MIT)
- features.py: Monomer feature pipeline wrapping vendored alphafold
- model_runner.py: AF2 forward pass orchestration, weight downloading
- LICENSES/: ColabFold MIT + AlphaFold2 Apache 2.0 license texts
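For readers unfamiliar with the input side, here is a minimal sketch of the kind of parsing input_parsing.py is described as doing. It is not the ColabFold-derived code: the function name `parse_fasta` and the example records are hypothetical, and it handles plain FASTA only (no A3M-specific handling).

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration; not the parser shipped in this PR.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them without separators.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">query one\nMKVL\nAAG\n\n>query two\nGSH\n"
records = parse_fasta(example)
```

Wrapped sequence lines are concatenated, and any text before the first `>` header is ignored.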
### Added: src/_vendor/alphafold/
Vendored, patched subset of AlphaFold2 v2.3.2 (Apache 2.0):
- Evoformer and model runner (the forward pass)
- Patched modules.py to expose representations_evo
- Removed: structure module, multimer, relax, templates, data tools
(~13,000 lines of unused code deleted)
- Registered via sys.modules aliasing (no sys.path manipulation)
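The sys.modules aliasing mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the names `_vendor_demo`, `alphafold_demo`, and `register_alias` are hypothetical, and the "vendored" module is built in memory so the example runs without any vendored files.

```python
import importlib
import sys
import types


def register_alias(alias: str, real_name: str) -> None:
    """Make `import <alias>` resolve to the already-loaded module `real_name`.

    Loaded submodules are aliased too, so `alias.sub` maps to `real_name.sub`.
    Hypothetical helper; the PR's actual registration code may differ.
    """
    prefix = real_name + "."
    # Copy the items first: we add keys to sys.modules while looping.
    for name, module in list(sys.modules.items()):
        if name == real_name or name.startswith(prefix):
            sys.modules[alias + name[len(real_name):]] = module


# Hypothetical stand-in for a vendored package, created in memory for the demo.
vendored = types.ModuleType("_vendor_demo.alphafold_demo")
vendored.VERSION = "2.3.2"
sys.modules["_vendor_demo.alphafold_demo"] = vendored

sys_path_before = list(sys.path)
register_alias("alphafold_demo", "_vendor_demo.alphafold_demo")

# The short name now resolves to the very same module object.
aliased = importlib.import_module("alphafold_demo")
```

Because the import system consults sys.modules before searching sys.path, the alias works without adding the vendor directory to sys.path, and both names share one module object rather than loading two copies.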
### Modified
- get_embeds.py: Calls inlined code directly instead of subprocess
- pyproject.toml: JAX, Haiku, ml-collections, and TensorFlow are now required
dependencies; [cuda] extra for GPU-specific packages
- README.md: Updated install instructions, removed Python 3.12 cap,
clarified conda only needed for optional hpacker
- NOTICE.md, cgmanifest.json: Added ColabFold + AlphaFold2 attribution
### Tests
- 96 tests passing (54 new tests for inlined code)
- GPU regression tests verify embeddings match main branch
(correlation >0.9999, per-residue cosine similarity >0.999)
- Mocks target _run_model (JAX forward pass) only; feature building
runs for real in unit tests
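The mocking strategy in the last bullet can be sketched with the standard library alone. Only the name `_run_model` comes from this PR; the class `EmbeddingPipeline`, its other methods, and `fake_run_model` are hypothetical stand-ins, so treat this as an illustration of the pattern rather than the actual test code.

```python
from unittest import mock


class EmbeddingPipeline:
    """Hypothetical stand-in for the inlined pipeline."""

    def build_features(self, sequence: str) -> dict:
        # Cheap, deterministic step: this runs for real in the test.
        return {"sequence": sequence, "length": len(sequence)}

    def _run_model(self, features: dict) -> list:
        # Stand-in for the expensive JAX forward pass.
        raise RuntimeError("forward pass needs model weights and an accelerator")

    def get_embeds(self, sequence: str) -> list:
        return self._run_model(self.build_features(sequence))


def fake_run_model(features: dict) -> list:
    # One zero vector per residue, so shape-dependent code is still exercised.
    return [[0.0, 0.0] for _ in range(features["length"])]


pipeline = EmbeddingPipeline()
# Patch only the forward pass; build_features is left untouched.
with mock.patch.object(
    EmbeddingPipeline, "_run_model", side_effect=fake_run_model
) as patched:
    embeds = pipeline.get_embeds("MKV")
```

Patching at this boundary means a bug in feature building still fails the unit test, while the test itself needs neither weights nor a GPU.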
## License compliance
- Vendored AF2 files retain original DeepMind copyright headers
- Modified files carry Apache 2.0 Section 4(b) change notices
- ColabFold-derived files carry MIT attribution headers
- Full license texts in LICENSES/ and src/_vendor/alphafold/LICENSE
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### Added: src/_vendor/
- src/bioemu/openfold/ with same sys.modules aliasing
### CI
- uv (astral-sh/setup-uv@v4)
- JAX_PLATFORMS=cpu for CPU-only CI runners
### Dependencies
- [cuda] extra for GPU-specific packages (jax[cuda12], nvcc)
## E2E verified on GPU
Produces correct PDB + XTC output with fresh cache (no precomputed embeds or MSAs).