
Add more TreeOfLife Models #171

Draft
egrace479 wants to merge 3 commits into main from feat/tol-models


egrace479 commented Apr 3, 2026

Incorporates the newer models (BioCAP and BioCLIP 2.5 Huge) as "TreeOfLife Models" so that they have full functionality, e.g., use with TreeOfLifeClassifier. This reworks how text embeddings are read from the datasets, matching the changes in TreeOfLife-10M PR 13.

This update still needs tests and is pending updates to the dataset repositories. Current tests that call the text embeddings will fail until those repos are updated.

Note that TreeOfLife-10M PR 13 and the corresponding BioCLIP update will change the model weights used when calling the original BioCLIP model with pybioclip. The updated weights result from a taxonomic fix; please see the Hugging Face repositories for more details.

This is further pending an update to TreeOfLife-200M to include txt_emb_bioclip-2.npy and txt_emb_bioclip-2.5-vith14.npy in the embeddings/ directory.
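
As a rough illustration of the naming pattern these pending files follow, the sketch below builds the expected path for a model's text embeddings and reports which files are missing from a local copy of the `embeddings/` directory. The helper names (`txt_emb_path`, `missing_embeddings`) and the model tags are hypothetical, inferred only from the file names listed in this PR; this is not pybioclip's API.

```python
from pathlib import Path

# Model tags inferred from the file names mentioned in this PR (assumption).
PENDING_TOL200M_TAGS = ["bioclip-2", "bioclip-2.5-vith14"]


def txt_emb_path(embeddings_dir, tag):
    """Hypothetical helper: build embeddings/txt_emb_<tag>.npy for a model tag."""
    return Path(embeddings_dir) / f"txt_emb_{tag}.npy"


def missing_embeddings(embeddings_dir, tags):
    """Return the tags whose text-embedding file is not present on disk."""
    return [tag for tag in tags if not txt_emb_path(embeddings_dir, tag).is_file()]
```

Calling `missing_embeddings("embeddings", PENDING_TOL200M_TAGS)` on a local dataset checkout would list the tags that still lack an embedding file, which is one way to see whether the dataset repository has been updated yet.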

Closes #161.

egrace479 and others added 3 commits March 31, 2026 20:39
reorganizing embeddings folder (e.g., for TOL-10M):
embeddings/
- txt_emb_bioclip.npy
- txt_emb_biocap.npy
- txt_emb_species.npy    # also bioclip txt embedding
- txt_emb_species.json
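
A minimal sketch of reading the species files from the layout above, assuming that `txt_emb_species.json` holds a JSON list with one entry per embedded taxon and that one axis of the array in `txt_emb_species.npy` has the same length. Both the function name `load_species_embeddings` and that consistency check are assumptions for illustration, not code from this PR.

```python
import json
from pathlib import Path

import numpy as np


def load_species_embeddings(embeddings_dir):
    """Hypothetical loader for txt_emb_species.npy and txt_emb_species.json."""
    emb_dir = Path(embeddings_dir)
    txt_emb = np.load(emb_dir / "txt_emb_species.npy")
    with open(emb_dir / "txt_emb_species.json", encoding="utf-8") as f:
        names = json.load(f)
    # Assumption: one array axis indexes the same entries as the JSON list.
    if len(names) not in txt_emb.shape:
        raise ValueError(
            f"{len(names)} names do not match embedding shape {txt_emb.shape}"
        )
    return txt_emb, names
```

Failing early on a length mismatch is a design choice for this sketch: it surfaces a stale or partially updated dataset copy before any classification is attempted.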
egrace479 added the enhancement label Apr 3, 2026

hlapp commented Apr 17, 2026

Just FYI, it may be best to redo this due to the changes in #179. You can try rebasing, but that might not go well.



Development

Successfully merging this pull request may close these issues.

Address embedding update with updated models
