Adds `image_features` parameter to predict for pre-computed embeddings by NetZissou · Pull Request #169 · Imageomics/pybioclip

NetZissou · 2026-03-25T13:45:28Z

Disclaimer: This PR was developed with assistance from Claude Opus 4.6 (1M context). The author has reviewed all code changes and test additions. CI has been executed successfully in the forked repo. Opening this PR to request review from the package maintainers for further feedback and iteration.

Summary

This PR adds an optional image_features parameter to predict() on TreeOfLifeClassifier and CustomLabelsClassifier (CustomLabelsBinningClassifier inherits this through CustomLabelsClassifier). When provided, the method skips image encoding and computes classification directly from pre-computed embeddings.

Embedding validation

The method validates input embeddings before classification:

- Verifies the tensor is 2D with shape (N, embedding_dim)
- Checks that embedding_dim matches the model's expected dimension (model.visual.output_dim)
- Normalizes the embedding vectors via L2 norm only if they are not already normalized
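
For readers who want to see the three checks above concretely, here is a minimal, dependency-free sketch. It is not the PR's code: nested lists stand in for a 2D tensor, `expected_dim` stands in for model.visual.output_dim, and the function name, the tolerance `tol`, the per-row normalization, and the zero-vector guard are all assumptions made for illustration.

```python
import math


def validate_features(features, expected_dim, tol=1e-4):
    """Check an (N, embedding_dim) batch and L2-normalize rows only if needed.

    Hypothetical helper: `features` is a list of rows of numbers.
    """
    # Check 1: must be 2D, i.e. a non-empty list of rows that hold plain numbers.
    is_2d = bool(features) and all(isinstance(r, (list, tuple)) for r in features)
    if not is_2d or not all(isinstance(x, (int, float)) for r in features for x in r):
        raise ValueError("image_features must be 2D with shape (N, embedding_dim)")

    # Check 2: every row must have the dimension the model expects.
    for row in features:
        if len(row) != expected_dim:
            raise ValueError(f"expected embedding_dim={expected_dim}, got {len(row)}")

    # Check 3: normalize only rows whose L2 norm is not already ~1.
    result = []
    for row in features:
        norm = math.sqrt(sum(x * x for x in row))
        if norm == 0.0:
            # Assumption of this sketch: a zero vector cannot be normalized.
            raise ValueError("cannot normalize a zero-length embedding")
        if abs(norm - 1.0) <= tol:
            result.append(list(row))  # already unit length: leave values untouched
        else:
            result.append([x / norm for x in row])
    return result
```

Leaving already-normalized rows untouched means their values pass through without a second division, which is the kind of floating point drift that a conditional normalization is meant to avoid.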

Test plan

New tests in TestPredictFromEmbeddings:

- Results from embeddings match results from images exactly (all fields except file_name)
- Species, family, and multi-image predictions
- Unnormalized features are auto-normalized and yield correct classifications
- CustomLabelsClassifier and CustomLabelsBinningClassifier
- Error cases: no inputs, wrong tensor dim, wrong embedding dim, image-embedding length mismatch
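
As an illustration of the first test item, the comparison "all fields except file_name" could look roughly like the sketch below. The helper name and the example records are made up for this sketch (only the file_name field is taken from the description above), and the `None` value for file_name on the embeddings path is an assumption.

```python
def without_file_name(records):
    """Drop the file_name key from each prediction record before comparing."""
    return [{k: v for k, v in rec.items() if k != "file_name"} for rec in records]


# Hypothetical records; real predictions would come from predict().
from_images = [{"file_name": "a.jpg", "species": "species_a", "score": 0.9}]
from_embeddings = [{"file_name": None, "species": "species_a", "score": 0.9}]

# The two result sets should agree on everything except file_name.
assert without_file_name(from_images) == without_file_name(from_embeddings)
```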

Closes #167

Allows passing pre-computed image embeddings directly to predict() on TreeOfLifeClassifier, CustomLabelsClassifier, and CustomLabelsBinningClassifier, avoiding redundant image encoding when embeddings are already available.

Validates input: checks that the tensor is 2D, that embedding_dim matches the model's expected dimension (model.visual.output_dim), and normalizes via L2 norm only if not already normalized, to avoid floating point drift.

Closes Imageomics#167

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hlapp

Thanks @NetZissou. The part of the implementation approach that I don't like here is that probabilities are now created redundantly in two different functions. This also creates more code noise in the predict() method than I think should be needed.

Instead, shouldn't the clean way to handle this in the predict() method be to check whether image_features are already provided? If they are, apply basic checks such as correct dimensions. If they are not, create them (as they are being created now from images). Then proceed with creating probabilities from image_features.
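
The control flow suggested above can be sketched as follows. This is only an illustration of the shape of the idea, not pybioclip code: `encode`, `validate`, and `classify` are hypothetical callables standing in for the image encoder, the embedding checks, and the probability computation, and the toy stubs exist only to make the example runnable.

```python
def predict(images=None, image_features=None, *, encode, validate, classify):
    """Resolve image_features first, then use one shared path to probabilities."""
    if image_features is None:
        if images is None:
            raise ValueError("provide images or image_features")
        # No embeddings supplied: create them from the images, as before.
        image_features = encode(images)
    else:
        # Embeddings supplied: check them, but never re-encode.
        image_features = validate(image_features)
        if images is not None and len(images) != len(image_features):
            raise ValueError("images and image_features differ in length")
    # From here on there is a single code path, whatever the input was.
    return classify(image_features)


# Toy stand-ins that record which steps ran (hypothetical, for illustration).
calls = []


def toy_encode(images):
    calls.append("encode")
    return [[1.0, 0.0] for _ in images]


def toy_validate(features):
    calls.append("validate")
    return features


def toy_classify(features):
    calls.append("classify")
    # Return the index of the largest component of each row.
    return [max(range(len(row)), key=row.__getitem__) for row in features]


stubs = dict(encode=toy_encode, validate=toy_validate, classify=toy_classify)
labels = predict(image_features=[[0.0, 1.0]], **stubs)
```

With embeddings passed in, `calls` records only the validate and classify steps, so the probability logic lives in one place and the encoder is skipped.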

hlapp · 2026-04-17T23:08:06Z

@NetZissou just FYI, it might be advisable to rebase on main to bring in the changes from #179. It's quite possible that your changes so far are not in conflict at all, but some stuff did get moved around.

hlapp requested changes Apr 8, 2026

NetZissou wants to merge 1 commit into Imageomics:main from NetZissou:feature/predict-from-embeddings
