Skip to content

[claude] Avoid Rust-DataFrame round-trips in candidate pipeline#800

Open
GeorgWa wants to merge 1 commit intofeature/zscore-nn-classifier-integrationfrom
feature/rust-fdr-optimizations
Open

[claude] Avoid Rust-DataFrame round-trips in candidate pipeline#800
GeorgWa wants to merge 1 commit intofeature/zscore-nn-classifier-integrationfrom
feature/rust-fdr-optimizations

Conversation

@GeorgWa
Copy link
Copy Markdown
Collaborator

@GeorgWa GeorgWa commented Feb 27, 2026

Summary

  • Keep candidates in Rust-native CandidateCollection through selection, scoring, FDR, and quantification stages
  • Add Rust backend for z-score filtering (zscore_filter_mask) with numpy fallback
  • Add min_fragments_selection config option for fragment count cutoff during selection
  • Remove candidates_to_ng round-trip conversion; only convert to DataFrame at the final merge step

Stacked on #798 → integration PR.

🤖 Generated with Claude Code


PR Stack

…gh pipeline

Keep candidates in Rust-native CandidateCollection format between selection,
scoring, FDR filtering, and quantification stages. Add Rust backend for
z-score filtering and min_fragments_selection config option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@mschwoer mschwoer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(WiP)


_HAS_RUST_ZSCORE = True
except ImportError:
_HAS_RUST_ZSCORE = False
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this branch? alphadia-search-rs is always available .. (you might remove the except ImportError: also from peptidecentric.py

from alphadia.fdr.classifiers import BinaryClassifierLegacyNewBatching, Classifier

try:
from alphadia_search_rs import zscore_filter_mask as _zscore_filter_mask_rs
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zscore_filter_mask_rs

Comment on lines +160 to +164
feat = np.nan_to_num(
x[:, zscore_cols].astype(np.float64), nan=0.0, posinf=0.0, neginf=0.0
)
scores = np.sum((feat - p["means"]) / p["stds"] * p["signs"], axis=1)
return scores >= self._threshold
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe leave this in the docstrings as a reference implementation

"""
p = self._zscore_params
if _HAS_RUST_ZSCORE:
return _zscore_filter_mask_rs(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how big is the speed benefit? does it justify the increased complexity here?

all_scores = np.sum((all_feat - means) / stds * signs, axis=1)
survivors = all_scores >= self._threshold
# Score all candidates and filter (uses Rust if available)
survivors = self._zscore_survivors(x, zscore_cols)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these are
not gon' give up?
not gon' stop?
gon' work harder?

) -> pd.DataFrame:
"""Parse candidates from NG to classic format."""

"""Convert CandidateCollection to a DataFrame for merge_candidate_data.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert CandidateCollection to a DataFrame.

Parameters
=========

please adjust your Claude.md:
Keep docstrings of public APIs clearly scoped: e.g. don't add information about where something is called, or what happens with the result."


candidates_df["frame_start"] = candidates_df["frame_start"] * cycle_len
candidates_df["frame_stop"] = candidates_df["frame_stop"] * cycle_len
candidates_df["frame_center"] = candidates_df["frame_center"] * cycle_len
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this cycle_len conversion also be moved to rust? (separate PR)

any way, you could move it to l. 127-129 already now, to save some LoC :-)

Comment on lines +541 to +542
Overrides the base class to keep candidates in Rust-native format,
avoiding Rust→DataFrame→Rust round-trips between pipeline stages.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment describes the current change and the old version, not the intent of the method.

Comment on lines +596 to +606
def _select_candidates(
self,
dia_data: "DiaDataNG", # noqa: F821
spectral_library: SpecLibFlat,
) -> CandidateCollection:
"""Select candidates using NG backend.

return cands
Note: For the NG backend, select_candidates() is overridden, so this
method is only called as fallback.
"""
return self.select_candidates(dia_data, spectral_library, apply_cutoff=False)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor:

  • in the base class, merge the current implementation of select_candidates() with the implementation of _select_candidates() in ClassicExtractionHandler
  • make select_candidates() abstract in the base class
  • remove _select_candidates() (everywhere)

def quantify_candidates(
self,
candidates_df: pd.DataFrame,
candidates: CandidateCollection,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CandidateCollectionRS?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants