[claude] Avoid Rust-DataFrame round-trips in candidate pipeline by GeorgWa · Pull Request #800 · MannLabs/alphadia

GeorgWa · 2026-02-27T22:47:34Z

Summary

Keep candidates in Rust-native CandidateCollection through selection, scoring, FDR, and quantification stages
Add Rust backend for z-score filtering (zscore_filter_mask) with numpy fallback
Add min_fragments_selection config option for fragment count cutoff during selection
Remove candidates_to_ng round-trip conversion; only convert to DataFrame at the final merge step

Stacked on #798 → integration PR.

🤖 Generated with Claude Code

PR Stack

alphadia-search-rs: #118 → #119
alphadia: #798 → #799 → #800

…gh pipeline

Keep candidates in Rust-native CandidateCollection format between selection, scoring, FDR filtering, and quantification stages. Add Rust backend for z-score filtering and min_fragments_selection config option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mschwoer

(WiP)

mschwoer · 2026-03-03T10:27:29Z

+
+    _HAS_RUST_ZSCORE = True
+except ImportError:
+    _HAS_RUST_ZSCORE = False


Do we need this branch? alphadia-search-rs is always available. (You might also remove the except ImportError: from peptidecentric.py.)

mschwoer · 2026-03-03T10:28:41Z

 from alphadia.fdr.classifiers import BinaryClassifierLegacyNewBatching, Classifier

+try:
+    from alphadia_search_rs import zscore_filter_mask as _zscore_filter_mask_rs


zscore_filter_mask_rs

mschwoer · 2026-03-03T10:29:11Z

+        feat = np.nan_to_num(
+            x[:, zscore_cols].astype(np.float64), nan=0.0, posinf=0.0, neginf=0.0
+        )
+        scores = np.sum((feat - p["means"]) / p["stds"] * p["signs"], axis=1)
+        return scores >= self._threshold


maybe leave this in the docstrings as a reference implementation
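As a rough illustration of that suggestion, the NumPy fallback could be kept as a standalone reference function whose docstring spells out the computation. This is only a sketch: the function name zscore_filter_mask_reference and its flat argument list are hypothetical and do not come from the PR, which reads means, stds and signs from self._zscore_params and the cutoff from self._threshold.

```python
import numpy as np


def zscore_filter_mask_reference(x, means, stds, signs, threshold):
    """Return a boolean mask of rows whose summed signed z-score passes the threshold.

    Reference (NumPy) computation:
        feat   = nan_to_num(x, nan=0, posinf=0, neginf=0)
        scores = sum((feat - means) / stds * signs, axis=1)
        mask   = scores >= threshold
    """
    # Replace NaN and +/-inf by 0 before standardizing, as in the diff above.
    feat = np.nan_to_num(
        np.asarray(x, dtype=np.float64), nan=0.0, posinf=0.0, neginf=0.0
    )
    scores = np.sum((feat - means) / stds * signs, axis=1)
    return scores >= threshold


# Toy input (made up for illustration): 3 candidates, 2 z-score features.
x = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, np.inf]])
means = np.array([1.0, 1.0])
stds = np.array([1.0, 2.0])
signs = np.array([1.0, -1.0])

mask = zscore_filter_mask_reference(x, means, stds, signs, threshold=0.0)
print(mask.tolist())  # [False, False, True]
```

A function like this could also serve as the expected value in a unit test that compares the Rust backend against the NumPy result on random input.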

mschwoer · 2026-03-03T10:29:38Z

+        """
+        p = self._zscore_params
+        if _HAS_RUST_ZSCORE:
+            return _zscore_filter_mask_rs(


how big is the speed benefit? does it justify the increased complexity here?

mschwoer · 2026-03-03T10:30:47Z

-        all_scores = np.sum((all_feat - means) / stds * signs, axis=1)
-        survivors = all_scores >= self._threshold
+        # Score all candidates and filter (uses Rust if available)
+        survivors = self._zscore_survivors(x, zscore_cols)


so these are
not gon' give up?
not gon' stop?
gon' work harder?

mschwoer · 2026-03-03T10:31:13Z

 ) -> pd.DataFrame:
-    """Parse candidates from NG to classic format."""
-
+    """Convert CandidateCollection to a DataFrame for merge_candidate_data.


Convert CandidateCollection to a DataFrame.

Parameters
==========

Please adjust your Claude.md:
"Keep docstrings of public APIs clearly scoped: e.g. don't add information about where something is called, or what happens with the result."

mschwoer · 2026-03-03T10:36:02Z


    candidates_df["frame_start"] = candidates_df["frame_start"] * cycle_len
    candidates_df["frame_stop"] = candidates_df["frame_stop"] * cycle_len
    candidates_df["frame_center"] = candidates_df["frame_center"] * cycle_len


Could this cycle_len conversion also be moved to Rust? (separate PR)

Anyway, you could already move it to l. 127-129 now, to save some LoC :-)

mschwoer · 2026-03-03T10:40:10Z

+        Overrides the base class to keep candidates in Rust-native format,
+        avoiding Rust→DataFrame→Rust round-trips between pipeline stages.


This comment describes the current change and the old version, not the intent of the method.

mschwoer · 2026-03-03T10:43:42Z

+    def _select_candidates(
+        self,
+        dia_data: "DiaDataNG",  # noqa: F821
+        spectral_library: SpecLibFlat,
+    ) -> CandidateCollection:
+        """Select candidates using NG backend.

-        return cands
+        Note: For the NG backend, select_candidates() is overridden, so this
+        method is only called as fallback.
+        """
+        return self.select_candidates(dia_data, spectral_library, apply_cutoff=False)


Please refactor:

in the base class, merge the current implementation of select_candidates() with the implementation of _select_candidates() in ClassicExtractionHandler

make select_candidates() abstract in the base class

remove _select_candidates() (everywhere)
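One possible reading of the requested structure, as a minimal sketch: the base class only declares select_candidates() as abstract, and each handler implements it directly, so no _select_candidates() hook remains. The names ExtractionHandler and NgExtractionHandler, the simplified signatures, and the placeholder method bodies are assumptions for illustration; only ClassicExtractionHandler and select_candidates() appear in the review.

```python
from abc import ABC, abstractmethod


class ExtractionHandler(ABC):  # hypothetical name for the base class
    @abstractmethod
    def select_candidates(self, dia_data, spectral_library):
        """Select candidates for the given data and spectral library."""


class ClassicExtractionHandler(ExtractionHandler):
    def select_candidates(self, dia_data, spectral_library):
        # Placeholder body: the real method would hold the merged logic of the
        # former base-class select_candidates() and classic _select_candidates().
        return [("classic", entry) for entry in spectral_library]


class NgExtractionHandler(ExtractionHandler):  # hypothetical name
    def select_candidates(self, dia_data, spectral_library):
        # Placeholder body: the real method would return a Rust-native
        # CandidateCollection instead of a Python list.
        return [("ng", entry) for entry in spectral_library]


handlers = [ClassicExtractionHandler(), NgExtractionHandler()]
results = [h.select_candidates(dia_data=None, spectral_library=["PEPTIDE"]) for h in handlers]
print(results)  # [[('classic', 'PEPTIDE')], [('ng', 'PEPTIDE')]]
```

With the method abstract, a subclass that forgets to implement select_candidates() fails at instantiation time rather than silently falling back to a base implementation.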

mschwoer · 2026-03-03T10:43:59Z

    def quantify_candidates(
        self,
-        candidates_df: pd.DataFrame,
+        candidates: CandidateCollection,


CandidateCollectionRS?

GeorgWa wants to merge 1 commit into feature/zscore-nn-classifier-integration from feature/rust-fdr-optimizations
