[claude] Avoid Rust-DataFrame round-trips in candidate pipeline#800
[claude] Avoid Rust-DataFrame round-trips in candidate pipeline#800GeorgWa wants to merge 1 commit intofeature/zscore-nn-classifier-integrationfrom
Conversation
…gh pipeline Keep candidates in Rust-native CandidateCollection format between selection, scoring, FDR filtering, and quantification stages. Add Rust backend for z-score filtering and min_fragments_selection config option. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
||
| _HAS_RUST_ZSCORE = True | ||
| except ImportError: | ||
| _HAS_RUST_ZSCORE = False |
There was a problem hiding this comment.
do we need this branch? alphadia-search-rs is always available .. (you might remove the except ImportError: also from peptidecentric.py
| from alphadia.fdr.classifiers import BinaryClassifierLegacyNewBatching, Classifier | ||
|
|
||
| try: | ||
| from alphadia_search_rs import zscore_filter_mask as _zscore_filter_mask_rs |
| feat = np.nan_to_num( | ||
| x[:, zscore_cols].astype(np.float64), nan=0.0, posinf=0.0, neginf=0.0 | ||
| ) | ||
| scores = np.sum((feat - p["means"]) / p["stds"] * p["signs"], axis=1) | ||
| return scores >= self._threshold |
There was a problem hiding this comment.
maybe leave this in the docstrings as a reference implementation
| """ | ||
| p = self._zscore_params | ||
| if _HAS_RUST_ZSCORE: | ||
| return _zscore_filter_mask_rs( |
There was a problem hiding this comment.
how big is the speed benefit? does it justify the increased complexity here?
| all_scores = np.sum((all_feat - means) / stds * signs, axis=1) | ||
| survivors = all_scores >= self._threshold | ||
| # Score all candidates and filter (uses Rust if available) | ||
| survivors = self._zscore_survivors(x, zscore_cols) |
There was a problem hiding this comment.
so these are
not gon' give up?
not gon' stop?
gon' work harder?
| ) -> pd.DataFrame: | ||
| """Parse candidates from NG to classic format.""" | ||
|
|
||
| """Convert CandidateCollection to a DataFrame for merge_candidate_data. |
There was a problem hiding this comment.
Convert CandidateCollection to a DataFrame.
Parameters
=========
please adjust your Claude.md:
Keep docstrings of public APIs clearly scoped: e.g. don't add information about where something is called, or what happens with the result."
|
|
||
| candidates_df["frame_start"] = candidates_df["frame_start"] * cycle_len | ||
| candidates_df["frame_stop"] = candidates_df["frame_stop"] * cycle_len | ||
| candidates_df["frame_center"] = candidates_df["frame_center"] * cycle_len |
There was a problem hiding this comment.
could this cycle_len conversion also be moved to rust? (separate PR)
any way, you could move it to l. 127-129 already now, to save some LoC :-)
| Overrides the base class to keep candidates in Rust-native format, | ||
| avoiding Rust→DataFrame→Rust round-trips between pipeline stages. |
There was a problem hiding this comment.
This comment describes the current change and the old version, not the intent of the method.
| def _select_candidates( | ||
| self, | ||
| dia_data: "DiaDataNG", # noqa: F821 | ||
| spectral_library: SpecLibFlat, | ||
| ) -> CandidateCollection: | ||
| """Select candidates using NG backend. | ||
|
|
||
| return cands | ||
| Note: For the NG backend, select_candidates() is overridden, so this | ||
| method is only called as fallback. | ||
| """ | ||
| return self.select_candidates(dia_data, spectral_library, apply_cutoff=False) |
There was a problem hiding this comment.
Please refactor:
- in the base class, merge the current implementation of
select_candidates()with the implementation of_select_candidates()inClassicExtractionHandler - make
select_candidates()abstract in the base class - remove _select_candidates() (everywhere)
| def quantify_candidates( | ||
| self, | ||
| candidates_df: pd.DataFrame, | ||
| candidates: CandidateCollection, |
Summary
CandidateCollectionthrough selection, scoring, FDR, and quantification stageszscore_filter_mask) with numpy fallbackmin_fragments_selectionconfig option for fragment count cutoff during selectioncandidates_to_nground-trip conversion; only convert to DataFrame at the final merge stepStacked on #798 → integration PR.
🤖 Generated with Claude Code
PR Stack