DTW DynaCLR Monorepo #398

Open

edyoshikun wants to merge 161 commits into modular-viscy-staging from dynadtw

Conversation

@edyoshikun
Member

No description provided.

edyoshikun and others added 30 commits March 31, 2026 13:43
Add normalization columns (norm_mean/std/median/iqr/max/min),
z_focus_mean, and TCZYX shape columns to the cell index schema.
preprocess_cell_index reads per-FOV zattrs and writes stats as
parquet columns for fast per-row normalization at training time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ExperimentRegistry.from_cell_index: build registry directly from
  preprocessed parquet + zarr metadata (no collection YAML needed)
- datamodule: cell_index_path as primary entry point, _train_final_crop
  changed from BatchedRandSpatialCropd to BatchedCenterSpatialCropd
  (random crop for Z/XY translation is now a user-configured augmentation)
- dataset: read norm stats from parquet columns, build_norm_meta fallback
- index: _align_parquet_columns, _resolve_dims from parquet Y/X_shape

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DynaCLR-3D-BagOfChannels-v2: z_window=32, yx_patch=256,
  RandSpatialCrop(40,228,228) after affine for Z focus invariance
  + XY translation, CenterCrop(32,160,160) auto-appended.
  batch_size=256, 2 GPUs, 2-day wall time.
- Add dataloader_demo.py: Jupyter-style visualization of raw vs
  augmented anchor/positive batches with per-sample metadata
- Update demo configs and inspection scripts for new pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
np.nanmin/nanmax fail on scipy sparse arrays. Convert to dense
before computing range stats so the command works on Seurat-exported
anndata zarr stores.
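A minimal sketch of the workaround (the function name is hypothetical; real scipy sparse matrices expose the same `.toarray()` used here):

```python
import numpy as np

def dense_range_stats(X):
    """Min/max over possibly-sparse data (hypothetical helper).

    np.nanmin/np.nanmax raise on scipy sparse matrices, so densify
    first via the duck-typed .toarray() that all scipy formats share.
    """
    if hasattr(X, "toarray"):
        X = X.toarray()
    return float(np.nanmin(X)), float(np.nanmax(X))
```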

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CLI for running evals
- DAG for evals
- yaml files for evals
… 3 base callbacks

   - model/contrastive_encoder_convnext_tiny.yml: ConvNeXt-Tiny class_paths
   - model/dinov3_frozen_mlp.yml: frozen DINOv3 + MLP projection block
   - augmentations/ops_2d_mild.yml: OPS-specific mild augmentation pipeline
   - data/ops_gene_reporter.yml: OPS data defaults (patch sizes, sampling)
- train_linear_classifier() now returns a third value: raw val outputs
  (y_val, y_val_proba, classes) for downstream ROC curve plotting
- orchestrated run-linear-classifiers generates metrics_summary.pdf
  alongside the CSV: bar chart of AUROC/accuracy/F1 + per-task ROC curves
- Delete evaluate_dataset.py (argparse-based, not in CLI, superseded by
  orchestrator) and its example config
- Strip generate_comparison_report and its helpers from report.py;
  file is now CV-only
- Remove dead _detect_n_features() from cross_validation.py
- Update all callers of train_linear_classifier() to unpack 3-tuple
- Update DAG doc and linear classifiers README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FOVRecord.channel_markers: dict[str, str] maps zarr channel name to
  marker for a specific well (populated from Airtable channel_N_marker fields)
- ChannelEntry.wells: list[str] restricts a channel to a subset of wells;
  empty means valid in all wells
- build_collection auto-populates wells by comparing which wells have a
  non-None marker for each channel across all FOVRecords
- _build_experiment_tracks skips channel rows where ch.wells is non-empty
  and the current well is not in that set, preventing noise rows from
  mixed-plate experiments (e.g. viral sensor only in B/3, C/2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The glob */*/* on zarr v3 stores yields zarr.json files (e.g. A/2/zarr.json)
in addition to position directories. The previous check only stripped names
starting with "." (.zattrs, .zgroup) but missed zarr.json.
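The corrected filter can be sketched like this (function name illustrative; the glob pattern matches the description above):

```python
from pathlib import Path

def position_paths(store: Path):
    """Yield HCS position directories under row/col, skipping metadata.

    zarr v3 writes a zarr.json at every group level, so "*/*/*" picks
    up e.g. A/2/zarr.json alongside A/2/0; zarr v2 hides its metadata
    in dotfiles (.zattrs, .zgroup) instead.
    """
    for p in sorted(store.glob("*/*/*")):
        if p.name.startswith(".") or p.name == "zarr.json":
            continue
        yield p
```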

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ollection

- DynaCLR-2D-MIP-BagOfChannels: add viral_sensor + Phase3D for
  2025_01_28, 2024_10_09, 2024_10_16; fix dragonfly tracks_path
  to point to inner zarr store (tracking.zarr/2024_08_14_...zarr)
- DynaCLR-3D-BagOfChannels-v2: add viral_sensor + Phase3D for
  2025_01_28, 2024_10_09, 2024_10_16
- DynaCLR-3D-BagOfChannels-v3: new collection copied from v2 with
  dragonfly tracks_path fix; v2 left intact for running training job
- DynaCLR-BoC-lc-evaluation-v1: add viral_sensor for all datasets;
  add Phase3D for 2025_01_28

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire load_config to delegate to load_composed_config so eval configs
  support base: recipe inheritance (same mechanism as training configs)
- Extract shared eval settings into 4 recipes: predict.yml, reduce.yml,
  plot_infectomics.yml, linear_classifiers_infectomics.yml
- Slim down DynaCLR-2D-BagOfChannels-v3, DynaCLR-2D-MIP-BagOfChannels-v1,
  DINOv3-temporal-MLP-2D-BagOfChannels-v1, and test_evaluation configs
  to use base: references — eliminating copy-pasted 14-experiment
  annotation blocks and shared step configs
- Fix ONNX inference to use GPU (CUDAExecutionProvider) and suppress
  pthread_setaffinity_np noise with intra/inter_op_num_threads=1
- Switch CTC tracking SLURM script to gpu partition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix \bbf[\b_] -> \bbf(\b|_): inside a character class, \b is a
  backspace character, not a word boundary
- Add \bphc\b to detect phase-contrast (PhC) as label-free
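The distinction is easy to verify in Python's `re`; the combined pattern below is an illustrative reconstruction of the detector, not the repo's exact regex:

```python
import re

# Inside a character class, \b is the backspace character (\x08),
# not a word boundary, so [\b_] silently never matched a bare "bf".
assert re.search(r"[\b_]", "\x08") is not None
assert re.search(r"[\b_]", "bf") is None

# The fix moves the boundary into an alternation:
labelfree = re.compile(r"\bbf(\b|_)|\bphc\b", re.IGNORECASE)
assert labelfree.search("BF_10x")
assert labelfree.search("PhC channel")
assert not labelfree.search("rbfp")
```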

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pandas 3+ uses Arrow-backed strings by default, which breaks anndata's
zarr writer. Apply the same fix in two code paths:
- embedding_writer.py: replace select_dtypes("string") with per-column
  isinstance checks for pd.StringDtype and Arrow-backed Categoricals
- zarr_utils.py: convert ArrowStringArray columns and index to object
  dtype before calling append_to_anndata_zarr
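The per-column check can be sketched as follows (helper name hypothetical; shown with the python-backed StringDtype, which behaves the same as the Arrow-backed one for this purpose):

```python
import pandas as pd

def object_backed(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast extension-string columns and index to object dtype.

    pandas "string" extension dtypes can break writers that expect
    plain object arrays; per-column isinstance checks avoid the
    select_dtypes("string") edge cases mentioned above.
    """
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.StringDtype):
            out[col] = out[col].astype(object)
    if isinstance(out.index.dtype, pd.StringDtype):
        out.index = out.index.astype(object)
    return out
```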

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- PHATE: default n_jobs from -1 (all cores) to 1 to prevent hogging
  shared SLURM nodes; exposed in PHATEConfig and compute_phate()
- Annotation: support (fov_name, t, track_id) join as fallback when
  both sides lack an 'id' column; normalize fov_name by stripping
  leading/trailing slashes to prevent join mismatches
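A small pandas sketch of the fallback join (column names follow the description; the data values are illustrative):

```python
import pandas as pd

# Embeddings carry "/A/1/0" while annotations carry "A/1/0";
# normalize both sides before the (fov_name, t, track_id) join.
emb = pd.DataFrame({"fov_name": ["/A/1/0", "B/2/1/"], "t": [3, 5],
                    "track_id": [12, 7], "pc1": [0.4, -1.1]})
ann = pd.DataFrame({"fov_name": ["A/1/0", "B/2/1"], "t": [3, 5],
                    "track_id": [12, 7], "state": ["infected", "uninfected"]})
for df in (emb, ann):
    df["fov_name"] = df["fov_name"].str.strip("/")
merged = emb.merge(ann, on=["fov_name", "t", "track_id"], how="left")
```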

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For multiclass problems, compute one-vs-rest AUROC per class and report
as val_{class_name}_auroc columns in the results DataFrame.
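A self-contained sketch of the per-class computation (pure NumPy via the Mann-Whitney rank form of AUROC; function and column names are illustrative, not the repo's API):

```python
import numpy as np

def binary_auroc(y, score):
    # rank form: AUROC = P(score_pos > score_neg); ties not handled
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ovr_auroc_columns(y_val, y_val_proba, classes):
    # one val_{class_name}_auroc entry per class, one-vs-rest
    return {f"val_{c}_auroc": binary_auroc((y_val == c).astype(int),
                                           y_val_proba[:, i])
            for i, c in enumerate(classes)}
```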

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- viscy-utils: add onnx, onnxscript to core deps; copairs to eval extras
- dynaclr: add tracking optional group (gurobipy, onnxruntime-gpu,
  py-ctcmetrics, tabulate, tracksdata) for CTC tracking benchmark
- Regenerate uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- index.py: replace O(N*tau) Python loop in _compute_valid_anchors with
  vectorized pd.MultiIndex.isin(); add fit=False predict-mode fast path
  that skips anchor computation; add precomputed_valid_anchors to
  clone_with_subset() to avoid redundant recomputation; accept
  cell_index_df to avoid double-reading parquet
- dataset.py: replace per-row loops in _build_match_lookup with
  groupby().indices; skip lookup build in predict mode; add organelle,
  well, microscope to exported metadata columns
- datamodule.py: tune defaults (num_workers=4, cache_pool=500MB,
  pin_memory=True, buffer_size=4); use vectorized MultiIndex.isin for
  FOV split; reuse pre-loaded cell_index_df from ExperimentRegistry
- experiment.py: from_cell_index returns (registry, dataframe) tuple
  so callers can reuse the DataFrame without re-reading from disk
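The vectorized anchor test can be sketched with pandas alone (column names follow the description; the data is illustrative):

```python
import pandas as pd

# A row is a valid anchor iff its (fov_name, track_id) pair appears
# in the precomputed valid set -- one MultiIndex.isin call replaces
# the O(N*tau) per-row Python loop.
cell_index = pd.DataFrame({
    "fov_name": ["A/1", "A/1", "A/1", "B/2"],
    "track_id": [1, 1, 2, 7],
    "t":        [0, 1, 0, 0],
})
valid = pd.MultiIndex.from_tuples([("A/1", 1)],
                                  names=["fov_name", "track_id"])
mask = pd.MultiIndex.from_frame(
    cell_index[["fov_name", "track_id"]]).isin(valid)
anchors = cell_index[mask]
```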

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use .get() with None default for transcriptome_anndata and skip the
barcode join when it is absent, allowing embeddings on datasets that
lack paired scRNA-seq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Centralize cell_index_path to shared /hpc/projects/.../collections/
  dir across all training configs
- MIP model: extend z_extraction_window 11->20, z_focus_offset 0.5->0.3,
  yx_patch_size 192->256, add BatchedRandSpatialCropd for Z-invariance
- 3D BoC: num_workers 2->4; SLURM time limit 2d->4d
- Collection: mark DynaCLR-2D-BagOfChannels-v3 as [LEGACY]; fix well
  assignments in BoC-lc-evaluation-v1 (add A/1 for 07_24, remove
  incorrect B/1 and B/2 from 01_28)
- Add new collections: annotated MIP subset, test subset, alfi-eval
  (ALFI mitosis, 3 cell lines), microglia-eval (5 perturbations),
  benchmark_2exp (dataloader profiling)
- predict.yml: add TQDMProgressBar callback (refresh_rate=10)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- evaluate.py: remove all SLURM script generation (_generate_*_sh,
  _slurm_header, _run_local*); replace with prepare_configs() that
  generates YAML configs and prints a JSON manifest to stdout; rename
  CLI command evaluate -> prepare-eval-configs; add MMD config generators
- evaluate_config.py: remove SlurmConfig; add MMDStepConfig and
  ComparisonSpec imports; split PlotStepConfig.color_by into per-exp
  and combined_color_by; update TaskSpec.marker_filters docstring for
  auto-expand behaviour
- cli.py: add prepare-eval-configs, check-evals, append-annotations,
  append-predictions, split-embeddings, compute-mmd, plot-mmd-heatmap,
  evaluate-tracking-accuracy commands
- split_embeddings.py: new CLI to split combined embeddings.zarr by
  experiment, replacing inline SLURM script logic
- check_evals.py: new CLI to print eval completion status from registry
- eval_registry.yaml: declarative registry of models to evaluate
- Delete 4 stale SLURM-era eval configs (SlurmConfig schema removed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three modes for measuring embedding-space distribution shifts:
- Per-experiment (explicit comparison pairs, faceted by marker)
- Combined (pairwise cross-experiment with batch centering)
- Pooled (concatenates all experiments, BH FDR correction)

Core implementation:
- viscy_utils/evaluation/mmd.py: kernel MMD with median heuristic,
  Gaussian RBF kernel, unbiased estimator, and vectorized permutation
  test (avoids Python loops via binary label matrix multiplication)
- viscy_utils/evaluation/embedding_map.py: mAP via copairs for
  phenotypic profiling (optional dependency)
- evaluation/mmd/config.py: Pydantic config hierarchy for all three
  modes; temporal binning, shared bandwidth, balance_samples
- evaluation/mmd/compute_mmd.py: orchestrates the three analysis modes;
  computes activity_zscore = (mmd2 - null_mean) / null_std for
  cross-marker comparability; outputs per-marker CSV files
- evaluation/mmd/plotting.py: kinetics lines, heatmaps, activity
  z-score heatmaps, combined cross-experiment heatmaps, multi-panel
  grids, paired heatmaps with shared colorbar
- configs/evaluation/recipes/mmd_defaults.yml: shared algorithm defaults
  (1000 permutations, max 2000 cells, seed 42) for YAML inheritance
- tests/test_mmd.py: unit tests for MMD implementation
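A self-contained sketch of the estimator plus the vectorized permutation test, under the stated assumptions (Gaussian RBF kernel, median-heuristic bandwidth, unbiased estimator); names and the exact masking scheme are illustrative, not the repo's implementation:

```python
import numpy as np

def mmd2_permutation_test(X, Y, n_perm=200, seed=0):
    """Unbiased kernel MMD^2 with a vectorized permutation test.

    Permutations are evaluated by multiplying binary label matrices
    against the precomputed kernel, avoiding a Python loop over pairs.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n, m = len(Z), len(X)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    sigma2 = np.median(d2[np.triu_indices(n, 1)])  # median heuristic
    K = np.exp(-d2 / (2 * sigma2))
    kdiag = np.diag(K)

    def stat(mask):  # mask: (P, n) rows of 1s marking the "X" side
        mm = mask.sum(1)
        nn = n - mm
        kxx = np.einsum("pi,ij,pj->p", mask, K, mask) - mask @ kdiag
        inv = 1.0 - mask
        kyy = np.einsum("pi,ij,pj->p", inv, K, inv) - inv @ kdiag
        kxy = np.einsum("pi,ij,pj->p", mask, K, inv)
        return (kxx / (mm * (mm - 1)) + kyy / (nn * (nn - 1))
                - 2 * kxy / (mm * nn))

    labels = np.r_[np.ones(m), np.zeros(n - m)]
    mmd2 = stat(labels[None, :])[0]
    null = stat(np.stack([rng.permutation(labels) for _ in range(n_perm)]))
    p_value = (np.sum(null >= mmd2) + 1) / (n_perm + 1)
    return mmd2, p_value
```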

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ver-time

- orchestrated.py: when marker_filters is None, auto-discover all unique
  obs["marker"] values and run one classifier per marker; save trained
  pipelines as {task}_{marker}.joblib with manifest.json; add
  _plot_f1_over_time for per-class F1 at each timepoint; output one
  {task}_summary.pdf per task (was a single merged PDF)
- orchestrated_test.py: update fixtures to expect 2 rows per task with
  auto-expansion; add test for sparse-marker skipping and F1-over-time
  plot generation
- append_annotations.py: new CLI to persist ground-truth annotation
  columns directly into per-experiment zarr obs
- append_predictions.py: new CLI to apply saved classifier pipelines to
  all cells in per-experiment zarrs, writing predicted_{task} to obs and
  predicted_{task}_proba to obsm

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When group_by is set (default "marker"), evaluate_smoothness iterates
over unique group values, computes smoothness per group, saves per-group
CSV, generates per-group plots, then aggregates via mean/std. Output
filenames now include experiment_name for disambiguation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Evaluates whether DynaCLR embeddings improve cell tracking on Cell
Tracking Challenge datasets vs an IoU baseline.

- tracking_accuracy/config.py: Pydantic models for ONNX model entries,
  CTC dataset entries, ILP solver weights, and full benchmark config
- tracking_accuracy/utils.py: seg_dir layout helper, pad_to_shape,
  normalize_crop (z-score using whole-frame statistics)
- tracking_accuracy/evaluate_tracking.py: main benchmark driver
- ctc_tracking_2d_mip_boc.yaml: DynaCLR-2D-MIP vs IoU on DIC-C2DL-HeLa
- ctc_tracking_2d_mip_boc_all.yaml: all CTC sequences variant
- export_onnx_2d_mip_boc.yml: config for exporting the MIP model to ONNX
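The two utils can be sketched in a few lines (signatures are assumptions based on the descriptions above):

```python
import numpy as np

def pad_to_shape(arr, shape):
    # bottom/right zero-pad up to a target shape
    pads = [(0, t - s) for s, t in zip(arr.shape, shape)]
    return np.pad(arr, pads)

def normalize_crop(crop, frame):
    # z-score a cell crop with whole-frame statistics, so intensity
    # scaling stays consistent across crops from the same timepoint
    return (crop - frame.mean()) / (frame.std() + 1e-8)
```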

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pairplot: change diag_kind kde -> hist; rasterize scatter points to
  prevent PDF bloat; improve legend (alpha=1.0, larger marker sizes)
- Scatter 2D: improve legend (markerscale=6, fontsize=10, framealpha=1.0,
  edgecolor="black")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
edyoshikun and others added 30 commits May 5, 2026 14:36
os.cpu_count() returns the node's physical core count, not the
SLURM-allocated count. On a 48-core node where SLURM gave us 16,
ad-hoc users of os.cpu_count() oversubscribe. Centralize the
SLURM_CPUS_PER_TASK fallback in viscy_utils.mp_utils.available_cpus
and route MultiExperimentDataModule's tensorstore concurrency
through it.
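A minimal sketch of the centralized helper (illustrative reconstruction of viscy_utils.mp_utils.available_cpus, not its exact body):

```python
import os

def available_cpus() -> int:
    """SLURM-aware CPU count.

    os.cpu_count() reports the node's physical cores; inside a SLURM
    cgroup the allocation in SLURM_CPUS_PER_TASK is the honest budget.
    """
    slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm:
        return int(slurm)
    try:  # respects taskset/cgroup affinity where supported
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1
```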

Pin BLAS to 1 thread per process in REDUCE_COMBINED — PHATE's
joblib n_jobs spawns one worker per allocated CPU, so unbounded
BLAS would yield ~cores^2 threads. Standard sklearn parallelism
pattern (one axis at a time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PHATE's internal PCA pre-reduction (graphtools -> sklearn ->
scipy.linalg.lu) deadlocks silently on scipy 1.17.1 + sklearn
1.8.0 — process sits at ~0% CPU forever. Wire X_pca_combined
back into PHATE so it skips its own pre-PCA: when phate.n_pca is
null, fit on the already-reduced PCA output instead of raw .X.

Add caller-owned fit-set indexing (fit_idx) to
viscy_utils.evaluation.compute_phate so the orchestrator can
draw a per-store lineage cap. Whole lineages are sampled per
store (cap=N each); PHATE fits on the union and transforms the
full input. Re-enables PHATE in the eval recipe with a 100-cell
per-store cap for fast iteration; bump for paper figures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-modal InfoNCE head pulls image features toward a paired
per-cell vector target (e.g. transcriptomic embedding). Image
and target sides each pass through a small projector into a
shared space; samples whose target contains NaN (unpaired
cells) are masked out so the head runs on partially-paired
batches.

Extend ContrastiveModule._get_labels to handle vector-valued
metadata: list/tuple/array entries are stacked into (B, D) float
tensors, scalars stay as (B,) long tensors. Required for the
new head's paired-target lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CELL-DINO is a DINOv2-architecture ViT pretrained on
fluorescence microscopy (Human Protein Atlas). The
channel_adaptive_dino_vitl16 checkpoint processes one channel
at a time through a single-channel ViT-L/16 stem; the wrapper
reshapes (B, C, H, W) -> (B*C, 1, H, W), runs the backbone, and
mean-pools the cls token across channels for a fixed-dim
embedding regardless of input channel count. Weights load from
a local .pth state_dict — nothing fetched from the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PCA pairplot rendering is per-coloring-variable independent;
fan out across colorings using joblib loky workers (one worker
per coloring, capped by available_cpus). Workers re-import
matplotlib + seaborn (~1s overhead) so the gain only kicks in
for pairplot_components >= 4 on >100k cells, which matches the
paper-figure config.

Add the pairplot_components knob to the infectomics recipe at
4 (PC1..PC4 grid = 16 panels per coloring); bump higher for
final paper figures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fixed coupling between PLOT and PLOT_COMBINED forced every
infectomics run to fan out per-experiment scatter even when only
the combined figure was needed. Make both stages
independently togglable via steps:; the Nextflow DAG already
checks `steps` but had hardcoded behaviour assuming both
always run.

Switch infectomics-annotated to plot_combined only — the
per-experiment scatter doesn't carry into paper figures.

Drop the redundant marker_filters on cell_death_state
(applies to all markers; the filter was leftover from when LC
was only run on G3BP1/SEC61B sensors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Job 31449149 OOM'd in cgroup on rank 3 host RAM (not VRAM) on
the 384² single-marker variant. Loader prefetch buffers scale
with workers × prefetch_factor, not batch_size.

- Drop prefetch_factor 2→1 in the BoC base config — halves
  in-flight batches per worker, restores earlier behaviour.
- Drop the 192 sbatch from 4→2 GPUs and bump --mem-per-cpu
  14G→17G (255 GB/rank, 510 GB/node) so each rank has more
  headroom; also eases queue priority. Pin trainer.devices=2
  in the override yml so the Lightning config matches.

Batch size kept at 256/rank — host RAM was the OOM driver.
If this still OOMs, suspect a real leak (loky semaphores,
tensorstore decoder scratch) rather than papering over with
more RAM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same physical microglia cells appeared three times in the
collection (BF, Phase3D, Retardance), tripling the experiment's
row count and biasing marker/experiment sampling without adding
biological signal. Keep Phase3D only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add status legend (✅ landed / 🔄 running / ⬜ pending) and
inline notes per model so the registry reads as a state of the
bake-off. Stable name strings ensure the model→color palette
matches across infectomics-annotated, alfi, and microglia
registries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the primary_analysis.csv + cage_crop parsing path with
a direct read from the cells AnnData zarr (dinov2.zarr / rna.zarr
under a shared anndata_dir). The fov_name column is the zarr
path; load_cells_anndata returns it as zarr_path so the rest of
the pipeline is unchanged.

Split CLI: data_paths.yml carries the shared zarr_store +
anndata_dir + output_dir, and embed_<model>.yml carries
per-model config (channels, output_key, target_pixel_size,
batch_size). Both files are merged at startup.

Add a max_cells smoke-test knob that truncates the cell table
post-filter for fast iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- viscy-utils: pin anndata<0.12.9 across all/anndata/dev/test extras
  (matches pyproject; the constraint was added but the lock hadn't
  been refreshed)
- viscy: normalize gurobipy specifier to the same range
- nvidia-* and cuda-bindings: add platform_machine != 's390x'
  markers per uv solver auto-update

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PCA-RGB timelapse MP4 export needs imageio's FFmpeg plugin;
without it the timelapse CLI silently falls back to GIF.
Bundle matplotlib so the visualization helpers don't pull it
through a transitive eval-extra dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddingWriter's primary array (adata.X) is hard-coded to the
encoder backbone "features". DINOv3-temporal-MLP and similar
frozen-backbone-with-trained-head models put all the learned
task signal in the projection head — predicting features in
that case discards the only learned component.

Add a predict.embedding_key knob ("features" | "projections")
that the eval orchestrator threads into the generated predict
YAML. The unselected array remains as a sidecar in obsm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The MLP head is the only finetuned component — the DINOv3
backbone is frozen during training. Defaulting to features would
make this row a duplicate of DINOv3-frozen and discard the only
learned task signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Override on top of DynaCLR-2D-MIP-BagOfChannels.yml that flips positive
sampling to fully self-supervised: anchor and positive are the same crop
pre-augmentation, view diversity comes from the augmentation pipeline.

Same v3 parquet, same single-marker batches, same marker-uniform
group_weights as the temporal -single-marker variant — only the
positive-sampling strategy differs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4-GPU h100|h200 DDP launcher mirroring the temporal -single-marker.sh.
RUN_NAME tagged with 'classical' for clean WandB separation; no
warm-start checkpoint (fresh init since the classical and temporal
variants are independent training runs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verifies the classical override against the real v3 parquet:

1. Self-mode short-circuits index lookup (no _lineage_timepoints,
   no _match_lookup built).
2. positive == anchor.clone() pre-augmentation, with anchor_meta
   and positive_meta matching 1:1.
3. Augmentation pipeline runs end-to-end on both keys independently
   (shape 16x256x256 -> 1x160x160, NormalizeSampled centers to ~0,
   views diverge with mean |a-p| ~= 0.75 * anchor_std).
4. batch_group_by: marker enforces single-marker batches; over 30
   batches the sampler rotates through all 9 configured markers
   with marker-uniform weighting working as designed.
5. stratify_by: experiment mixes ~3 experiments per single-marker
   batch, preventing dynamorph domination of Phase3D batches.

Run:
  uv run python applications/dynaclr/scripts/dataloader_inspection/test_classical_self_positives.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sbatch copies the script into the spool dir, so dirname "$0" resolves
to /var/spool/slurm/job<id>/ — the relative ../slurm/train.sh path
then doesn't exist and the job fails with exit 1 in <1s. Switch to
the absolute repo path (matches the working -single-marker-192.sh
launcher).
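The failure is reproducible without SLURM; this sketch mimics what sbatch does by copying the launcher out of the repo (paths are temporary stand-ins):

```shell
base=$(mktemp -d)
repo="$base/repo"; spool="$base/spool"
mkdir -p "$repo/slurm" "$spool"
touch "$repo/slurm/train.sh"
# launcher that locates train.sh relative to its own directory
printf '#!/bin/sh\ndirname "$0"\n' > "$repo/launch.sh"
cp "$repo/launch.sh" "$spool/launch.sh"     # what sbatch effectively does
sh "$spool/launch.sh"                       # prints the spool dir, not the repo
test ! -e "$spool/../slurm/train.sh"        # so ../slurm/train.sh is gone
```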

Job 32371268 hit this; resubmitting after fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as the classical launcher (9474a5f): sbatch copies the
script into /var/spool/slurm/job<id>/ so dirname "$0" doesn't resolve
to the repo. Job 32535968 hit this when resuming from epoch 52.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the DINOv3 leaf with the channel-adaptive ViT-L/16 backbone
(1024-dim cls token after channel mean-pool). Uses the v3 parquet
collection. SLURM launcher uses absolute path to train.sh.
Mirrors the dynacell-models pattern. recipes/trainer/fit.yml holds
logger + callbacks + log cadence; recipes/topology/{single_gpu,
ddp_2gpu,ddp_4gpu,ddp_8gpu}.yml hold accelerator/strategy/devices.
Leaves migrate from re-declaring strategy/devices inline to composing
both fragments via base:.
Each leaf now composes recipes/trainer/fit.yml + a recipes/topology/
fragment instead of recipes/trainer.yml plus inline strategy/devices.
Composed configs are byte-identical before and after (verified by
snapshot diff across all 10 leaves).
All leaves now compose recipes/trainer/fit.yml + recipes/topology/
fragments. Old monolithic recipe is unreferenced. Docs updated to
show the orthogonal-axis layout.
Lightning subcommands are now handled by viscy_utils.cli.main, which
performs base: recipe composition before LightningCLI parses the
resolved config. Click subcommands (eval tooling) are unchanged.

Mirrors the dynacell __main__.py routing pattern (without the Hydra
branch — dynaclr eval tooling stays Click-based).
The new dynaclr fit entry point composes base: recipes via
viscy_utils.cli.main and is the place future resolver hooks will hang.
limit_train_batches: 800 / limit_val_batches: 200. The prior unbounded
run (job 32536880) was host-OOM-killed at 2h elapsed; the leak is in
training-step logging accumulating MetaTensors. The cap also provides
regular val checkpoints.
CELL-DINO ViT-L/16 upsamples 160² patches to 224² before forward,
which doubled per-batch host RAM vs. ConvNeXt-tiny at the same
batch_size. Combined with random plate sampling (zero cache hit rate,
each tensorstore.Context costs ~500 MB) the BoC parquet pegged the
240 GB cgroup ceiling within an hour at bs=512/nw=2 (jobs 32536880
and 32568006).

Match the DynaCLR-2D-MIP-BagOfChannels tunings:
- bs=256/nw=4 (smaller batches, more workers each holding less queue)
- prefetch_factor=1, buffer_size=1
- cache_pool_bytes=0
- file_io_concurrency=32
The orchestrated LC trainer used sklearn.train_test_split at the cell
level, which puts cells from the same track in both train and val. For
temporal-contrastive SSL embeddings (DynaCLR) this inflates val AUROC
by 1–20 points per channel because the SSL was trained to pull
same-track embeddings together.

split_groups_by takes a list of obs columns (default
[experiment, fov_name, track_id]); when set, the splitter switches to
GroupShuffleSplit so no group lands in both halves. Default is now
baked into the infectomics LC recipe.
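The group-aware split can be sketched without sklearn (a minimal stand-in for GroupShuffleSplit; names and the group encoding are illustrative):

```python
import numpy as np

def group_shuffle_split(groups, val_frac=0.25, seed=0):
    """Track-aware split: whole groups land on one side only.

    Groups (e.g. "experiment/fov_name/track_id" strings) go entirely
    to train or entirely to val, so temporally-correlated cells from
    one track cannot leak across the boundary and inflate val AUROC.
    """
    rng = np.random.default_rng(seed)
    uniq = rng.permutation(np.unique(groups))
    val_groups = set(uniq[: max(1, round(val_frac * len(uniq)))])
    val_mask = np.array([g in val_groups for g in groups])
    return np.where(~val_mask)[0], np.where(val_mask)[0]
```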

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
compare_evals.py gains _load_infection_kinetics + _plot_infection_kinetics:
percent infected over time per model, with a single shared annotation
curve (canonical bundle picked by lowest pairwise disagreement) and one
solid colored curve per channel restricted to annotated cells for 1:1
amplitude comparison. Outputs infection_kinetics.{pdf,csv}.

Adds time_window: [lo, hi] to the registry YAML; when set, the kinetics
plot clips before binning so the cohort composition stays constant
across time bins (ZIKV cohort ends ~25h, DENV runs to 66h — without
clipping, late bins are dominated by the long DENV runs).

Adds palette_anchor for stable model→color mapping across registries.

eval_registry_infectomics_grouped.yaml is the companion registry that
reads from each model's <eval_dir>_grouped/ sibling and uses
time_window: [4.0, 25.0].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>