Fix: Optimize LensingDataset data loading and prevent NaN loss by kamilansri · Pull Request #167 · ML4SCI/DeepLense

kamilansri · 2026-03-03T11:10:19Z

Description

This PR fixes a division-by-zero bug in the LensingDataset normalization step. When an image's minimum equals its maximum, min-max scaling divides by zero, and the resulting NaN values propagate into the training loss. It also refactors the data loading logic so that path construction is portable across operating systems and each loaded sample avoids an extra array copy.
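The division by zero described above can be reproduced in isolation. This minimal sketch assumes images are floating-point NumPy arrays scaled with plain per-image min-max normalization. The function names `minmax_naive` and `minmax_eps` are hypothetical and are not taken from the DeepLense code. NumPy does not raise on a floating-point 0/0. It emits a RuntimeWarning and returns NaN, which is why the bug shows up as a NaN loss rather than an immediate error.

```python
import numpy as np

EPS = 1e-8  # epsilon value quoted in this PR


def minmax_naive(x: np.ndarray) -> np.ndarray:
    # Plain min-max scaling: the denominator is 0 when every pixel is equal
    return (x - x.min()) / (x.max() - x.min())


def minmax_eps(x: np.ndarray) -> np.ndarray:
    # Same scaling with a small epsilon added to the denominator
    return (x - x.min()) / (x.max() - x.min() + EPS)


flat = np.zeros((4, 4), dtype=np.float32)  # a completely blank image
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    naive = minmax_naive(flat)

print(np.isnan(naive).all())   # True: every pixel became NaN
print(minmax_eps(flat).max())  # 0.0: the blank image maps to zeros
```

For non-flat images the epsilon is negligible: `minmax_eps(np.array([0.0, 2.0, 4.0]))` is still approximately `[0.0, 0.5, 1.0]`.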

Changes Made

Fixed NaN risk in normalization: Added a small epsilon value (1e-8) to the denominator of the min-max normalization. This prevents division by zero, and the NaN values it produces, when an image array is completely flat/blank (min equals max). Such images now normalize to all zeros.
Robust path construction: Replaced raw string concatenation (directory+selected_class) with os.path.join to prevent malformed paths if the directory string is missing a trailing slash.
Optimized tensor creation: Removed the redundant np.array([np.load(...)]) wrapper, which copied every loaded array into a new buffer. The array is now converted with torch.from_numpy(), which shares memory with the NumPy array instead of copying it.
Added dimension and type safety: Appended .unsqueeze(0) to add the channel dimension (previously supplied as a side effect of the wrapper), and chained .float() so the output tensor is float32 rather than float64, halving its memory footprint and saving compute.
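The memory and shape claims in the last two bullets can be checked directly. This is a minimal illustration with a synthetic float64 array standing in for `np.load(...)`; it does not reproduce the rest of the original `__getitem__`, which is not shown in this PR description.

```python
import numpy as np
import torch

image = np.arange(16, dtype=np.float64).reshape(4, 4)  # stand-in for np.load(...)

wrapped = np.array([image])  # old-style wrapper: allocates a new (1, 4, 4) array
print(np.shares_memory(wrapped, image))  # False: the data was copied

t = torch.from_numpy(image)  # zero-copy: the tensor uses the NumPy buffer
image[0, 0] = 42.0
print(t[0, 0].item())  # 42.0: the tensor sees the in-place change

x = t.unsqueeze(0)  # add the channel dimension explicitly
print(x.shape == wrapped.shape)  # True: both are (1, 4, 4)
print(x.float().dtype)  # torch.float32
```

Note that `.float()` itself copies when the source is float64, but the copy is half the size of the original array.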
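Putting the four changes together, a patched loader might look roughly like the sketch below. It is not the PR's actual diff: the class name `LensingDatasetSketch`, the `<directory>/<class_name>/*.npy` layout, the integer labels, and the 2-D image shape are all assumptions made for illustration.

```python
import os

import numpy as np
import torch
from torch.utils.data import Dataset

EPS = 1e-8  # epsilon value quoted in this PR


class LensingDatasetSketch(Dataset):
    """Hypothetical sketch: one subdirectory of 2-D .npy images per class."""

    def __init__(self, directory, classes):
        self.samples = []
        for label, selected_class in enumerate(classes):
            # os.path.join inserts a separator only when one is missing
            class_dir = os.path.join(directory, selected_class)
            for fname in sorted(os.listdir(class_dir)):
                if fname.endswith(".npy"):
                    self.samples.append((os.path.join(class_dir, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        image = np.load(path)  # shape (H, W), loaded without an extra wrapper copy
        # Min-max normalization; EPS keeps blank images finite (all zeros)
        image = (image - image.min()) / (image.max() - image.min() + EPS)
        # Zero-copy conversion, then add the channel axis and cast to float32
        tensor = torch.from_numpy(image).unsqueeze(0).float()  # (1, H, W)
        return tensor, label
```

Because `os.path.join` adds a separator only when needed, passing `directory` with or without a trailing slash resolves to the same files.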

Commit 7a2af96: fix: optimize LensingDataset loading and prevent division-by-zero

kamilansri wants to merge 1 commit into ML4SCI:main from kamilansri:fix/lensing-dataset-optimization