
Fix: Optimize LensingDataset data loading and prevent NaN loss #167

Open
kamilansri wants to merge 1 commit into ML4SCI:main from kamilansri:fix/lensing-dataset-optimization

Conversation

@kamilansri

Description

This PR addresses a critical mathematical flaw in the LensingDataset normalization step that could lead to division-by-zero errors (resulting in NaN loss during training). It also refactors the data loading logic to be more robust across different operating systems and significantly more memory-efficient.

Changes Made

  • Fixed NaN risk in normalization: Added a small epsilon value (1e-8) to the denominator during min-max normalization. This prevents division-by-zero crashes if an image array is completely flat/blank (where min equals max).
  • Robust path construction: Replaced raw string concatenation (directory+selected_class) with os.path.join to prevent malformed paths if the directory string is missing a trailing slash.
  • Optimized tensor creation: Removed the redundant np.array([np.load(...)]) wrapper. Now using torch.from_numpy() to convert the array directly into a tensor without unnecessary memory copying.
  • Added dimension and type safety: Appended .unsqueeze(0) to correctly add the channel dimension, and chained .float() to ensure the output tensor is float32 (saving memory and compute compared to float64).
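Taken together, the changes above amount to a safer loading path. A minimal sketch of the fixed logic as described in the PR (function and argument names here are illustrative, not the actual `LensingDataset` code):

```python
import os
import numpy as np
import torch

EPS = 1e-8  # epsilon added to the denominator, as described in the PR


def load_lensing_sample(directory, selected_class, filename):
    # os.path.join avoids malformed paths when `directory` lacks a trailing slash
    path = os.path.join(directory, selected_class, filename)
    img = np.load(path)

    # Min-max normalization; EPS prevents division by zero on flat/blank images
    img = (img - img.min()) / (img.max() - img.min() + EPS)

    # torch.from_numpy avoids an extra copy; unsqueeze(0) adds the channel
    # dimension and .float() casts to float32
    return torch.from_numpy(img).unsqueeze(0).float()
```

On a completely flat image (min equals max), the denominator becomes `EPS` instead of zero, so the normalized tensor is all zeros rather than NaN.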


