Conversation
MichaelTj02
commented
Mar 16, 2026
- Perform check on dataset prior to training (resolution, consistency, format, colour channels)
- Add lazy loading to thumbnails section in preprocessing model improving performance on larger file imports
…d resolution consistency check
ucodia
left a comment
There was a problem hiding this comment.
I found multiple bugs in this PR.
I would suggest you split things related to dataset checks and thumbnail lazy loading separately for easier review and fixing.
| _, new_data_path = imgui_utils.input_text("##preprocessing_data_path", self.preprocessing_data_path, 1024, 0, | ||
| _, new_data_path = imgui_utils.input_text("##preprocessing_data_path", str(self.preprocessing_data_path), 1024, 0, | ||
| width=imgui.get_window_width() - self.menu.app.button_w - imgui.calc_text_size("Browse")[0]) | ||
| if new_data_path != self.preprocessing_data_path: |
There was a problem hiding this comment.
Bug: We are using != to compare a str with a Path, therefore this check will always be True. I advice you wrap Path objects in str in comparisons from input_text like you did on line 279.
There was a problem hiding this comment.
We need to review all types to make sure we are not duplicating this issue elsewhere and are consistent in the types we use
| try: | ||
| if Path(self.icon_path).exists(): | ||
| help_img = cv2.imread(Path(self.icon_path).as_posix(), cv2.IMREAD_UNCHANGED) | ||
| help_img = cv2.imread(Path(self.icon_path), cv2.IMREAD_UNCHANGED) |
There was a problem hiding this comment.
Potential bug: Does imread really take a Path object? str seemed safer.
| self.save_path = directory_path | ||
| else: | ||
| print("No save path selected") | ||
| self.save_path = self.save_path |
| target_dataset_path = Path(self.data_path) | ||
| if target_dataset_path.is_dir(): | ||
| image_files = [f for f in target_dataset_path.iterdir() | ||
| if f.is_file()] |
There was a problem hiding this comment.
Bug: We used to have a .png filter meaning we would not try to open non image files before, if a .txt exist in the dataset folder, I think PIL is gonna crash. Any reason for removing that filter?
| @@ -1,9 +1,11 @@ | |||
| from pathlib import Path | |||
| import zipfile | |||
| import io | |||
| for file_path in file_paths: | ||
| self.get_thumbnail(file_path) | ||
|
|
||
| self.file_index_map = {fp: idx for idx, fp in enumerate(file_paths)} |
There was a problem hiding this comment.
We create file_index_map in this function, then clear it in clear_thumbnails but never seem to use it anywhere. Either we use it or we remove it.
| raise ValueError( | ||
| "Invalid dataset:\n" | ||
| f"- Image '{filename}' is not square ({width}x{height}).\n" | ||
| "- StyleGAN3 training only accepts square images." |
There was a problem hiding this comment.
Just mention StyleGAN, version 3 is irrelevant to the error
| if height != expected_height or width != expected_width: | ||
| raise ValueError( | ||
| "Invalid dataset:\n" | ||
| f"- Image '{filename}' has resolution {width}x{height}, expected {expected_width}x{expected_height}.\n" |
There was a problem hiding this comment.
I would suggest the following error format instead for clarity:
Inconsistent dataset: 'foo.png' is 512x512 but the detected dataset resolution is 256x256. Ensure all images have been preprocessed to the same size.