When image files are loaded, the image is moved to the GPU before preprocessing steps such as resizing and cropping are applied. If the image is large enough, this can cause a CUDA out-of-memory error. Preprocessing should be done on the CPU, and the model should only be moved to the GPU when it is needed to run the NN.
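To illustrate the suggestion, here is a minimal sketch of resizing on the CPU and transferring only the already-reduced tensor to the training device. It is not the MGDS implementation: the function names `preprocess_on_cpu` and `to_training_device` are hypothetical, and it uses `torch.nn.functional.interpolate` directly rather than the torchvision `Resize` transform that appears in the trace.

```python
import torch
import torch.nn.functional as F


def preprocess_on_cpu(image: torch.Tensor, size: tuple[int, int]) -> torch.Tensor:
    """Resize a uint8 CHW image on the CPU and return a float32 CHW tensor.

    Hypothetical helper for illustration; not part of MGDS or OneTrainer.
    """
    image = image.cpu()  # no-op if the tensor is already on the CPU
    # interpolate expects a floating-point NCHW batch, so the large
    # float32 copy is created in host memory rather than in VRAM.
    batch = image.unsqueeze(0).to(torch.float32)
    resized = F.interpolate(
        batch, size=size, mode="bilinear", align_corners=False, antialias=True
    )
    return resized.squeeze(0)


def to_training_device(image: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Move the small, already-resized tensor to the device used by the model."""
    return image.to(device)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Stand-in for a decoded image; a real photo would be far larger.
    raw = torch.randint(0, 256, (3, 2048, 1536), dtype=torch.uint8)
    small = preprocess_on_cpu(raw, (512, 384))
    on_device = to_training_device(small, device)
    print(on_device.shape, on_device.device)
```

With this ordering, only the resized tensor ever occupies GPU memory, so the size of the source image no longer affects VRAM usage.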
Sample stack trace:
  File "C:\Users\rossm\Source\Repos\OneTrainer\modules\dataLoader\mixin\DataLoaderMgdsMixin.py", line 25, in _create_mgds
    ds = MGDS(
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 357, in __init__
    self.loading_pipeline.start()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 302, in start
    module.start_next_epoch()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 1185, in start_next_epoch
    item[name] = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 770, in get_item
    previous_item = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 481, in get_item
    image = resize(image)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\transforms.py", line 361, in forward
    return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\functional.py", line 492, in resize
    return F_t.resize(img, size=output_size, interpolation=interpolation.value, antialias=antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 462, in resize
    img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [torch.float32, torch.float64])
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 528, in _cast_squeeze_in
    img = img.to(req_dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 568.00 MiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
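The failing call is the cast to a floating-point dtype inside `_cast_squeeze_in`, so the 568.00 MiB request is roughly the size of a float32 copy of the image. The arithmetic below is a rough estimate only. It assumes a 3-channel image and 4 bytes per float32 element; the real image dimensions are not given in the report, so the 8192x6144 example is hypothetical, and the CUDA allocator may round request sizes.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def float32_image_mib(width: int, height: int, channels: int = 3) -> float:
    """Size in MiB of a float32 tensor holding a width x height image."""
    return width * height * channels * BYTES_PER_FLOAT32 / MIB


def megapixels_for_allocation(mib: float, channels: int = 3) -> float:
    """Approximate image size in megapixels that a float32 copy of `mib` MiB implies."""
    return mib * MIB / (channels * BYTES_PER_FLOAT32) / 1e6


if __name__ == "__main__":
    # Hypothetical 8192x6144 RGB image: 576.0 MiB as float32.
    print(float32_image_mib(8192, 6144))
    # The 568 MiB request from the trace implies roughly 49.6 megapixels.
    print(round(megapixels_for_allocation(568), 1))
```

An image of about 50 megapixels is therefore enough to trigger the failure when most of the 16 GiB is already in use.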