Large images can cause OutOfMemoryError in dataloader #1

@RossM

Description

When loading image files, the image tensor is moved to the GPU before preprocessing steps such as resizing and cropping are applied. If the image is large enough, this can trigger a CUDA out-of-memory error. Preprocessing should instead run on the CPU, with the data moved to the GPU only when it is needed to run the NN.

Sample stack trace:

  File "C:\Users\rossm\Source\Repos\OneTrainer\modules\dataLoader\mixin\DataLoaderMgdsMixin.py", line 25, in _create_mgds
    ds = MGDS(
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 357, in __init__
    self.loading_pipeline.start()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 302, in start
    module.start_next_epoch()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 1185, in start_next_epoch
    item[name] = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 770, in get_item
    previous_item = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 481, in get_item
    image = resize(image)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\transforms.py", line 361, in forward
    return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\functional.py", line 492, in resize
    return F_t.resize(img, size=output_size, interpolation=interpolation.value, antialias=antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 462, in resize
    img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [torch.float32, torch.float64])
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 528, in _cast_squeeze_in
    img = img.to(req_dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 568.00 MiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
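A minimal sketch of the proposed ordering (not the actual OneTrainer/mgds code; `load_for_training` and the target size are hypothetical): resize on the CPU first, so the full-resolution allocation happens in system RAM, and transfer only the small result to the GPU.

```python
import torch
import torch.nn.functional as F


def load_for_training(image: torch.Tensor, size: int, device: torch.device) -> torch.Tensor:
    """Resize a (C, H, W) image on the CPU, then move the result to `device`.

    The full-resolution intermediate lives in system RAM, so a very large
    source image cannot trigger a CUDA out-of-memory error during resizing.
    """
    # F.interpolate expects a batch dimension, hence unsqueeze/squeeze.
    resized = F.interpolate(
        image.unsqueeze(0).float(),
        size=(size, size),
        mode="bilinear",
        antialias=True,
    ).squeeze(0)
    # Only the small, resized tensor crosses to the GPU.
    return resized.to(device)


# Example: an 8000x8000 image stays on the CPU while it is resized.
big_image = torch.zeros(3, 8000, 8000)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
out = load_for_training(big_image, 512, device)
print(out.shape)  # torch.Size([3, 512, 512])
```

With this ordering, GPU memory is only ever asked to hold the post-crop tensor (a few MiB), not the 568 MiB intermediate seen in the trace above.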