What happened?
Resuming LoRA training from a backup fails with a "Training loss became NaN" error when the base model is a v-prediction (vpred) model.
Steps to reproduce:
- Train LoRA on a vpred model.
- Stop training.
- Restart OneTrainer.
- Resume training from backup.
- Wait for sampling. The NaN error is raised a few steps after sampling finishes.
What did you expect would happen?
Training resumes from the backup and continues without a NaN error.
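To help narrow down where the loss first goes non-finite after resuming, a check along these lines could be run over logged per-step losses. This is only a minimal sketch: `first_non_finite_step` is a hypothetical helper, not part of OneTrainer, and the loss values are made up for illustration.

```python
import math
from typing import Iterable, Optional


def first_non_finite_step(losses: Iterable[float], start_step: int = 0) -> Optional[int]:
    """Return the global step of the first NaN/inf loss, or None if all are finite.

    Hypothetical helper for triage; `start_step` is the step the run resumed at.
    """
    for offset, loss in enumerate(losses):
        if not math.isfinite(loss):
            return start_step + offset
    return None


# Illustrative values only (not taken from the actual run).
example_losses = [0.19, 0.17, float("nan"), 0.16]
print(first_non_finite_step(example_losses, start_step=100))
```

Comparing the reported step against the step at which sampling ran would show whether the NaN always appears right after sampling.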
Relevant log output
Continuing training from backup 'C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_19-05-55-backup-270-8-6'...
Fetching 17 files: 100%|███████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 245028.07it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:08<00:00, 1.16s/it]
Selected layers: 722
Deselected layers: 72
Note: Enable Debug mode to see the full list of layer names
enumerating sample paths: 100%|██████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 23.07it/s]
enumerating sample paths: 0%| | 0/1 [00:00<?, ?it/s]
W0324 20:56:29.563000 14008 venv\Lib\site-packages\torch\_inductor\utils.py:1613] [0/0] Not enough SMs to use max_autotune_gemm mode
step: 100%|█████████████████████████████████████████████| 33/33 [04:05<00:00, 9.08s/it, loss=0.139, smooth loss=0.155]
caching: 100%|███████████████████████████████████████████████████████████████████████| 145/145 [00:07<00:00, 18.35it/s]
sampling: 100%|████████████████████████████████████████████████████████████████████████| 30/30 [00:25<00:00, 1.19it/s]
Creating Backup C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_21-01-29-backup-298-9-1
step: 12%|█████▌ | 4/33 [01:24<10:10, 21.06s/it, loss=0.186, smooth loss=0.157]
epoch: 8%|██████ | 1/12 [05:45<1:03:23, 345.78s/it]
Traceback (most recent call last):
  File "C:\Users\yamat\Desktop\OneTrainer\modules\ui\TrainUI.py", line 719, in __training_thread_function
    trainer.train()
    ~~~~~~~~~~~~~^^
  File "C:\Users\yamat\Desktop\OneTrainer\modules\trainer\GenericTrainer.py", line 796, in train
    raise RuntimeError("Training loss became NaN. This may be due to invalid parameters, precision issues, or a bug in the loss computation.")
RuntimeError: Training loss became NaN. This may be due to invalid parameters, precision issues, or a bug in the loss computation.
Creating Backup C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_21-02-08-backup-301-9-4
Saving C:/Users/yamat/Documents/OneTrainer/vpred/vpred.safetensors
Generate and upload debug_report.log
=== System Information ===
OS: Windows 11
Version: 10.0.26200
=== Hardware Information ===
CPU: 12th Gen Intel(R) Core(TM) i7-12700F (Cores: 12)
Total RAM: 15.76 GB
=== GPU Information ===
NVIDIA GPU (Index 0): NVIDIA GeForce RTX 3060 [NVIDIA]
Driver version: 595.79
Power Limit: 170.00 W
=== Python Environment ===
Global Python Version: 3.13.12
Python Executable Path: C:\Users\anonymous\Desktop\OneTrainer\venv\Scripts\python.exe
PyTorch Info: torch==2.9.1+cu128
pip freeze output:
absl-py==2.4.0
accelerate==1.12.0
adv_optm==2.2.3
aiodns==4.0.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiohttp-retry==2.9.1
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.12.1
attrs==26.1.0
av==16.1.0
backoff==2.2.1
backports.zstd==1.3.0
bcrypt==5.0.0
bitsandbytes==0.49.1
boto3==1.42.72
botocore==1.42.72
brotli==1.2.0
certifi==2026.2.25
cffi==2.0.0
charset-normalizer==3.4.6
click==8.2.1
cloudpickle==3.1.2
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.3
cryptography==45.0.7
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.2.1
deepdiff==8.6.1
Deprecated==1.3.1
-e git+https://github.com/huggingface/diffusers.git@99daaa802da01ef4cff5141f4f3c0329a57fb591#egg=diffusers
dnspython==2.8.0
email-validator==2.3.0
fabric==3.2.2
fastapi==0.135.1
fastapi-cli==0.0.24
fastapi-cloud-cli==0.15.0
fastar==0.8.0
filelock==3.25.2
flatbuffers==25.12.19
fonttools==4.62.1
frozenlist==1.8.0
fsspec==2026.2.0
ftfy==6.3.1
gguf==0.17.1
grpcio==1.78.1
h11==0.16.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.34.4
humanfriendly==10.0
idna==3.11
imagesize==1.4.1
importlib_metadata==9.0.0
inquirerpy==0.3.4
invisible-watermark==0.2.0
invoke==2.2.1
itsdangerous==2.2.0
Jinja2==3.1.6
jmespath==1.1.0
kiwisolver==1.5.0
lightning-utilities==0.15.3
lion-pytorch==0.2.3
Markdown==3.10.2
markdown-it-py==4.0.0
MarkupSafe==3.0.3
matplotlib==3.10.3
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@a25b59f7619da99fdc6f8e8d5a0d89be519a4671#egg=mgds
mpmath==1.3.0
multidict==6.7.1
-e git+https://github.com/KellerJordan/Muon.git@f90a42b28e00b8d9d2d05865fe90d9f39abcbcbd#egg=muon_optimizer
networkx==3.6.1
numpy==2.2.6
nvidia-ml-py==13.595.45
omegaconf==2.3.0
onnxruntime-gpu==1.23.2
open_clip_torch==2.32.0
opencv-python==4.11.0.86
orderly-set==5.5.0
orjson==3.11.7
packaging==26.0
paramiko==4.0.0
parse==1.20.2
pfzy==0.3.4
pillow==12.1.1
platformdirs==4.9.4
pooch==1.8.2
prettytable==3.17.0
prodigy-plus-schedule-free==2.0.1
prodigyopt==1.1.2
prompt_toolkit==3.0.52
propcache==0.4.1
protobuf==7.34.0
psutil==7.0.0
py-cpuinfo==9.0.0
pycares==5.0.1
pycparser==3.0
pydantic==2.12.5
pydantic-extra-types==2.11.1
pydantic-settings==2.13.1
pydantic_core==2.41.5
Pygments==2.19.2
PyNaCl==1.6.2
pyparsing==3.3.2
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
python-dotenv==1.2.2
python-multipart==0.0.22
pytorch-lightning==2.6.1
pytorch_optimizer==3.6.0
PyWavelets==1.9.0
PyYAML==6.0.2
regex==2026.2.28
requests==2.32.5
rich==14.3.3
rich-toolkit==0.19.7
rignore==0.7.6
runpod==1.7.10
s3transfer==0.16.0
safetensors==0.7.0
scalene==1.5.51
scenedetect==0.6.7.1
schedulefree==1.4.1
scipy==1.15.3
sentencepiece==0.2.1
sentry-sdk==2.55.0
setuptools==81.0.0
shellingham==1.5.4
six==1.17.0
starlette==0.52.1
sympy==1.14.0
tensorboard==2.20.0
tensorboard-data-server==0.7.2
timm==1.0.25
tokenizers==0.22.2
tomli==2.4.0
tomlkit==0.14.0
torch==2.9.1+cu128
torchmetrics==1.9.0
torchvision==0.24.1+cu128
tqdm==4.67.1
tqdm-loggable==0.4.1
transformers==4.57.6
triton-windows==3.5.1.post24
typer==0.24.1
typing-inspection==0.4.2
typing_extensions==4.15.0
ujson==5.11.0
urllib3==2.6.3
uvicorn==0.42.0
watchdog==6.0.0
watchfiles==1.1.1
wcwidth==0.6.0
websockets==16.0
Werkzeug==3.1.6
wheel==0.46.3
wrapt==2.1.2
yarl==1.23.0
yt-dlp==2026.3.17
zipp==3.23.0
=== Git Information ===
Repo: Nerogar/OneTrainer
Branch: master
Commit: cb6cab2
No deleted, unmerged, or modified files relative to origin/master.
=== Network Connectivity ===
PyPI (https://pypi.org/): Failure: expected string or bytes-like object, got 'NoneType'
HuggingFace (https://huggingface.co): Failure: expected string or bytes-like object, got 'NoneType'
Google (https://www.google.com): Failure: expected string or bytes-like object, got 'NoneType'
=== Intel Microcode Information ===
CPU is not detected as 13th or 14th Gen Intel - microcode info not applicable.