
[Bug]: NaN error when training vpred model with generalized offset noise enabled #1389

@yamatazen

Description


What happened?

This error occurs only when the base model is a v-prediction (vpred) model.
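For context, the standard v-prediction target and plain offset noise look roughly like the sketch below. This is a generic illustration under stated assumptions, not OneTrainer's actual implementation; `offset_strength` and `per_channel_offset` are hypothetical names, and OneTrainer's "generalized" offset noise may scale or normalize differently.

```python
import math
import random

def v_prediction_target(x0: float, eps: float, alpha_bar: float) -> float:
    # Standard v-prediction target: v = sqrt(a) * eps - sqrt(1 - a) * x0,
    # where a (alpha_bar) is the cumulative noise-schedule product for the
    # sampled timestep and must lie in [0, 1].
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0

def offset_noise(eps: float, offset_strength: float = 0.1) -> float:
    # Plain offset noise adds a shared random offset to the Gaussian noise
    # sample; "generalized" offset noise presumably varies this scheme.
    per_channel_offset = random.gauss(0.0, 1.0)  # hypothetical name
    return eps + offset_strength * per_channel_offset
```

One plausible failure mode: if the schedule value `alpha_bar` ends up outside [0, 1] (for example through a state mismatch after resuming from backup), `torch.sqrt` of a negative tensor silently yields NaN, which then propagates into the loss.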

Steps to reproduce:

  1. Train LoRA on a vpred model.
  2. Stop training.
  3. Restart OneTrainer.
  4. Resume training from backup.
  5. Wait for sampling.

What did you expect would happen?

Training continues after resuming from the backup, without a NaN error.

Relevant log output

Continuing training from backup 'C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_19-05-55-backup-270-8-6'...
Fetching 17 files: 100%|███████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 245028.07it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:08<00:00,  1.16s/it]
Selected layers: 722
Deselected layers: 72
Note: Enable Debug mode to see the full list of layer names
enumerating sample paths: 100%|██████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 23.07it/s]
enumerating sample paths:   0%|                                                                  | 0/1 [00:00<?, ?it/s]W0324 20:56:29.563000 14008 venv\Lib\site-packages\torch\_inductor\utils.py:1613] [0/0] Not enough SMs to use max_autotune_gemm mode
step: 100%|█████████████████████████████████████████████| 33/33 [04:05<00:00,  9.08s/it, loss=0.139, smooth loss=0.155]
caching: 100%|███████████████████████████████████████████████████████████████████████| 145/145 [00:07<00:00, 18.35it/s]
sampling: 100%|████████████████████████████████████████████████████████████████████████| 30/30 [00:25<00:00,  1.19it/s]
Creating Backup C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_21-01-29-backup-298-9-1
step:  12%|█████▌                                        | 4/33 [01:24<10:10, 21.06s/it, loss=0.186, smooth loss=0.157]
epoch:   8%|██████                                                                   | 1/12 [05:45<1:03:23, 345.78s/it]
Traceback (most recent call last):
  File "C:\Users\yamat\Desktop\OneTrainer\modules\ui\TrainUI.py", line 719, in __training_thread_function
    trainer.train()
    ~~~~~~~~~~~~~^^
  File "C:\Users\yamat\Desktop\OneTrainer\modules\trainer\GenericTrainer.py", line 796, in train
    raise RuntimeError("Training loss became NaN. This may be due to invalid parameters, precision issues, or a bug in the loss computation.")
RuntimeError: Training loss became NaN. This may be due to invalid parameters, precision issues, or a bug in the loss computation.
Creating Backup C:/Users/yamat/Documents/OneTrainer/vpred\backup\2026-03-24_21-02-08-backup-301-9-4
Saving C:/Users/yamat/Documents/OneTrainer/vpred/vpred.safetensors
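The guard that produces this RuntimeError in `GenericTrainer.py` can be sketched as follows. Function and variable names here are hypothetical; only the error message is taken from the traceback above, and the real code would check a torch tensor rather than a Python float.

```python
import math

def check_loss_finite(loss_value: float) -> None:
    # Sketch of the NaN guard seen in the traceback: abort training the
    # moment the scalar training loss becomes NaN.
    if math.isnan(loss_value):
        raise RuntimeError(
            "Training loss became NaN. This may be due to invalid parameters, "
            "precision issues, or a bug in the loss computation."
        )

check_loss_finite(0.139)  # a healthy loss value passes silently
```

Note that the guard fires on the first NaN step, so the underlying cause (here, something in the vpred + generalized offset noise path after resuming) happens at or before step 4/33 in the log above.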

Generate and upload debug_report.log

=== System Information ===
OS: Windows 11
Version: 10.0.26200

=== Hardware Information ===
CPU: 12th Gen Intel(R) Core(TM) i7-12700F (Cores: 12)
Total RAM: 15.76 GB

=== GPU Information ===
NVIDIA GPU (Index 0): NVIDIA GeForce RTX 3060 [NVIDIA]
Driver version: 595.79
Power Limit: 170.00 W

=== Python Environment ===
Global Python Version: 3.13.12
Python Executable Path: C:\Users\anonymous\Desktop\OneTrainer\venv\Scripts\python.exe
PyTorch Info: torch==2.9.1+cu128
pip freeze output:
absl-py==2.4.0
accelerate==1.12.0
adv_optm==2.2.3
aiodns==4.0.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aiohttp-retry==2.9.1
aiosignal==1.4.0
annotated-doc==0.0.4
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.12.1
attrs==26.1.0
av==16.1.0
backoff==2.2.1
backports.zstd==1.3.0
bcrypt==5.0.0
bitsandbytes==0.49.1
boto3==1.42.72
botocore==1.42.72
brotli==1.2.0
certifi==2026.2.25
cffi==2.0.0
charset-normalizer==3.4.6
click==8.2.1
cloudpickle==3.1.2
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.3
cryptography==45.0.7
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.2.1
deepdiff==8.6.1
Deprecated==1.3.1
-e git+https://github.com/huggingface/diffusers.git@99daaa802da01ef4cff5141f4f3c0329a57fb591#egg=diffusers
dnspython==2.8.0
email-validator==2.3.0
fabric==3.2.2
fastapi==0.135.1
fastapi-cli==0.0.24
fastapi-cloud-cli==0.15.0
fastar==0.8.0
filelock==3.25.2
flatbuffers==25.12.19
fonttools==4.62.1
frozenlist==1.8.0
fsspec==2026.2.0
ftfy==6.3.1
gguf==0.17.1
grpcio==1.78.1
h11==0.16.0
httpcore==1.0.9
httptools==0.7.1
httpx==0.28.1
huggingface-hub==0.34.4
humanfriendly==10.0
idna==3.11
imagesize==1.4.1
importlib_metadata==9.0.0
inquirerpy==0.3.4
invisible-watermark==0.2.0
invoke==2.2.1
itsdangerous==2.2.0
Jinja2==3.1.6
jmespath==1.1.0
kiwisolver==1.5.0
lightning-utilities==0.15.3
lion-pytorch==0.2.3
Markdown==3.10.2
markdown-it-py==4.0.0
MarkupSafe==3.0.3
matplotlib==3.10.3
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@a25b59f7619da99fdc6f8e8d5a0d89be519a4671#egg=mgds
mpmath==1.3.0
multidict==6.7.1
-e git+https://github.com/KellerJordan/Muon.git@f90a42b28e00b8d9d2d05865fe90d9f39abcbcbd#egg=muon_optimizer
networkx==3.6.1
numpy==2.2.6
nvidia-ml-py==13.595.45
omegaconf==2.3.0
onnxruntime-gpu==1.23.2
open_clip_torch==2.32.0
opencv-python==4.11.0.86
orderly-set==5.5.0
orjson==3.11.7
packaging==26.0
paramiko==4.0.0
parse==1.20.2
pfzy==0.3.4
pillow==12.1.1
platformdirs==4.9.4
pooch==1.8.2
prettytable==3.17.0
prodigy-plus-schedule-free==2.0.1
prodigyopt==1.1.2
prompt_toolkit==3.0.52
propcache==0.4.1
protobuf==7.34.0
psutil==7.0.0
py-cpuinfo==9.0.0
pycares==5.0.1
pycparser==3.0
pydantic==2.12.5
pydantic-extra-types==2.11.1
pydantic-settings==2.13.1
pydantic_core==2.41.5
Pygments==2.19.2
PyNaCl==1.6.2
pyparsing==3.3.2
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
python-dotenv==1.2.2
python-multipart==0.0.22
pytorch-lightning==2.6.1
pytorch_optimizer==3.6.0
PyWavelets==1.9.0
PyYAML==6.0.2
regex==2026.2.28
requests==2.32.5
rich==14.3.3
rich-toolkit==0.19.7
rignore==0.7.6
runpod==1.7.10
s3transfer==0.16.0
safetensors==0.7.0
scalene==1.5.51
scenedetect==0.6.7.1
schedulefree==1.4.1
scipy==1.15.3
sentencepiece==0.2.1
sentry-sdk==2.55.0
setuptools==81.0.0
shellingham==1.5.4
six==1.17.0
starlette==0.52.1
sympy==1.14.0
tensorboard==2.20.0
tensorboard-data-server==0.7.2
timm==1.0.25
tokenizers==0.22.2
tomli==2.4.0
tomlkit==0.14.0
torch==2.9.1+cu128
torchmetrics==1.9.0
torchvision==0.24.1+cu128
tqdm==4.67.1
tqdm-loggable==0.4.1
transformers==4.57.6
triton-windows==3.5.1.post24
typer==0.24.1
typing-inspection==0.4.2
typing_extensions==4.15.0
ujson==5.11.0
urllib3==2.6.3
uvicorn==0.42.0
watchdog==6.0.0
watchfiles==1.1.1
wcwidth==0.6.0
websockets==16.0
Werkzeug==3.1.6
wheel==0.46.3
wrapt==2.1.2
yarl==1.23.0
yt-dlp==2026.3.17
zipp==3.23.0

=== Git Information ===
Repo: Nerogar/OneTrainer
Branch: master
Commit: cb6cab2
No deleted, unmerged, or modified files relative to origin/master.

=== Network Connectivity ===
PyPI (https://pypi.org/): Failure: expected string or bytes-like object, got 'NoneType'
HuggingFace (https://huggingface.co): Failure: expected string or bytes-like object, got 'NoneType'
Google (https://www.google.com): Failure: expected string or bytes-like object, got 'NoneType'

=== Intel Microcode Information ===
CPU is not detected as 13th or 14th Gen Intel - microcode info not applicable.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working) · invalid (This doesn't seem right)
