
Scaled OFT: Block-size Invariant Learning Rates#1315

Open
Koratahiu wants to merge 9 commits into Nerogar:master from Koratahiu:scaled_oft

Conversation

@Koratahiu
Contributor

In standard OFT, changing the block size fundamentally alters the number of trainable elements in the rotation matrix, which in turn shifts the magnitude of the weights passed through the Cayley transform. This is particularly problematic with sign-based optimizers (e.g., AdamW), because the update scale becomes a moving target (a larger block size requires a smaller LR).

This PR addresses this by:

  • Introducing Scaled OFT, which applies a (1/√n_elements) scaling factor to the rotation weights.
  • Normalizing the effective weight based on the number of elements (N) in the skew-symmetric matrix before the parametrization step.

This ensures that the "step size" taken by the optimizer remains mathematically consistent, regardless of whether you are using a small or large block size.

Technical Context

The number of elements (n_elements) of the OFT weight matrix [rank, n_elements] is calculated as:
n_elements = block_size * (block_size - 1) / 2
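The formula above can be sketched as a small helper (the function name is illustrative, not from the PR's code):

```python
# Hypothetical helper: each OFT block is parametrized by the strictly
# upper-triangular entries of a block_size x block_size skew-symmetric
# matrix, hence block_size * (block_size - 1) / 2 trainable elements.
def oft_n_elements(block_size: int) -> int:
    return block_size * (block_size - 1) // 2

# Doubling the block size roughly quadruples the per-block parameter count:
print(oft_n_elements(32))   # 496
print(oft_n_elements(64))   # 2016
print(oft_n_elements(128))  # 8128
```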

Without scaling, larger blocks produce higher internal variance, which effectively "dilutes" or "amplifies" the learning rate when passed through the batched Cayley transform.

By implementing effective_weight = self.weight / (self.n_elements**0.5), we stabilize the input to the Cayley parametrization. This ensures that the resulting orthogonal matrix maintains a consistent deviation from the identity matrix across different ranks.
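A minimal PyTorch sketch of this scaling, assuming a [rank, n_elements] weight that is unpacked into skew-symmetric blocks before the Cayley transform. The unpacking and transform here are illustrative, not the exact OneTrainer implementation; only the `effective_weight = weight / n_elements**0.5` line mirrors the PR text.

```python
import torch

def cayley_from_scaled_weight(weight: torch.Tensor, block_size: int) -> torch.Tensor:
    # weight: [rank, n_elements], n_elements = block_size * (block_size - 1) // 2
    n_elements = weight.shape[1]
    effective_weight = weight / (n_elements ** 0.5)  # the 1/sqrt(n_elements) scaling

    rank = weight.shape[0]
    # Unpack each row into the upper triangle of a skew-symmetric matrix Q.
    Q = weight.new_zeros(rank, block_size, block_size)
    iu = torch.triu_indices(block_size, block_size, offset=1)
    Q[:, iu[0], iu[1]] = effective_weight
    Q = Q - Q.transpose(-1, -2)  # enforce Q^T = -Q

    # Cayley transform: R = (I - Q)^-1 (I + Q), which equals (I + Q)(I - Q)^-1
    # since both factors commute; R is orthogonal for skew-symmetric Q.
    I = torch.eye(block_size, dtype=weight.dtype, device=weight.device)
    return torch.linalg.solve(I - Q, I + Q)
```

Because `effective_weight` is kept small and size-invariant, the resulting rotation stays a consistent distance from the identity regardless of block size.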


Sanity Check

Other than my extensive tests (lost to the black hole of the TensorBoard cache), here's a sanity check across block sizes (32, 64, 128) at a 1e-3 LR:

[image: training curves for block sizes 32, 64, 128]

Purple: 32, green: 64, pink: 128.


More Technical Context

If we interpret the OFT weight matrix [rank, n_elements] as a set of row vectors (rank rows, each of length n_elements), then an update size of 1/√n_elements matches the theoretical and empirical update magnitude of signed optimizers (e.g., Adam) and row-wise normalization optimizers (LMO).

By scaling the weights by 1/√n_elements, we enforce a unit effective update magnitude for all values of n_elements.
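A quick numeric check of this argument (a sketch, not the optimizer itself): a sign-style update changes every element by ±lr, so the per-row update norm grows as √n_elements, and dividing the effective weight by √n_elements cancels that growth exactly.

```python
import math

lr = 1e-3
for block_size in (32, 64, 128):
    n = block_size * (block_size - 1) // 2
    raw_norm = lr * math.sqrt(n)            # norm of a sign update on the raw weight
    scaled_norm = raw_norm / math.sqrt(n)   # norm after the 1/sqrt(n) scaling
    print(block_size, raw_norm, scaled_norm)
```

The raw update norm grows with block size, while the scaled norm stays at `lr` for every block size, which is why a single LR transfers across configurations.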


Potential Benefits

  • Optimizer Consistency: Specifically benefits AdamW and other signed optimizers by ensuring the update magnitude is invariant to the OFT Block Size.
  • LR Portability: Allows users to find a stable Learning Rate once and keep it consistent even if they decide to change the block size.

❕ This is meant to keep different block sizes in the same LR range, similar in effect to alpha=1 in LoRA. In my extensive tests it made a 1e-3 LR a stable baseline for all block sizes (on SDXL), but it's still an approximation (one that's accurate enough).

This also solves #1231

@Koratahiu
Contributor Author

Update 1:

  • A 1e-3 LR works very well with Flux.2 Klein 4B using this method:
[image: training curves]
  • It appears that we also need to scale the OFT weight during inference (very similar to LoRA alpha). This should be simple enough to implement (by adding a scalar), and support has been added in my ComfyUI patch.

@Koratahiu
Contributor Author

[image: training curves comparing block sizes 256 and 512]

While using this, doubling the block size from 256 (yellow) to 512 (purple) maintained the same LR.

@dxqb, is there anything I can do for this PR? It’s well-tested and works just fine.
I think it should be the default.
