
DoRA-OFT (DOFT): More Stable and Faster DoRA-variant#1335

Draft
Koratahiu wants to merge 15 commits into Nerogar:master from Koratahiu:DoRA_OFT

Conversation

@Koratahiu
Contributor

I recently revisited some "dusty tomes" of abandoned sketch ideas and rediscovered my attempt at combining DoRA with OFT. I originally scrapped it because it seemed at odds with OFT's theory of norm-energy preservation, but in hindsight it is actually a powerhouse combo.

It turns out DoRA-OFT is just as effective as standard DoRA, but significantly more stable and much faster. Here’s why:

  • The Synergy: OFT handles weight rotation (direction and angle) while DoRA manages the norm (magnitude).
  • The Speed: Since OFT is orthogonal and therefore norm-preserving, the DoRA calculation collapses: the initial weight norms remain valid for the entire run, unlike standard LoRA, which changes them every step.
  • The Result: We bypass the heavy re-calculation overhead of DoRA, achieving the same it/s as standard OFT.
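To make the speed claim concrete, here is a minimal numpy sketch (names and shapes are mine, not from the PR; real OFT parameterizes a block-diagonal orthogonal matrix, e.g. via a Cayley transform, while this uses a random orthogonal matrix as a stand-in). Because an orthogonal rotation preserves every column norm, the per-step norm recomputation that standard DoRA needs can be replaced by norms computed once at initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 8, 6

# Frozen base weight.
W = rng.standard_normal((d_out, d_in))

# Stand-in for the learned OFT rotation: any orthogonal matrix works
# for the purpose of this demonstration.
R, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))

# Per-column norms of the base weight, computed ONCE at init.
col_norms = np.linalg.norm(W, axis=0)

# Trainable DoRA magnitude vector (here just perturbed from init).
m = col_norms * 1.1

W_rot = R @ W

# Standard DoRA: recompute the column norms of the adapted weight
# every step before renormalizing.
dora_exact = m * (W_rot / np.linalg.norm(W_rot, axis=0))

# DoRA-OFT shortcut: orthogonal R preserves column norms, so the
# cached init-time norms can be reused with no recomputation.
doft_fast = m * (W_rot / col_norms)

print(np.allclose(dora_exact, doft_fast))  # True
```

The two paths produce the same adapted weight, which is why the combined method can run at the same it/s as plain OFT.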

Performance:

In my tests, this method learned in half the steps of standard LoRA while maintaining very high expressivity (an area where standard OFT typically struggles).

By merging these two, we get the superior training dynamics of DoRA with the stability and speed of OFT. It’s a very promising method.

@yamatazen

Is there a paper for this?

@Koratahiu
Contributor Author

> Is there a paper for this?

No, but DoRA is a theory of decoupling the norm (magnitude) from the direction. It can be applied to any adapter method (e.g., LoHa, LoKr, etc.).
What makes the combination with OFT unique, however, is that OFT preserves the norms of the base weights and learns purely the rotation (direction). This makes it possible to bypass the heavy calculations associated with DoRA, achieving the same speed as standard OFT.
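Written out (my notation, assuming the rotation acts on the output side): DoRA decomposes the adapted weight into a trainable magnitude vector $m$ and a normalized direction,

$$
W' \;=\; m \odot \frac{RW}{\lVert RW \rVert_c}
\;=\; m \odot \frac{RW}{\lVert W \rVert_c},
$$

where $\lVert \cdot \rVert_c$ denotes per-column norms. The second equality holds because an orthogonal $R$ satisfies $\lVert R w \rVert_2 = \lVert w \rVert_2$ for every column $w$ of $W$, so the denominator is a constant that can be cached at initialization.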

@yamatazen

Is this based on OFTv2?

@Koratahiu
Contributor Author

Merged in #1315.

This method should be used with the scaled OFT option.
This is because sign-based optimizers (e.g., Adam) produce a step size of O(1) for the DoRA scale parameter, but a step size of O(1/√n_elements) for the OFT blocks.
Within a stable learning-rate range, this effectively degenerates to standard OFT, because the effective LR for the DoRA scale would be too low for it to learn anything.

By using the scaled OFT setting, both parameters will share the same step size and LR range.

