
Geometry-Aware LoRA Optimization for Faster and Stable Convergence #1407

Draft
Koratahiu wants to merge 7 commits into Nerogar:master from Koratahiu:precond_lora

Conversation

@Koratahiu
Contributor

This PR implements the gradient preconditioning technique proposed in *Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models*.

Standard optimizers treat LoRA's $A$ and $B$ matrices as completely independent parameters. This method changes how the optimizer sees the gradients by accounting for the dependency between the two factors (the actual low-rank manifold $W = BA$), essentially acting as a specialized, highly efficient second-order optimizer for low-rank adapters.

  • Geometry-Aware: By preconditioning the gradients based on the relationship between the $A$ and $B$ matrices, it follows the true gradient of the low-rank space, preventing the optimizer from taking inefficient steps.
  • Optimizer Agnostic: Because this applies a preconditioning transformation to the raw gradient right before the optimizer step, it is fully compatible with your favorite standard optimizers (AdamW, Prodigy, SGD, etc.).
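Concretely, the paper rescales each factor's raw gradient by the inverse Gram matrix of its partner before the optimizer step. A minimal sketch of that transformation (the function name and the `eps` damping term here are illustrative, not taken from this PR's code):

```python
import torch

def precondition_lora_grads(A: torch.Tensor, B: torch.Tensor, eps: float = 1e-8):
    """Rescale LoRA gradients in place, assuming W = B @ A with
    B of shape (m, r) and A of shape (r, n):

        grad_A <- (B^T B + eps*I)^(-1) grad_A
        grad_B <- grad_B (A A^T + eps*I)^(-1)
    """
    r = A.shape[0]
    with torch.no_grad():
        damp = eps * torch.eye(r, device=A.device, dtype=A.dtype)
        # Both Gram matrices are only r x r, so each solve costs O(r^3),
        # independent of the full layer dimensions m and n.
        A.grad = torch.linalg.solve(B.T @ B + damp, A.grad)
        # Right-multiplication by a symmetric inverse, done as a transposed solve.
        B.grad = torch.linalg.solve(A @ A.T + damp, B.grad.T).T
```

Called between `loss.backward()` and `optimizer.step()`, a hook like this leaves the optimizer itself untouched, which is what makes the approach optimizer-agnostic.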

Important Notes:

  • Implementation Mechanics: This requires the $A$ and $B$ tensors to be "aware" of each other. The codebase handles this by cross-linking the tensors (_lora_pair) during initialization so the preconditioner can calculate the necessary matrix inversions on the fly.
  • Negligible Overhead: The matrix inversion required for the preconditioning is bounded by the LoRA rank (e.g., inverting a 16x16 or 64x64 matrix), not the full parameter dimension, so it won't noticeably affect your s/it.
  • Not for DoRA (Yet): This preconditioning math is derived specifically for the standard LoRA formulation ($W = BA$). Applying it to decomposed weights (DoRA) is not recommended without further mathematical adaptation or testing.
  • Untested for Muon: compatibility with the Muon optimizer has not been verified yet.
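The cross-linking described above could be wired up roughly as follows. This is a hypothetical sketch (`_lora_pair` is the attribute name mentioned in the PR; the helper function and the example shapes are invented for illustration):

```python
import torch

def link_lora_pair(down: torch.nn.Parameter, up: torch.nn.Parameter) -> None:
    # Store a reference to the partner factor on each parameter so the
    # preconditioning step can reach B while transforming A's gradient,
    # and vice versa, without changing the optimizer's API.
    down._lora_pair = up
    up._lora_pair = down

# Example: a rank-16 adapter for a 512 -> 512 linear layer.
lora_down = torch.nn.Parameter(torch.randn(16, 512) * 0.01)  # A
lora_up = torch.nn.Parameter(torch.zeros(512, 16))           # B
link_lora_pair(lora_down, lora_up)
```

Because the link is just an attribute on the parameter, the pairing survives being handed to any standard optimizer as an ordinary parameter list.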

Other Notes:

  • Suggested Ranges: You can generally start with your standard LoRA learning rates (e.g., 1e-4 to 4e-4 for AdamW). However, because the gradients are better conditioned, you might find that the model can tolerate and benefit from higher learning rates than usual.
  • Faster Convergence: Expect to hit your target loss/image quality in significantly fewer steps than with standard, un-preconditioned training.
  • Improved Stability: This method naturally stabilizes the training dynamics, particularly at higher ranks where standard Adam can sometimes struggle to balance the updates between the $A$ and $B$ matrices.

Usage

  • Enable Riemannian Preconditioning in the LoRA tab (the name may still change)

torch.linalg.solve is more accurate, numerically stable, faster, and cheaper than torch.linalg.inv for this specific use case, and the result is mathematically identical.
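The equivalence is easy to check directly. In this illustrative snippet (float64 is used only to make the comparison tight; the matrix shapes mimic a rank-16 Gram system):

```python
import torch

torch.manual_seed(0)
r, n = 16, 1024
# A symmetric positive-definite r x r system, the shape produced by
# B^T B plus damping in the preconditioner.
M = torch.randn(r, r, dtype=torch.float64)
M = M @ M.T + 1e-6 * torch.eye(r, dtype=torch.float64)
G = torch.randn(r, n, dtype=torch.float64)

via_inv = torch.linalg.inv(M) @ G     # explicit inverse, then a matmul
via_solve = torch.linalg.solve(M, G)  # one factorization-backed solve

assert torch.allclose(via_inv, via_solve, rtol=1e-6, atol=1e-8)
```

`solve` never materializes the r x r inverse: it factorizes `M` once and back-substitutes, which is both cheaper and less prone to amplifying rounding error when `M` is ill-conditioned.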
@Koratahiu
Contributor Author

Update 1:

