
Fix and Enhance LoRA-Muon Setup: Orthogonalize B, Adam A #1314

Closed
Koratahiu wants to merge 1 commit into Nerogar:master from Koratahiu:lora_muon

Conversation

@Koratahiu
Contributor

Muon wasn't originally designed for LoRAs, and as a result, the existing documentation and implementations are quite lacking.

In my initial implementation of Muon (within OT), I applied Muon to all LoRA layers based on hidden-layer mapping. However, this is suboptimal: Muon should only be applied to the B matrix. Because the A matrix typically has an extreme aspect ratio (rank × input dimension, with the rank far smaller), Muon's orthogonalization tends to produce noise (garbage outputs) rather than meaningful updates.

This PR addresses this by:

  • Applying Muon exclusively to the B matrix.
  • Assigning AuxAdam to the A matrix.

This setup makes LoRA significantly more robust; theoretically, we can achieve DoRA-like effects on standard LoRA using this configuration.
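A minimal sketch of the proposed routing, assuming a name-based convention (`lora_A`/`lora_B` are illustrative, not OneTrainer's actual parameter names). For clarity the B-branch uses an exact SVD-based orthogonalization, which Muon itself only approximates via Newton-Schulz iterations, and a plain elementwise step stands in for AuxAdam:

```python
import numpy as np

def orthogonalize(grad):
    """Project a gradient onto the nearest (semi-)orthogonal matrix.

    Muon approximates this with a Newton-Schulz iteration; an exact
    SVD is used here purely for illustration.
    """
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return U @ Vt

def route_lora_update(name, grad):
    """Pick the update rule per LoRA factor, as proposed in this PR:
    Muon-style orthogonalization for B, an Adam-like step for A.
    (The name matching is a hypothetical convention.)
    """
    if "lora_B" in name:
        return orthogonalize(grad)  # Muon branch
    # AuxAdam branch, reduced to an elementwise sign step for brevity
    return np.sign(grad)

rng = np.random.default_rng(0)
r, d_in, d_out = 8, 512, 512
grad_B = rng.normal(size=(d_out, r))  # tall: benign aspect ratio
grad_A = rng.normal(size=(r, d_in))   # extreme aspect ratio

upd_B = route_lora_update("lora_B", grad_B)
# The orthogonalized B-update has all singular values equal to 1:
print(np.linalg.svd(upd_B, compute_uv=False).round(6))
```

The point of the split is visible in the shapes: the B gradient is tall (512×8), where orthogonalization is well behaved, while the A gradient (8×512) is exactly the extreme-aspect-ratio case this PR routes away from Muon.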

Technical Context

I identified this issue while testing #1263 (comment). After applying the scaling to both the A and B matrices, I observed that spectral normalization severely cripples the orthogonalization of the A matrix when using Muon.

This approach is supported by recent research (see: arXiv:2508.17901), which proposes orthogonalizing the B matrix to achieve superior results. The paper notes that if the B matrix contains redundant (correlated) columns, the "effective rank" of the update falls below the actual rank. Orthogonalization resolves this by ensuring the update remains full-rank and efficient.
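The effective-rank argument can be checked numerically. The sketch below uses the stable rank, ‖B‖²_F / ‖B‖²₂, as a simple proxy for the paper's effective rank (my choice of proxy, not the paper's exact measure), and again uses an exact SVD where Muon would apply its Newton-Schulz approximation:

```python
import numpy as np

def stable_rank(M):
    """||M||_F^2 / ||M||_2^2 -- a common proxy for effective rank."""
    s = np.linalg.svd(M, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

rng = np.random.default_rng(0)
d_out, r = 64, 8

# B with highly correlated columns: every column is the same base
# direction plus a little noise, so the update is nearly rank 1
# even though the actual rank is r.
base = rng.normal(size=(d_out, 1))
B_corr = base + 0.01 * rng.normal(size=(d_out, r))

# Orthogonalization pushes every singular value to 1,
# restoring the full actual rank.
U, _, Vt = np.linalg.svd(B_corr, full_matrices=False)
B_orth = U @ Vt

print(f"stable rank before: {stable_rank(B_corr):.2f}")  # close to 1
print(f"stable rank after:  {stable_rank(B_orth):.2f}")  # equals r = 8
```

This is exactly the failure mode the PR description cites: correlated columns collapse the effective rank of the update well below r, and orthogonalizing B recovers it.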

Potential Benefits

  • Enhanced Robustness: Prevents the "garbage" noise generation caused by applying orthogonalization to the extreme aspect ratios of the A matrix.
  • Improved Effective Rank: By orthogonalizing the B matrix, we eliminate column correlation, ensuring the update maintains its full actual rank.
  • DoRA-like Performance: Theoretically allows standard LoRA to achieve effects similar (or superior) to Weight-Decomposed Low-Rank Adaptation (DoRA) without the extra overhead.
  • Optimization Efficiency: Uses the strengths of Muon where it excels (B matrix) and relies on AuxAdam where Muon struggles (A matrix).

❕ Tests, comparisons, and feedback are always welcome.

@dxqb
Collaborator

dxqb commented Feb 12, 2026

Even if this is clearly useful, I'm not sure this should be hardcoded.

When a user selects the Muon optimizer, I think they'd expect OneTrainer to use the Muon optimizer as if they had downloaded it themselves and used it as the authors recommend.
If the authors haven't considered LoRA at all, I'd prefer to

  • discuss it with the authors
  • or, make it a toggle

@Koratahiu Koratahiu closed this Feb 16, 2026
@Koratahiu
Contributor Author

Muon distorts the LoRA geometry anyway

@Koratahiu Koratahiu deleted the lora_muon branch February 18, 2026 07:49
@Koratahiu
Contributor Author

Even though this is closed:
I found that SignSGD (a.k.a. simplified Adam) produces updates with a consistent RMS of 1 for all ranks of the LoRA A factor, meaning the same LR can be used for all of them.
This was without any scaling.
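The RMS-of-1 observation is easy to verify: the sign of a continuous-valued gradient is ±1 elementwise, so the update's RMS is exactly 1 regardless of the A factor's rank or aspect ratio. A quick check (the shapes are illustrative):

```python
import numpy as np

def rms(x):
    """Root mean square of all elements."""
    return float(np.sqrt(np.mean(x ** 2)))

rng = np.random.default_rng(0)
d_in = 768
for rank in (4, 8, 32, 128):
    grad_A = rng.normal(size=(rank, d_in))  # LoRA A factor gradient
    update = np.sign(grad_A)                # SignSGD update
    print(rank, rms(update))                # RMS is 1 at every rank
```

This rank-independence is what makes a single learning rate work across all A factors without any extra scaling.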
