Add ChipAlign geodesic interpolation method #529
dlmastery wants to merge 1 commit into arcee-ai:main
Implements the ChipAlign model merging technique from https://arxiv.org/abs/2412.19819, which combines instruction-aligned models with domain-specific models using geodesic interpolation with magnitude preservation.
Features:
- Added geodesic interpolation option to NuSLERP merge method
- Added ChipAlign example configuration in examples/chipalign.yml
- Updated documentation in README.md
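For readers unfamiliar with the technique, here is a minimal sketch of geodesic interpolation with magnitude preservation on flattened weight vectors. It is not the code in this PR: `geodesic_merge` is a hypothetical helper, it uses plain Python lists instead of tensors, and it assumes the magnitude is restored as a weighted geometric mean of the two input norms, which may differ from what the PR or the paper actually does.

```python
import math


def geodesic_merge(a, b, t, eps=1e-8):
    """Interpolate between flat weight vectors a and b along the unit sphere.

    Hypothetical helper for illustration only; not part of mergekit.
    t = 0 returns a, t = 1 returns b.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Project both inputs onto the unit sphere.
    ua = [x / norm_a for x in a]
    ub = [y / norm_b for y in b]

    # Angle between the two unit vectors, clamped for numerical safety.
    cos_theta = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if sin_theta < eps:
        # Nearly parallel or antiparallel: the slerp weights are ill-defined,
        # so interpolate the directions linearly instead.
        unit = [(1 - t) * x + t * y for x, y in zip(ua, ub)]
    else:
        wa = math.sin((1 - t) * theta) / sin_theta
        wb = math.sin(t * theta) / sin_theta
        unit = [wa * x + wb * y for x, y in zip(ua, ub)]

    # Assumption: restore magnitude as a weighted geometric mean of the norms.
    scale = norm_a ** (1 - t) * norm_b ** t
    return [scale * x for x in unit]
```

With `a = [3.0, 0.0]`, `b = [0.0, 4.0]`, and `t = 0.5`, the merged vector points halfway between the two directions and its norm is the geometric mean of 3 and 4.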
All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
# Perform spherical interpolation on unit vectors
from mergekit.merge_methods.slerp import slerp
merged_tensor_unit = slerp(
I'd suggest using the nuslerp function here instead - the old slerp moves tensors to CPU so it's a lot slower. It should give the same results though.
(This also lets us respect nuslerp_flatten and nuslerp_row_wise which would be good.)
t = 0.5
t = 0.5 # Default when weights sum to zero
else:
t = weights[1] / sum(weights)
Is there a reason for introducing a new lambda parameter instead of using t?
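For context, the snippet under review derives `t` from the two model weights. A standalone sketch of that derivation follows; the function name `interpolation_factor` and the `default` parameter are hypothetical, and only the `weights[1] / sum(weights)` rule and the 0.5 fallback come from the diff.

```python
def interpolation_factor(weights, default=0.5):
    """Turn a pair of model weights into an interpolation factor t in [0, 1].

    Hypothetical helper mirroring the reviewed snippet; not mergekit API.
    """
    total = sum(weights)
    if total == 0:
        # Default when weights sum to zero, as in the reviewed diff.
        return default
    # t is the share of the total weight held by the second model.
    return weights[1] / total
```

If the new lambda parameter would carry the same value as this `t`, the two are redundant, which appears to be the point of the question above.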
Thank you for the pull request! Couple of comments in there, and if you could run the pre-commit hook to standardize the formatting that would be appreciated. Would be great to get this in.