Out-of-memory (OOM) errors occur when running online distillation training with a 14B-parameter teacher model and a 7B-parameter student model.
Could anyone provide guidance on the correct multi-node parallel configuration for this setup?