[ICML 2026] Dispersion loss counteracts embedding condensation and improves generalization in small language models
dispersion cosine-similarity embedding manifold-learning icml condensation latent-space pre-training embedding-vectors dispersive large-language-models llm geometric-learning llms llm-training small-language-models mid-training icml-2026
Updated May 6, 2026 - Python