Mettez-vous hors ligne avec l'application Player FM !
[QA] The AdEMAMix Optimizer: Better, Faster, Older
Manage episode 438438306 series 3524393
This paper critiques single EMA usage in momentum optimizers, proposing AdEMAMix, which combines two EMAs for improved gradient relevance, faster convergence, and reduced model forgetting in training.
https://arxiv.org/abs//2409.03137
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1645 episodes
Manage episode 438438306 series 3524393
This paper critiques single EMA usage in momentum optimizers, proposing AdEMAMix, which combines two EMAs for improved gradient relevance, faster convergence, and reduced model forgetting in training.
https://arxiv.org/abs//2409.03137
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1645 episodes
Alle episoder
×Bienvenue sur Lecteur FM!
Lecteur FM recherche sur Internet des podcasts de haute qualité que vous pourrez apprécier dès maintenant. C'est la meilleure application de podcast et fonctionne sur Android, iPhone et le Web. Inscrivez-vous pour synchroniser les abonnements sur tous les appareils.