Arthur Douillard
douillard.bsky.social
distributed (diloco) + modularity (dipaco) + llm @ deepmind | continual learning phd @ sorbonne
from Jeff Dean on the Dwarkesh podcast:

"asynchronous training where each copy of the model does local computation [...] it makes people uncomfortable [...] but it actually works"

yep, i can confirm, it does work for real

see arxiv.org/abs/2501.18512
February 16, 2025 at 6:53 PM