"asynchronous training where each copy of the model does local computation [...] it makes people uncomfortable [...] but it actually works"
yep, i can confirm, it does work for real
see arxiv.org/abs/2501.18512