https://scholar.google.com/citations?user=I80vy5cAAAAJ
arxiv.org/abs/2506.09018
arxiv.org/abs/2410.20587
So our preprint, driven by @lukasbillera.bsky.social with assists from @hedwignordlinder.bsky.social, formalizes this, and extends it a little in ways that are trickier to heuristically reason about:
arxiv.org/abs/2511.16599
bsky.app/profile/benj...
Since the dawn of time, people have been messing with (or dropping entirely) these pesky time-dependent loss scaling terms, mostly because the models train better without them.
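To make the "pesky time-dependent loss scaling terms" concrete: a toy sketch (my illustration, not the preprint's setup) of a denoising-style objective where the exact objective carries a weight w(t) that can blow up near t = 0, versus the common "simple" variant that just sets w(t) = 1. The weight function here is hypothetical, chosen only to show why dropping it can stabilize training.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_time_weight(t, eps=1e-3):
    # Hypothetical time-dependent weight that grows large as t -> 0,
    # one typical reason practitioners drop such terms in practice.
    return 1.0 / (t + eps)

def weighted_loss(residuals, t):
    # Objective with the time-dependent scaling term kept.
    return np.mean(per_time_weight(t) * residuals**2)

def simple_loss(residuals, t):
    # Same objective with the time weight replaced by a constant 1.
    return np.mean(residuals**2)

t = rng.uniform(size=1000)
residuals = rng.normal(size=1000)
print(weighted_loss(residuals, t), simple_loss(residuals, t))
```

Samples drawn near t = 0 dominate the weighted loss by orders of magnitude, while the simple loss treats all times equally.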
The manuscript should be up by tomorrow and I'll drop a link.
Autocorrelated insertions? Change how you build the trees! Same for the "anchors" which control the process on internal branches.
Then, given this Z, X_t evolves over the trees, sampling when (but not which) branching and deletion events occur, all constructed to terminate at X_1.
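One standard ingredient for sampling when (but not which) events occur: conditioned on a homogeneous Poisson process placing a fixed number of events on an interval, the event times are distributed as sorted i.i.d. uniforms. A toy sketch of that fact only; the preprint's construction on trees is more involved, and the function name here is my own.

```python
import numpy as np

def sample_event_times(n_events, t_start, t_end, rng):
    # Given that exactly n_events occurred on [t_start, t_end] under a
    # homogeneous Poisson process, their times are the order statistics
    # of n_events i.i.d. uniforms on the interval.
    times = rng.uniform(t_start, t_end, size=n_events)
    return np.sort(times)

rng = np.random.default_rng(1)
times = sample_event_times(5, 0.0, 1.0, rng)
print(times)
```

On a tree, the same conditioning would apply per branch once the number of events on each branch is fixed.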