paper: arxiv.org/abs/2408.00724
code: github.com/thu-wyz/infe...
We study compute-optimal inference, develop a tree search with process reward models (REBASE), and find that smaller models often outperform larger ones at the same inference compute budget.
simons.berkeley.edu/talks/sean-w...
It was a sneak-preview subset of our NeurIPS tutorial:
cmu-l3.github.io/neurips2024-...