Is standard RLHF optimal in view of test-time scaling? Unsurprisingly, no.
We show that a simple change to the standard RLHF framework, involving 𝐫𝐞𝐰𝐚𝐫𝐝 𝐜𝐚𝐥𝐢𝐛𝐫𝐚𝐭𝐢𝐨𝐧 and 𝐫𝐞𝐰𝐚𝐫𝐝 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 (suited to the test-time procedure), is optimal!
𝘊𝘢𝘯 𝘸𝘦 𝘢𝘭𝘪𝘨𝘯 𝘰𝘶𝘳 𝘮𝘰𝘥𝘦𝘭 𝘵𝘰 𝘣𝘦𝘵𝘵𝘦𝘳 𝘴𝘶𝘪𝘵 𝘢 𝘨𝘪𝘷𝘦𝘯 𝘪𝘯𝘧𝘦𝘳𝘦𝘯𝘤𝘦-𝘵𝘪𝘮𝘦 𝘱𝘳𝘰𝘤𝘦𝘥𝘶𝘳𝘦?
Check it out below.
www.linkedin.com/posts/jonas-...
people.csail.mit.edu/rrw/time-vs-...
It's still hard for me to believe it myself, but I seem to have shown that TIME[t] is contained in SPACE[sqrt{t log t}].
To appear in STOC. Comments are very welcome!
arxiv.org/abs/2501.06481
(Did I mention we are hiring on the Generative Media team, btw 👀)
blog.google/technology/g...
We need more part-time researchers to work with our interdisciplinary, all-virtual team of scientists and fellows studying climate solutions. Do you have research experience in climate +
#electricitygrid
#transportation
#oceans
#buildings
#agriculture
drawdown.org/careers/rese...
youtu.be/7RqFLp0TqV0?...
"I need my job, but I would still do it again. I truly would. I would still help somebody if I could."
We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
🖼️ Image Descriptions to improve Image-Text alignment
AND/OR
💬Multi/Cross Lingual image-text understanding/generation
AND/OR
🌏Geo-Cultural representation and learning
Please DM if you are willing to discuss the current state/challenges/future-work.
Your job is not just to handle your own review/response. You need to interact with other reviewers to come to a decision. In particular, if your review disagrees with everyone else, the burden is *on you* to engage.
The paper needs a single decision; the point of this exercise is not just to address each reviewer's concerns individually.