youtu.be/PuFIs7OkzYY
Trying to limit the topics I post about, but this is my country and this one shook me. =(
way beyond what standard models predict.
bengolub.net/snff-2/
4/
Or, as they would call it, a grad student
"Your application has moved to the next stage"
Well done.
That is some Grade 'A' level trolling.
You got me.
We're super happy to release FineMath, the best open math dataset yet. A strong baseline to start training your own models
Find it in the trending section of HuggingFace ;)
But social media incentivizes non-nuanced, bite-sized, panic- or optimism-inducing takes. How do we increase the context with which information is disseminated in media?
1) This shouldn't take too long
2) Oh no
There are now several training configs that together reproduce the training runs that led to the final OLMo 2 models.
In particular, all the training data is available, tokenized and shuffled exactly as we trained on it!
I will recruit graduate students on the algorithmic and theoretical aspects of Reinforcement Learning.
You will join Adage, @mila-quebec.bsky.social and @polymtl.bsky.social.
More info on why and how you should apply:
academic.sologen.net/2024/11/22/g...
Deadline: Dec 1st
Astronomers: ah yes the choo choo train nebula
A final game-theoretic RLHF method and a different take on RLHF altogether inspired by prospect theory.
1. 🧲 Magnetic Preference Optimization (MPO).
2. Kahneman-Tversky Optimization (KTO).
🧵 1/3.
The last was a position paper on RLHF/alignment.
This week I will share papers (in pairs) on the topic of "game-theoretic or social choice meets alignment/RLHF".
🧵 1/3.
1. Self-Play Preference Optimization (SPO).
2. Direct Nash Optimization (DNO).
🧵 1/3.