@GoogleDeepMind. Also CS professor (Liverpool/Leuven) and LFC fan.
More details: www.marl-book.com
A final game-theoretic RLHF method and a different take on RLHF altogether inspired by prospect theory.
1. 🧲 Magnetic Preference Optimization (MPO).
2. Kahneman-Tversky Optimization (KTO).
🧵 1/3.
The last was a position paper on RLHF/alignment.
This week I will share papers (in pairs) on the topic of "game-theoretic or social choice meets alignment/RLHF".
🧵 1/3.