For example, if you ask models to tell you a joke, they almost always tell you the same joke. This is true across samples and even across model families!
Why does this happen? Can we improve it?
We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models,
without requiring access to
🙅‍♀️ Model weights
🙅‍♀️ Training data
🙅‍♀️ Token probabilities 🧵 (1/5)
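The repetition claim above is easy to probe yourself from the outside. Below is a minimal black-box sketch of that check, not the paper's information-guided probes: it samples "Tell me a joke." many times from a chat API and reports how often the single most common joke comes back. The OpenAI Python client, the model name, the sample count, and the normalization are all illustrative assumptions, not details from the thread or paper.

```python
# Minimal black-box repetition probe, NOT the authors' information-guided
# method: sample "Tell me a joke." repeatedly and measure how often the
# modal joke recurs. Model name, sample count, and normalization are
# illustrative assumptions.
import re
from collections import Counter

from openai import OpenAI  # assumes the OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so near-identical jokes collide."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def modal_joke_rate(model: str = "gpt-4o-mini", n_samples: int = 50) -> float:
    """Fraction of samples that equal the single most common joke."""
    jokes = []
    for _ in range(n_samples):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Tell me a joke."}],
            temperature=1.0,  # ordinary sampling; repetition shows up anyway
        )
        jokes.append(normalize(resp.choices[0].message.content))
    _joke, count = Counter(jokes).most_common(1)[0]
    return count / n_samples


if __name__ == "__main__":
    print(f"modal-joke rate: {modal_joke_rate():.0%}")
```

Even at an ordinary sampling temperature, a high modal-joke rate across independent samples is exactly the kind of surface-level memorization evidence the thread describes, obtained without weights, training data, or token probabilities.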
We propose modeling at the individual-level using open-ended, textual value profiles! 🗣️📝
arxiv.org/abs/2503.15484
DMs are open if anyone wants to chat! :)
🗓️ December 14
📍 West Meeting Room 116, 117
Join us to explore pluralistic perspectives in alignment with an incredible lineup of talks and speakers!
🔗 Full schedule & details: pluralistic-alignment.github.io