andyjliu.github.io
(📷 xkcd)
With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
We put LLMs in a simulated market and find that collusion increases when they are able to communicate via natural language, differs across models, and is influenced by urgency and oversight.
1/
things i've previously liked, for reference -
nonfiction: the structure of scientific revolutions, cybernetic revolutionaries, seeing like a state
fiction: stories of your life and others, one hundred years of solitude, project hail mary, recursion
go.bsky.app/NhTwCVb