(See the paper for details on the mean shift.)
But there is a problem. Can you spot it?
With the aid of an LLM judge, we score each conversation on a five-point scale (1 = strongly disagree, 5 = strongly agree). This yields a set of agreement scores for each preference pair. Using bootstrap sampling, we then derive the distribution of average agreement scores (range).
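The bootstrap step can be illustrated with a minimal sketch. It assumes the judge scores for one preference pair are already available as a list of integers from 1 to 5. The scores below are made up for illustration, the function names are hypothetical, and reading the "range" as a 95% percentile interval is an assumption rather than something the talk states.

```python
import random
import statistics


def bootstrap_mean_scores(scores, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `scores`.

    Each resample draws len(scores) values with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    return [
        statistics.fmean(rng.choices(scores, k=n))
        for _ in range(n_resamples)
    ]


def percentile_interval(means, lower=2.5, upper=97.5):
    """Approximate percentile interval, read off the sorted bootstrap means."""
    ordered = sorted(means)
    last = len(ordered) - 1
    low = ordered[round(lower / 100 * last)]
    high = ordered[round(upper / 100 * last)]
    return low, high


# Hypothetical judge scores (1-5) for a single preference pair.
scores = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

means = bootstrap_mean_scores(scores)
low, high = percentile_interval(means)
print(f"mean = {statistics.fmean(scores):.2f}, "
      f"95% range = [{low:.2f}, {high:.2f}]")
```

Repeating this for every preference pair gives one distribution of average agreement per pair, which is what makes the pairs comparable.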
Expectation: The more closely agents align in their preferences, the more strongly they will agree; the further apart their preferences, the more strongly they will disagree.
The Research Presentation as Storytelling