1. The mode loses even to other discrete methods such as the median or first percentile (and the gap grows when we use finer judgment granularity).
2. Incorporating risk aversion often improves performance.
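To make the readouts concrete, here's a minimal sketch of how a single judgment distribution can be collapsed into one score. It assumes you already have the judge's normalized probabilities over discrete score tokens (here 1–5). The function names and the mean-minus-λ·std form of risk aversion are illustrative assumptions, not necessarily the exact estimators from the paper. The example distribution is bimodal, so the mode, median, first percentile, and mean all disagree.

```python
import math

def mode(scores, probs):
    # Most probable score (what argmax decoding of the score token would return)
    return max(zip(scores, probs), key=lambda sp: sp[1])[0]

def quantile(scores, probs, q):
    # Smallest score whose cumulative probability reaches q
    # (q=0.5 gives the median, q=0.01 the first percentile)
    cum = 0.0
    for s, p in sorted(zip(scores, probs)):
        cum += p
        if cum >= q:
            return s
    return max(scores)  # guard against floating-point shortfall near q=1

def mean(scores, probs):
    # Expected score under the judgment distribution
    return sum(s * p for s, p in zip(scores, probs))

def risk_averse_mean(scores, probs, lam=1.0):
    # Hypothetical risk-averse readout: penalize the mean by lam * std,
    # so uncertain (spread-out) judgments get lower scores
    mu = mean(scores, probs)
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    return mu - lam * math.sqrt(var)

if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5]
    probs = [0.30, 0.05, 0.05, 0.25, 0.35]  # made-up bimodal judgment
    print(mode(scores, probs))            # 5
    print(quantile(scores, probs, 0.5))   # 4
    print(quantile(scores, probs, 0.01))  # 1
    print(f"{mean(scores, probs):.2f}")   # 3.30
    print(f"{risk_averse_mean(scores, probs, lam=0.5):.2f}")  # 2.46
```

Note how the mode reports 5 even though 30% of the mass sits on 1. Readouts that use more of the distribution (median, low percentiles, a variance penalty) react to that disagreement.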
We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵
(w/ Michael J.Q. Zhang, @eunsol.bsky.social)