Will Kurt
willkurt.bsky.social
"The idea of an environment scarcely makes any sense since you can never draw a boundary line that would distinguish an organism from what surrounds it." - Bruno Latour
The current messiness around LLM evaluations is ultimately caught up in the limits of working under conditions of pure empiricism.

We’ll never dig ourselves entirely out of this hole until theory starts to catch up with practice.

Paper after paper overreaches and attempts impossible general claims
November 25, 2024 at 7:14 AM
Reposted by Will Kurt
LLM observation of the day: I think that guided/constrained generation gets a bad rap. There was one paper making the rounds about how guided generation harms reasoning ability that everyone took as gospel.
November 13, 2024 at 11:06 PM
Reposted by Will Kurt
A new paper, "Let Me Speak Freely", has been spreading rumors that structured generation hurts LLM evaluation performance.

Well, we've taken a look and found serious issues in this paper, and shown, once again, that structured generation *improves* evaluation performance!
November 21, 2024 at 6:33 PM
Reposted by Will Kurt
Our new blog post is out!

@willkurt.bsky.social provides a rebuttal to a reasonably well-known paper which concluded that structured generation with LLMs always resulted in worse performance.

We do not find the same thing.

blog.dottxt.co/say-what-you...
November 21, 2024 at 6:23 PM
First post! Created this account a while ago, but things seem to be picking up and it has a very nice "old Twitter" feel to it here!
November 12, 2024 at 7:59 PM