Clayton Thorrez
cthorrez.bsky.social
EsportsBench has been refreshed with data up through June 2025. Over 61k new matches across 20 esports have been recorded in the last 3 months!
huggingface.co/datasets/Esp...
July 16, 2025 at 6:31 AM
I love it when the same notation can mean the *exact* opposite thing when used by different authors...

Should "A ≻ B" mean:
"A is preferred to B (higher rating)"
"B is preferred to A (lower rank number)"?

arxiv.org/pdf/2411.049...
www.tandfonline.com/doi/full/10....
July 3, 2025 at 6:56 PM
very slightly :)

basically my rules of thumb are to never use numpy on scalars unless the function simply doesn't exist in base python, and to try the simpler thing first: ** and pow are general and need to support raising numbers to any power, while num*num is a single multiplication
June 27, 2025 at 6:36 PM
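A minimal sketch of the second rule of thumb above (squaring a scalar with num*num rather than ** or pow), using only the standard library. The function names and the `time_all` helper are hypothetical, the numpy case from the first rule is not covered, and timings will vary by machine and Python version, so treat the output as something to check yourself rather than a benchmark result.

```python
import timeit


def square_mul(x):
    # A single multiplication.
    return x * x


def square_pow_op(x):
    # General exponentiation operator; it has to handle any exponent.
    return x ** 2


def square_pow_fn(x):
    # Built-in pow(), also a general power routine.
    return pow(x, 2)


def time_all(x=3.7, number=200_000):
    """Return seconds taken by each way of squaring the scalar x."""
    candidates = {
        "x * x": square_mul,
        "x ** 2": square_pow_op,
        "pow(x, 2)": square_pow_fn,
    }
    return {
        name: timeit.timeit(lambda f=f: f(x), number=number)
        for name, f in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name:10s} {seconds:.4f} s")
```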
qwen3-235b be like
June 24, 2025 at 9:57 PM
How much value does thinking add to an LLM?

Well for the largest Qwen3, the answer is -28 points

Thinking on academic benchmarks seems to help a lot, I wonder what's going wrong in the arena?

Maybe people can sense the hedging and don't like it, or it poisons its own context with overthinking
June 24, 2025 at 9:56 PM
I recently learned they might be interested in trying this now lol
June 23, 2025 at 9:17 PM
When gemini writes code in an artifact window, there are 9 buttons on the UI

None of them are to copy the code
June 14, 2025 at 5:33 AM
Maybe I'm just very desensitized to very small differences, but model 1 looks better to me than model 2 by a noticeable amount, with the difference being that model 2 has no color effects
June 6, 2025 at 10:23 PM
Just reading all the details now, do you mean this?
I think, if I'm reading it right, 0.363 is the white advantage, which when scaled to Elo space is ~63 points. Sounds significant to me.
June 6, 2025 at 10:23 PM
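The ~63 point figure in the post above can be reproduced with a short calculation. This is a sketch that assumes the 0.363 coefficient is expressed in natural log-odds units (the post does not say so explicitly) and uses the standard Elo convention of base 10 and divisor 400. The names `ELO_PER_NAT` and `logit_to_elo` are hypothetical.

```python
import math

# Elo points per unit of natural log-odds, assuming base 10 and divisor 400.
ELO_PER_NAT = 400 / math.log(10)  # about 173.7


def logit_to_elo(advantage):
    """Convert an advantage in natural log-odds units to Elo points."""
    return advantage * ELO_PER_NAT


white_advantage_elo = logit_to_elo(0.363)
print(round(white_advantage_elo, 1))  # 63.1
```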
Anyone know how to report a bug in Google Scholar? From what I can find online, there doesn't seem to be great public support. It's not my paper, just one I reference a lot.

Google thinks TrueSkill is in Russian!
scholar.google.com/scholar?hl=e...
June 5, 2025 at 4:22 PM
I'm not sure I understand the ties segment, I looked at a random sample from the ties segment and it seems to have one good answer and one bad one.

Is that just my subjective disagreement with the original grader?
June 2, 2025 at 4:57 PM
ChatGPT getting algebra correct that wolfram alpha gets wrong
June 1, 2025 at 11:11 PM
Is this like a numerical issue on wolfram or am I missing something? This should be 0 right?
www.wolframalpha.com/input?i=%281...
June 1, 2025 at 11:05 PM
May 31, 2025 at 6:09 PM
look I'm a big fan of WSL, and I used it for all of my side projects, but it also has enough issues to require me to make this powershell script lol
May 31, 2025 at 6:23 AM
Still not sure why it can give slightly higher accuracy though. (The differences are very small but so far consistent across datasets)
May 31, 2025 at 1:14 AM
The scale is less understood. In the Elo eqns, it takes the form of both the base of the exponent (10) and the dividing factor 400. But in fact this is really just one hyperparam with default value log(10)/400. This scale param is the same as the temperature/steepness t of a sigmoid
May 31, 2025 at 1:14 AM
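The claim in the post above, that the base 10 and the divisor 400 collapse into a single scale of log(10)/400, can be checked numerically. A minimal sketch, with hypothetical function names; the only assumption is the usual Elo expected-score formula.

```python
import math

# The single scale hyperparameter implied by base 10 and divisor 400.
ELO_SCALE = math.log(10) / 400


def elo_expected(r_a, r_b):
    """Expected score of A against B in the usual base-10 / 400 form."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def sigmoid_expected(r_a, r_b, scale=ELO_SCALE):
    """The same quantity written as a logistic sigmoid with steepness `scale`."""
    return 1.0 / (1.0 + math.exp(-scale * (r_a - r_b)))


if __name__ == "__main__":
    for r_a, r_b in [(1500, 1500), (1600, 1400), (1200, 1900)]:
        a = elo_expected(r_a, r_b)
        b = sigmoid_expected(r_a, r_b)
        print(r_a, r_b, round(a, 6), round(b, 6), math.isclose(a, b))
```

The two functions agree up to floating-point error because 10^(x/400) equals exp(x * log(10)/400), so changing the base or the divisor alone only rescales the ratings.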
RIPPPP
May 14, 2025 at 5:48 AM
What an amazingly relatable chapter name
May 14, 2025 at 1:42 AM
A fairly obvious weakness of the paper is that they have a section describing a "Maximum likelihood estimation" variant of Elo but fail to mention that this is simply Bradley-Terry with a multiplicative shift...
May 13, 2025 at 5:04 AM
Dads when you touch the thermostat:
May 10, 2025 at 6:12 AM
I guess I picked the right day to start reading Stand on Zanzibar by John Brunner
May 3, 2025 at 6:55 PM
May 1, 2025 at 4:03 PM
why does chatgpt talk like a twitter AI influencer now with emoji bulleted lists?
April 18, 2025 at 4:59 AM
Extremely proud moment for myself today. I got my first academic citation! I worked on EsportsBench most of 2023 and into 2024, got rejected from Neurips DS&B, decided to still put it up on my website and maintain it on huggingface. Pleased some people find it interesting after so much work. :)
April 17, 2025 at 4:14 PM