By tracking first-word statistics in reasoning traces, I found commonalities across models using pivotal tokens like "wait".
A lot of models are similar to R1, but others like sonnet and gemini come from their own school of thought.
By tracking first-word statistics in reasoning traces, I found commonalities across models using pivotal tokens like "wait".
A lot of models are similar to R1, but others like sonnet and gemini come from their own school of thought.
huggingface.co/kyutai/heliu...
huggingface.co/kyutai/heliu...