we keep getting more and more confirmation that reasoning begins in pre-training
today’s evidence: arxiv.org/abs/2510.07364
maybe Gemini 3 is the tidal shift where Google gains a permanent lead
www.sciencedirect.com/science/arti...
1. "And you can never test whether your data was generated in the iid model, nor can you test if it will be generated in the iid model tomorrow."
You can actually do the first part with evaluations but not the second.
Tim was a software engineer; I was a scientist in the research group where most code was written in Perl. I loved Perl; I quickly learned to use it and still do,
This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.
ai-scientific-discovery.github.io
But as the paper points out, the logically possible M=2 evaluations must be a subset of the product of their M=1 evaluations!
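A toy sketch of why this holds, under assumptions that are ours rather than the paper's: binary labels, two classifiers whose responses on a handful of items are observed, and an "evaluation" reduced to the number of items each classifier gets right. The helper names `m1_evaluations` and `m2_evaluations` are hypothetical. Enumerating every possible ground truth shows that each pair evaluation projects onto a pair of single-classifier evaluations, so the M=2 set sits inside the product of the M=1 sets, and observed disagreements rule some product pairs out.

```python
from itertools import product

def m1_evaluations(responses):
    # Hypothetical toy model: try every binary ground truth for the items and
    # collect how many items this single classifier could have gotten right.
    n = len(responses)
    return {
        sum(r == t for r, t in zip(responses, truth))
        for truth in product((0, 1), repeat=n)
    }

def m2_evaluations(responses_a, responses_b):
    # Same enumeration, but both classifiers are scored against the SAME
    # ground truth, so their correct counts are coupled.
    n = len(responses_a)
    pairs = set()
    for truth in product((0, 1), repeat=n):
        correct_a = sum(r == t for r, t in zip(responses_a, truth))
        correct_b = sum(r == t for r, t in zip(responses_b, truth))
        pairs.add((correct_a, correct_b))
    return pairs

# Made-up responses: the classifiers agree on items 0-1 and disagree on items 2-3.
a = (0, 1, 1, 0)
b = (0, 1, 0, 1)
pairs = m2_evaluations(a, b)
full_product = set(product(m1_evaluations(a), m1_evaluations(b)))
print(pairs <= full_product)          # True
print(len(pairs), len(full_product))  # 9 25
```

With these responses the classifiers disagree on two items, so on each of those exactly one of them is right. That is why a pair like (0, 0) appears in the product of the M=1 evaluations but not among the M=2 evaluations.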