nicolas
niid.xyz
@niid.xyz
💼 GenAI & ML @ T-Systems
🌍 Berlin, Germany
Grok 3, Claude 3.7, and GPT-4.5 all within a couple of days. It’s crazy how consistent the labs' progress is.
March 1, 2025 at 8:24 AM
Critical information to keep in mind when interpreting the o3 results on the ARC-AGI benchmark (vs. o1):

1. o3 inference cost a staggering $1.15M.
2. o3 was trained on a public ARC-AGI dataset, giving it a massive advantage over o1, which had no access to that data.

It’s not a fair comparison.
December 22, 2024 at 11:45 AM
Reposted by nicolas
Atoms are binary. They are either intended to be hydrogen or helium. We can't just scrap this worldview just because of a handful of exceptions
November 28, 2024 at 1:02 PM
Reposted by nicolas
Research like this is so important. A lot of decision making when deploying RAG systems is based on vibes-based eval using dated models and datasets. Many techniques simply don't transfer to modern models like you'd hope. Query expansion seems to fall into this category.
November 24, 2024 at 11:29 PM