Mechanical Dirk
mechanicaldirk.bsky.social
Training big models at @ai2.bsky.social.
@ananyahjha93.bsky.social knows the pain. We once got reviews on a paper that said "Please do further experiments [which would cost $2M]", and _also_ a review that said "This work is too expensive to be relevant to anyone in the field." In the same paper!
November 14, 2025 at 11:25 PM
You presumably have something you want to run, and it doesn't run. Or it runs on CPU. Spin up a Claude Code console and tell it about the problem. Tell it the command line that produces the wrong result. It'll start suggesting ways of fixing your environment, down to modifying installed drivers.
October 8, 2025 at 4:53 PM
Let Claude Code sort out your environment.
October 8, 2025 at 1:44 AM
Are humans allowed to attend?
September 24, 2025 at 4:27 PM
What field/area is like this now?
September 24, 2025 at 4:21 PM
Almost all post-training is "dusting off capable base models"
July 28, 2025 at 2:52 AM
Unverified secondhand information: In the US, all fish has to be flash-frozen before being served raw. In Canada, it does not.
July 18, 2025 at 10:39 PM
I think for the moment we're competing on a different axis. They do quite well on impact per GPU hour. We do well on impact per person hour.
June 24, 2025 at 6:28 AM
In ML, you can get surprisingly far without ever looking at your training data, and yet you'll always be limited. Thus, in ML, "look at the data" means, "Don't just stir the pot of linear algebra, find out what's really happening."
June 8, 2025 at 4:48 PM
Meanwhile, OLMo is now the citation for QK norm, which we definitely didn't invent? You win some, you lose some.
May 13, 2025 at 5:22 PM
After ICML, I decided all conferences should be in Vienna from now on.
April 23, 2025 at 9:34 PM
It costs $90k. The $1000 is just a down payment.
March 16, 2025 at 6:56 PM