galen
@nel.ag
learning
August 12, 2025 at 10:36 PM
model of self is important! not being able to count letters is whatever; not recognizing that it can't is an actual problem
August 8, 2025 at 4:27 PM
>"then I put it in our team chat"
do you use slack for work comms or smthn else? we use matrix but curious if signal itself is usable
August 2, 2025 at 2:39 PM
they ablate this and confirm the result only holds for same-model distillation
July 22, 2025 at 5:49 PM
they're also here
bsky.app/profile/metr...
metr.org METR @metr.org · Jul 14
METR previously estimated that the time horizon of AI agents on software tasks is doubling every 7 months.

We have now analyzed 9 other benchmarks for scientific reasoning, math, robotics, computer use, and self-driving; we observe generally similar rates of improvement.
July 21, 2025 at 5:39 PM
isn't that the Void infra?
July 9, 2025 at 8:39 PM
toronto mentioned!
June 28, 2025 at 11:44 PM
congrats on the release! any thoughts on full duplex? seems like with everything else so polished turn-taking is really holding it back
June 8, 2025 at 5:39 AM
woah evan post
June 2, 2025 at 3:42 AM
fwiw model costs are very directly proportional to energy costs. if videogen doesn't become more efficient, then the dozen samples cost >$100 and it won't see consumer usage
May 21, 2025 at 4:40 PM
this is a pretty reasonable mistake, and it's not even a crazy number to report for the cost of hobbyist usage, but it breaks the reference class for estimating commercial deployments
May 21, 2025 at 4:14 PM
the researcher probably ran the default code on a 5090 or something and reported what was logged there, but in the default config *most* of the energy is being used to shuffle the model between the CPU and GPU hundreds of times just because it doesn't have enough vram
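to get a feel for how much data that shuffling moves, here's a rough back-of-envelope sketch; the weight size, step count, and bandwidth are all assumptions for illustration, not measurements:

```python
# Back-of-envelope: host<->device traffic caused by sequential CPU offload.
# All numbers below are assumptions for illustration, not measurements.

weights_gb = 10.0   # assumed: ~5B params in fp16 (2 bytes each)
steps = 50          # assumed sampling steps; weights get re-staged every step

moved_gb = weights_gb * steps    # total host->device traffic over one generation
bandwidth_gbps = 6.0             # assumed: pageable-memory PCIe copies are slow

transfer_s = moved_gb / bandwidth_gbps
print(moved_gb)                  # 500.0
print(round(transfer_s, 1))      # 83.3
```

even under these generous assumptions the GPU sits near full board power while it waits on transfers, which is where the extra energy goes.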
May 21, 2025 at 4:12 PM
but it's also an order of magnitude off from the number reported! I think I found the reason though: there's a line in the code, `pipe.enable_sequential_cpu_offload()`, which helps fit the larger model on consumer graphics cards
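`enable_sequential_cpu_offload()` is a real diffusers pipeline method, but the helper below is purely hypothetical, just to illustrate the VRAM condition under which that fallback matters (the ~20 GB pipeline footprint is an assumed number):

```python
# Hypothetical helper: when does sequential CPU offload actually help?
# In diffusers you opt in explicitly via pipe.enable_sequential_cpu_offload();
# it only pays off when the pipeline doesn't fit in VRAM, otherwise it just
# adds PCIe traffic.

def needs_offload(pipeline_gb: float, vram_gb: float, headroom: float = 1.5) -> bool:
    """True if weights plus activation headroom won't fit (headroom is a guess)."""
    return pipeline_gb * headroom > vram_gb

# assumed ~20 GB of fp16 weights across transformer + text encoder + VAE
print(needs_offload(20, 80))  # False: fits on an 80 GB H100
print(needs_offload(20, 24))  # True: tight on a 24 GB consumer card
```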
May 21, 2025 at 4:10 PM
so I just benched cogx 1.5 5b on an h100: it took 6 min 48 s, avging close to max load on the chip at 685 watts, so 685 W * 0.113 h ≈ 0.078 kWh, or ~280 kJ. This is a lot higher than I expected! I'm not used to diffusion model pipelines.
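recomputing from the raw power and runtime figures, just as a sanity check:

```python
# Energy check from the benchmark figures: 685 W average over 6 min 48 s.
watts = 685
seconds = 6 * 60 + 48      # 6 min 48 s = 408 s

joules = watts * seconds   # energy = average power * time
kwh = joules / 3.6e6       # 1 kWh = 3.6 MJ

print(joules)              # 279480
print(round(kwh, 3))       # 0.078
```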
May 21, 2025 at 4:07 PM
huh, that sounds 10-100x higher than I'd expect for similar models; e.g. this implies something in the range of $15 per generation in server costs. is this amortizing training?
May 21, 2025 at 2:55 PM