Ex.Robotics at Invento | 🔗 https://narvind2003.github.io
Here to strictly talk about ML, NNs and related ideas. Casual stuff on x.com/nagaraj_arvind
This is the architecture I've been waiting for since 2018. A thread on HRM. 🧵
I have tried a bunch of ways and it refuses to!! 😭
I had to explain it to the poor thing!
The inner-monologue thinking tokens from DeepSeek's model are super interesting to watch. But the CoT trajectories take it to 2 incorrect solutions before it runs out of thinking time: it either adds an extra 8 or uses cube roots.
Can't nest like👇
At first glance it looked like they managed to introduce valuable "reasoning" tokens, compute a reward score (softmax of the argmax token's logprob over the top 5 candidate tokens), and finally add a reflection phrase.
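For anyone wondering what that reward score could look like in code, here is a minimal sketch of one reading of it: take the logprobs of the top 5 candidate tokens, softmax over just those, and use the probability of the best (argmax) token as the score. The function name `top_k_confidence` and the parameter `k` are my own placeholders, not from the work being discussed, and the real computation may well differ.

```python
import math


def top_k_confidence(logprobs, k=5):
    """Softmax over the k largest logprobs; return the argmax token's share.

    Assumption: this is one interpretation of "softmax of argmax token
    logprob over top 5 candidate tokens", not a confirmed implementation.
    """
    if not logprobs:
        raise ValueError("need at least one logprob")
    # Keep only the k most likely candidates, best first.
    top = sorted(logprobs, reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    best = top[0]
    exps = [math.exp(lp - best) for lp in top]
    # exps[0] belongs to the argmax token, so this is its renormalised probability.
    return exps[0] / sum(exps)


if __name__ == "__main__":
    # A confident step: one candidate dominates the other four.
    print(top_k_confidence([-0.1, -4.0, -5.0, -6.0, -7.0]))
    # An uncertain step: all five candidates are equally likely.
    print(top_k_confidence([-1.6, -1.6, -1.6, -1.6, -1.6]))
```

Under this reading the score sits near 1 when the model is sure of its next token and drops toward 1/k when the top candidates are tied.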