Ex.Robotics at Invento | 🔗 https://narvind2003.github.io
Here to strictly talk about ML, NNs and related ideas. Casual stuff on x.com/nagaraj_arvind
My post, written with the help of an LLM (the irony!), is here. I poured my heart into this one:
medium.com/@gedanken.th...
#AI #DeepLearning #RNN #Transformer #HRM
I wrote a deep dive on why this is a full-circle moment for me, going back to the RNN finetuning days.
It's the closest we've come yet to embodying Prof. Kahneman's vision of a System 1/2 mind in code.
Each new "Thinking Session" (the M-loop) starts with the flawed result of the last one. It forces the model to confront its own errors until the logic is perfect.
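To make the "start from the last result" idea concrete, here is a toy sketch. It is not the HRM code: `think_session`, `run_sessions`, the halving update, and the tolerance-based stopping rule are all made-up stand-ins, assuming one session is just "some refinement step" on a number.

```python
def think_session(state, target):
    """Hypothetical stand-in for one thinking session: halve the remaining error."""
    return state + 0.5 * (target - state)


def run_sessions(state, target, max_sessions=16, tol=1e-3):
    """Run sessions back to back; each one starts from the previous result."""
    sessions = 0
    for sessions in range(1, max_sessions + 1):
        state = think_session(state, target)  # carry the (still flawed) answer forward
        if abs(target - state) < tol:  # toy stopping rule, assumed for illustration
            break
    return state, sessions


final_state, used = run_sessions(0.0, 1.0)
print(final_state, used)
```

The point is only the control flow: nothing is reset between sessions, so every pass has to work on the leftover error of the one before it.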
🧠 A strategic CEO (H-module) who thinks slow, sees the big picture, and sets the overall strategy.
⚡️ A diligent Worker (L-module) who thinks fast, executing the details of the CEO's plan.
This separation allows for truly deep, iterative thought.
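A rough sketch of the CEO/Worker split as two nested loops. This is a toy under loud assumptions: the states are plain floats, and `l_step`, `h_step`, `n_cycles`, `t_steps` and the blending weights are hypothetical names and numbers picked for illustration, not taken from the HRM code.

```python
def l_step(z_l, z_h, x):
    """Fast Worker update (toy): blend own state, the CEO's plan, and the input."""
    return 0.5 * z_l + 0.25 * z_h + 0.25 * x


def h_step(z_h, z_l):
    """Slow CEO update (toy): fold in where the Worker ended up."""
    return 0.5 * z_h + 0.5 * z_l


def hierarchical_pass(x, z_h=0.0, z_l=0.0, n_cycles=2, t_steps=3):
    """Run t_steps Worker updates per CEO update, for n_cycles CEO updates."""
    counts = {"L": 0, "H": 0}
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_l = l_step(z_l, z_h, x)  # Worker iterates while the plan z_h is held fixed
            counts["L"] += 1
        z_h = h_step(z_h, z_l)  # CEO revises the plan once per cycle
        counts["H"] += 1
    return z_h, z_l, counts


z_h, z_l, counts = hierarchical_pass(x=1.0)
print(counts)
```

With the defaults the Worker runs 6 times and the CEO only twice, which is the "thinks fast" vs "thinks slow" separation in miniature.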
This is the architecture I've been waiting for since 2018. A thread on HRM. 🧵
Taking a time machine within a time machine... stealing someone's consciousness...the ideas were next level!
The guy is a beast.
It's a shame Shane Carruth couldn't carry on making more amazing films.
There are so many incredible moments in this film.
Wow...have you seen 'Upstream Color' as well?
I should read this!
7. The encoding signal is not going to die out. It can be preserved by doing it as part of the softmax dot product attn.
8. What a gorgeous 😍 idea...
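Point 7 can be checked numerically. The sketch below assumes the thread is describing rotary-style position encoding: each 2-D pair of a query/key is treated as one complex number and rotated by an angle proportional to its position. `rope`, `score`, and the base of 10000 are illustrative choices, not something stated in the posts.

```python
import cmath


def rope(vec, position, base=10000.0):
    """Rotate each complex entry (one 2-D pair) by position * its frequency."""
    n = len(vec)
    return [z * cmath.exp(1j * position * base ** (-i / n)) for i, z in enumerate(vec)]


def score(q, k):
    """Real dot product of the underlying 2-D pairs (the pre-softmax attention score)."""
    return sum((a * b.conjugate()).real for a, b in zip(q, k))


q = [1 + 2j, 0.5 - 1j]
k = [-0.3 + 0.7j, 2 + 0.1j]

near_start = score(rope(q, 3), rope(k, 7))
shifted = score(rope(q, 13), rope(k, 17))
print(abs(near_start - shifted) < 1e-9)
```

Both calls use the same offset of 4 between query and key, so the scores match up to float error: the position signal lives inside the dot product itself instead of being added to the embedding once at the input.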
5. There are 2 benefits: the semantic meaning of the token is not corrupted. We only rotate the vector, preserving the magnitude.
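A quick sanity check of the "rotate, don't rescale" claim. This is a minimal sketch assuming the usual pairwise 2-D rotation; `rotate_pairs`, `norm`, and the base of 10000 are hypothetical helpers and defaults chosen for the example.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Rotate consecutive (x, y) pairs of vec by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)  # one frequency per pair
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])  # standard 2-D rotation
    return out


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vec))


token = [0.2, -1.3, 0.7, 0.4]
rotated = rotate_pairs(token, position=5)
print(norm(token), norm(rotated))
```

The two printed lengths agree up to float error, while the components themselves change with position.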
1. We need a way to encode token positions when feeding them as input into the transformer
2. We could just concat 1,2,3 etc. but this doesn't scale for variable lengths
3. Noam Shazeer showed how sin and cos waves can produce a beautiful pattern that encodes relative positions between tokens.
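Points 1 to 3 can be sketched in a few lines. This assumes the standard sinusoidal scheme (interleaved sin/cos with geometrically spaced frequencies); `sinusoidal_encoding`, `dot`, and the base of 10000 are illustrative names and defaults.

```python
import math


def sinusoidal_encoding(position, d_model, base=10000.0):
    """Return the interleaved sin/cos encoding vector for one token position."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    vec = []
    for i in range(d_model // 2):
        angle = position / (base ** (2 * i / d_model))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two pairs of positions, both 4 apart.
early = dot(sinusoidal_encoding(3, 8), sinusoidal_encoding(7, 8))
late = dot(sinusoidal_encoding(10, 8), sinusoidal_encoding(14, 8))
print(abs(early - late) < 1e-9)
```

Because sin(a)sin(b) + cos(a)cos(b) = cos(a - b), the dot product between two encodings depends only on how far apart the positions are, which is the relative-position pattern from point 3. It also works for any sequence length, unlike a fixed 1, 2, 3 counter.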