https://youtube.com/@deeplearningexplained?si=yzsA4kpGN_8VDEY0
Linear attention has existed since 2020, yet large-scale models rarely use it. Why?
MiniMax-01 finally makes linear attention work at scale. Deep dive here: 📌 youtu.be/iRuvGU-Sk3c
Sorry for the month-long break, I was a bit overwhelmed with lots of things at work.
I'll try to move around the schedule a bit so that more people in different time zones can attend.
📸 PS: I gave a talk at a conference in February!
They are hinting at a big breakthrough, but man, look at the landscape.
Every competitor around is stacked with billions and PhDs.
Whatever they are trying to win won’t be achieved by secrecy.
With the arms race to collect information to train models and keep them fresh, we’re going to see web scraping move out of the gray zone and firmly into the black camp.
Can’t be sustainable.
Smaller LLMs just disregard the illogical connections.
Some “learning” is happening.
But, you can check this paper for a general rule of thumb!
academic.oup.com/bioinformati...
Kind of an old one, but it gives some general guidance. (Watch out: it assumes all features are important.)