Build Attention-Killers AI (RWKV) from scratch @ http://wiki.rwkv.com
Also built uilicious & GPU.js (http://gpu.rocks)
We trained 2 new models. Like BERT, but modern. ModernBERT.
Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.
It's much faster, more accurate, longer context, and more useful. 🧵
- breaking up & staying friends
- dating a friend
- friend groups being chill & supportive with all of it (getting together, or breaking up)
Starting a romantic relationship without knowing your partner as a person is weird to me
bias: I married a friend
32-72B is all we need for human level of AGI 🤖
Anything higher is just us being inefficient in architecture / code / etc
#NeurIPS2024
We will be able to cover the vast majority of enterprise AI workloads without QKV attention
Let that sink in 😎
We release QRWKV6-32B-Instruct preview, a model converted from Qwen-32B instruct, trained for several hours on 2 MI300 nodes.
Surpassing all previous known open linear models (StateSpace, Hybrid, etc)
Unlocking 1000x+ lower inference cost
Matching transformer level performance despite the lack of "Quadratic Attention", using RWKV Attention instead
Proving Attention is **not** all you need
Are you in SF? Wish to nerd out on AI & ML?
Join our holiday get-together + community potluck + discussion of the best new AI research with me,
@swyx.io @eraqian @dylan522p @vibhuuuus 😎
At 4pm today
lu.ma/25mwbwcm
Best I keep that frozen for a while!
I’ll just be quiet about it!
Best I keep that one preserved
I will build evals for my tasks.
I will build evals for my tasks.
I will build evals for my tasks.
Running a bunch of test prompts specific to what your company does through a new model feels like it should be pretty low-hanging fruit
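For anyone wondering what that could look like in practice, here is a minimal sketch, not a definitive harness. Everything in it is an assumption for illustration: the `CASES` prompts, the `run_evals` helper, the substring check, and `stub_model`, which stands in for whatever real model call you would use.

```python
from typing import Callable

# Hypothetical test cases: each pairs a company-specific prompt with a
# substring the answer is expected to contain.
CASES = [
    {"prompt": "What is the refund window for annual plans?", "expect": "30 days"},
    {"prompt": "Which regions do we ship to?", "expect": "Canada"},
]


def run_evals(model: Callable[[str], str], cases: list[dict]) -> float:
    """Run every prompt through `model` and return the fraction that pass."""
    passed = 0
    for case in cases:
        answer = model(case["prompt"])
        # Simplest possible check: case-insensitive substring match.
        if case["expect"].lower() in answer.lower():
            passed += 1
    return passed / len(cases) if cases else 0.0


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption); swap in your own client."""
    if "refund" in prompt:
        return "Annual plans can be refunded within 30 days."
    return "I am not sure."


if __name__ == "__main__":
    print(f"pass rate: {run_evals(stub_model, CASES):.0%}")
```

Swapping `stub_model` for a function that calls a new model lets you compare pass rates across models on the same prompts. A substring match is the crudest possible grader, so treat it as a starting point.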
I'll be at a couple things:
- Wed 1-2pm: talking Transformer killers with @picocreator.bsky.social at @swyx.io @latentspacepod.bsky.social live!
- Wed 11am: RedPajama poster (spotlight) with Maurice Weber
1/2
Me as a Southeast Asian: oooo… snow ☃️
My Canadian friends: that’s barely any snow ❄️
The true Canadian experience needs to have at least knee-high snow I guess 🤣
To chill and talk, before flying over to Canada 🇨🇦
lu.ma/25mwbwcm
Seems like pro (~$9/month) is limited to 1TB?
( the circle of internet life )
The biggest damage big AI did to western AI was to convince everyone that only they can build AI with more funding, and that no one else can
Denying competition and progress locally
Give them all a follow, folks 👆
P.S. A shoutout to @presidentlin.bsky.social for helping as always
Here's how to start your own paper club
eugeneyan.com/writing/pape...
Turning this into a perpetual thread... for every time I hear
"LLM-as-a-judge" or something similar being discussed
Today Oct 30 : Heavybit devguild AI Summit 3 event
And I guess I should start cross posting here as well