Lintang Sutawika
@sutawika.com
PhD @ltiatcmu.bsky.social
previously @eleutherai.bsky.social

🌐 lintang.sutawika.com
Reposted by Lintang Sutawika
Can you train a performant language model using only openly licensed text?

We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2.
June 6, 2025 at 7:19 PM
Maybe. But more likely, they're using QwQ or DeepSeek.
December 2, 2024 at 4:21 AM
Transformers demonstrated how to attend over an entire sequence at once, which at the time set them apart from approaches like LSTMs that process tokens sequentially (see the sketch below). Attention spanning the whole sequence does parallel the aliens from Arrival.
The problem with that anecdote is the timeline: attention 2015, Arrival 2016, Transformers 2017. The original source might clarify what exactly Arrival's contribution was?

But it’s a nice anecdote.
interesting connection, but the transformer paper didn’t invent attention? arxiv.org/abs/1409.0473
December 1, 2024 at 4:13 PM
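A minimal sketch of the contrast in the thread above, in NumPy; the shapes and variable names are illustrative, not from any of the posts:

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Self-attention: every position attends to every other position in one shot.
T, d = 5, 8                         # sequence length, model dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))  # queries, keys, values

scores = Q @ K.T / np.sqrt(d)       # (T, T): all pairwise scores in parallel
out = softmax(scores, axis=-1) @ V  # each output mixes the entire sequence

# An LSTM, by contrast, must walk the sequence one step at a time:
#   h = initial_state
#   for t in range(T):
#       h = cell(x[t], h)           # step t cannot run before step t-1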
Attended 2 different lectures (1 class and 1 invited guest lecture) on the same topic of inference-time scaling. Maybe the matrix is trying to tell me something.
November 22, 2024 at 2:14 AM
Lectures in #nlp that I see use Taylor Swift to illustrate concepts.
Every time I see someone post this image it goes viral
November 21, 2024 at 8:44 PM
Reposted by Lintang Sutawika
@eleutherai.bsky.social is our official account. Will be posting here and on Twitter from now on.
November 20, 2024 at 2:18 PM
LTI PhDs seeking refuge in Bluesky
go.bsky.app/NhTwCVb
November 7, 2024 at 4:47 PM