piomonti.bsky.social
@piomonti.bsky.social
Reposted
OH: You fail 100% of the coups you don’t attempt.
December 3, 2024 at 8:07 PM
Reposted
Be the internal server error you wish to see in the world
November 25, 2024 at 8:15 PM
Reposted
1/5 Earlier this year, I joined @datologyai.com to give wings to the data research I had been doing in academia. Today, I am absolutely thrilled to share what we’ve been working on!

Techvember Ep 2: How we made the #1 LLM Pre-training Data Recipe.

Blog: 👉 tinyurl.com/best-llm-data 🧵
November 25, 2024 at 6:43 PM
Reposted
Tired: Bringing up politics at Thanksgiving

Wired: Bringing up @datologyai.com’s new text curation results at Thanksgiving

That’s right, we applied our data curation pipeline to text pretraining data and the results are hot enough to roast a 🦃
🧵
November 25, 2024 at 5:49 PM
Reposted
I am excited about the release of our results on web-scale text data curation @datologyai.com. Our curation pipeline transforms the RedPajama V1 dataset into the DAIT dataset which outperforms the best publicly-available pretraining datasets for training LLMs better, faster, smaller.
Tired: Bringing up politics at Thanksgiving

Wired: Bringing up @datologyai.com’s new text curation results at Thanksgiving

That’s right, we applied our data curation pipeline to text pretraining data and the results are hot enough to roast a 🦃
🧵
November 25, 2024 at 7:46 PM
Reposted
🧵We’ve spent the last few months at @datologyai.bsky.social
building a state-of-the-art data curation pipeline and I’m SO excited to share our first results: we curated image-text pretraining data and massively improved CLIP model quality, training speed, and inference efficiency 🔥🔥🔥
November 14, 2024 at 5:16 PM