LLM360
@llm360.bsky.social
Working on fully open-source LLMs and training data. We believe in community-owned AI.

https://www.llm360.ai
We've added you to the list!
December 2, 2024 at 7:31 AM
We've added you to the list!
November 25, 2024 at 9:30 AM
Can we join your list?
November 22, 2024 at 1:28 AM
We've added you to the list!
November 22, 2024 at 1:27 AM
Great, yes, added!
November 22, 2024 at 1:26 AM
Thanks Stella! We've added eleuther to the list.
November 21, 2024 at 2:15 AM
Thanks! We've added you to the list.
November 21, 2024 at 2:15 AM
Thank you!
November 19, 2024 at 11:03 PM
🌍🌎 The global deduplication process was hairy 🙈, and we want to share every detail.

The TxT360 dedup pipeline can be recreated and used for other datasets. We include our tips and tricks in a tell-all write-up in the release blog:
llm360-txt360.hf.space
huggingface.co/spaces/LLM36...
TxT360: Trillion Extracted Text - a Hugging Face Space by LLM360
November 19, 2024 at 10:51 PM
Building on FineWeb’s global deduplication findings, we introduce a strategic upsampling recipe for TxT360 that outperforms FineWeb. Full details are in the Upsampling Experiment section of the release blog.
November 19, 2024 at 10:51 PM
🪟🛠️ LLM360 is committed to making open-source AI accessible, transparent, and reproducible.

High-quality data is the first step toward better open-source models... and we are excited to join the party by contributing the first globally deduplicated dataset containing 5.7T tokens!
November 19, 2024 at 10:51 PM
Can we join?
November 19, 2024 at 10:40 PM