hailey schoelkopf
hails.computer
so academic twitter is like actually-actually migrating this time huh?

i still don’t know if i have it in me to actively use another social network yet 😖
November 19, 2024 at 3:33 PM
Reposted by hailey schoelkopf
We released Dolma, the dataset for OLMo, AI2's LLM. It's 3+ trillion tokens. We hope it will help w study of language models!

Available on HuggingFace w/ ImpACT license huggingface.co/datasets/allenai/dolma

Overview+datasheet blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64
Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining
We released Dolma, OLMo’s pretraining dataset. Dolma is an open dataset of 3 trillion tokens. Available on HuggingFace under the ImpACT license
blog.allenai.org
August 18, 2023 at 10:21 PM