Hynek Kydlíček
hynky.bsky.social
MLE @huggingface 🤗
Prague, CZ
🇪🇺 eu/acc
Going far beyond our original FineWeb, we've created something massive: 1,893 script-language pairs with almost 3 trillion words, spanning 8 TB of compressed files! 📚

It's fully open source, released under ODC-By 1.0, with fully reproducible code! 💻

huggingface.co/datasets/Hug...
HuggingFaceFW/fineweb-2 · Datasets at Hugging Face
December 8, 2024 at 9:27 AM