Daniel Vila
dvilasuero.hf.co
Everything datasets and human feedback for AI at Hugging Face.

Prev: co-founder and CEO of Argilla (acquired by Hugging Face)
💥 Ending 2024: A full data annotation journey on the Hugging Face Hub—from raw data to training-ready datasets!

With Argilla 2.6.0, you can now push your data to the Hub directly from the UI.

Let’s make 2025 the year anyone can build more transparent and accountable AI—no coding or model skills needed.
December 20, 2024 at 11:14 AM
Help shape the future of multilingual Open Source AI!

Join the FineWeb 2 Community Annotation Sprint to create an open training dataset with full transparency and human validation in many languages.

Review datasets in your language and help identify the best sources for training.
December 10, 2024 at 2:12 PM
Announcing Global-MMLU: an improved, open MMLU dataset with evaluation coverage across 42 languages.

The result of months of work aimed at advancing multilingual LLM evaluation.

Built together with the community and amazing collaborators at Cohere For AI, MILA, MIT, and many more.
December 6, 2024 at 8:59 AM
Super excited to launch the Open Images Preferences community sprint at @huggingface.bsky.social

Have fun browsing images generated with the latest OSS models while contributing to the future of Open Source AI

🧵
November 26, 2024 at 12:34 PM
Let's make AI more inclusive.

At @huggingface.bsky.social we'll launch a huge community sprint soon to build high-quality training datasets for many languages.

We're looking for Language Leads to help with outreach.

Find your language and nominate yourself:
forms.gle/iAJVauUQ3FN8...
November 26, 2024 at 6:29 AM
"RL with Verifiable Rewards" datasets for math and instruction-following
November 22, 2024 at 10:22 AM
On-policy and off-policy preference datasets
November 22, 2024 at 10:22 AM
Persona-driven datasets for math, code, instruction-following (ifeval)
November 22, 2024 at 10:22 AM
Something big is coming to @huggingface.bsky.social next Monday.

Join the community and contribute to advancing image generation.
November 21, 2024 at 8:14 PM
A Colab notebook for using Qwen-Coder to:

- Build a synthetic code dataset
- Run human eval

colab.research.google.com/drive/1qh7VW...
November 19, 2024 at 4:02 PM