Natalia
nataliaelv.hf.co
Building Argilla @ Hugging Face 🤗. Linguist at heart. Occasionally I write in Spanish.
Pinned
Hello everyone! 👋 Since this is growing quite a bit, I thought I'd introduce myself:

I'm Natalia, a computational linguist working at @huggingface.bsky.social as part of the team building Argilla.
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. 

Any feedback for improvements welcome!
January 17, 2025 at 10:02 AM
Reposted by Natalia
🚀 Argilla v2.6.0 is here! 🎉

Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub. 🤩

Take a look at this quick demo 👇

💁‍♂️ More info about the release at github.com/argilla-io/a...

#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla
December 19, 2024 at 12:39 PM
I'm taking a well-deserved break to celebrate Christmas 🎄 ☃️, but the FineWeb2 annotation sprint continues!

You can still contribute some annotations or start leading a language!
December 19, 2024 at 12:30 PM
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!
FineWeb2 collaborative sprint: how to annotate
In this video you'll learn how you can go about annotating some records in the FineWeb2 collaborative annotation sprint launched by Hugging Face and Argilla....
December 17, 2024 at 10:09 AM
The FineWeb2 collaborative annotation sprint is also a way of keeping many languages alive. I talk about it in this LinkedIn post: https://buff.ly/49DghmN
December 13, 2024 at 1:02 PM
I've just contributed 142 examples to this dataset:

data-is-better-together-fineweb-c.hf.space/share-your-p...
lat - Lingua latina - Latin
Join and contribute to the dataset lat - Lingua latina - Latin
December 12, 2024 at 1:44 PM
Next week we're launching a collaborative annotation effort to build a big multilingual dataset, so you can have high-quality data in your language.

We are really close to getting leads for 100 languages! Can you help us cover the remaining 200?
December 3, 2024 at 12:45 PM
Reposted by Natalia
🙌 I just wanted to share a few thoughts about the latest Argilla release, 2.5.0, as it's a pretty big one!

Argilla now has full support for webhooks, which means you can do some pretty cool stuff, like model training on the fly as annotations are created. 🤯

#MachineLearning #NLP #DataLabeling
December 2, 2024 at 11:14 AM
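The webhook post above mentions training a model on the fly as annotations are created. A minimal sketch of that idea, using only the Python standard library, follows. It is not Argilla's webhook API: the event name `response.created` and the `type`/`data`/`text`/`label` payload keys are assumptions made for illustration, and `train` is a placeholder where a real model update would go.

```python
import json
from dataclasses import dataclass, field

# ASSUMPTION: hypothetical event name and payload keys; Argilla's real
# webhook payloads may use a different shape.
EVENT_TYPE = "response.created"


@dataclass
class OnlineTrainer:
    """Buffers annotations delivered by webhook events and retrains in small batches."""

    batch_size: int = 3
    buffer: list = field(default_factory=list)
    trained_batches: list = field(default_factory=list)

    def handle(self, raw_body: bytes) -> bool:
        """Process one webhook request body; return True if it triggered training."""
        event = json.loads(raw_body)
        if event.get("type") != EVENT_TYPE:
            return False  # ignore events we don't train on
        data = event.get("data", {})
        self.buffer.append((data.get("text"), data.get("label")))
        if len(self.buffer) < self.batch_size:
            return False
        self.train(self.buffer)
        self.buffer = []  # start collecting the next batch
        return True

    def train(self, examples: list) -> None:
        # Placeholder: a real setup would update a model here.
        self.trained_batches.append(list(examples))


# Simulate two incoming webhook POST bodies.
trainer = OnlineTrainer(batch_size=2)
for text, label in [("We need water", "request"), ("Roads are open", "other")]:
    body = json.dumps({"type": EVENT_TYPE, "data": {"text": text, "label": label}})
    trainer.handle(body.encode())
print(len(trainer.trained_batches))  # 1
```

Batching, rather than retraining on every single annotation, keeps the cost of each update bounded. In a real deployment, `handle` would be called from whatever HTTP endpoint receives the webhook POST requests.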
This is what you get on Bluesky when your feeds are Linguistics and otters 🦦😍
November 26, 2024 at 8:46 PM
At @huggingface.bsky.social 🤗 we're preparing a collaborative annotation effort to build an open-source multilingual dataset.

If you'd like to get high-quality open data for your language, check if yours is listed in this form and sign up!
forms.gle/DHJdtvoSNxAA...
Language Lead sign-up
At Hugging Face 🤗, we're launching a big community initiative to improve LLM training for many languages. We're looking for Language Leads to help us cultivate specific languages during this initiativ...
November 26, 2024 at 1:16 PM
Reposted by Natalia
Periodic reminder: a lot of what makes AI "work" is exploited people doing the tasks, just hidden behind fancy websites.

It's good that a normie outlet like 60 Minutes is reporting on this.
November 23, 2024 at 1:44 AM
Reposted by Natalia
I created a collection of good models for dataset curation:

- NSFW classifiers
- PII classifiers
- blazing fast embeddings by model2vec
- quality classifier
- educational value classifier
- domain classifier

Collection: huggingface.co/collections/...
Models for dataset curation - a Dataset-Tools Collection
November 22, 2024 at 12:57 PM
Hello everyone! 👋 Since this is growing quite a bit, I thought I'd introduce myself:

I'm Natalia, a computational linguist working at @huggingface.bsky.social as part of the team building Argilla.
November 22, 2024 at 11:29 AM
Back to work after a week-long offsite in Martinique 🏝️ with my colleagues from @huggingface.bsky.social 🤗!

I had time to relax, reflect, have fun and meet people who aren't just amazing at their work but also truly kind 💖

Can't wait for the next one!
November 18, 2024 at 10:19 AM
What's your strategy to save interesting posts and not forget about their existence?
November 15, 2024 at 11:48 AM
Reposted by Natalia
If you’re nerdy about language, there are lots of really interesting people in here!

go.bsky.app/UUM7Gcx
November 14, 2024 at 6:05 AM
Hello Bluesky! As a welcome post, and inspired by the recent events in Valencia, I'd like to show you how I uploaded a CSV file from the "Disaster Response Messages" dataset into Argilla to quickly start annotating it and identifying pleas for help. No code needed.
www.loom.com/share/952c15...
Annotating and Curating Datasets in Argilla
Hello, I'm Natalia from the Argilla team at Hugging Face. Today, I'll guide you through annotating and curating datasets in Argilla without coding. I demonstrate using a disaster response dataset to...
November 4, 2024 at 7:02 PM