Ben Burtenshaw
@benburtenshaw.bsky.social
Building tools for AI datasets. 😽
Looking in AI datasets. 🙀
Sharing clean open AI datasets. 😻

at https://bsky.app/profile/hf.co
Pinned
For anyone interested in fine-tuning or aligning LLMs, I’m running this free and open course called smol course. It’s not a big deal, it’s just smol.

🧵>>
Reposted by Ben Burtenshaw
🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With others @hf.co, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how:
huggingface.co/blog/waterma...
September 16, 2025 at 4:29 PM
Reposted by Ben Burtenshaw
I'm writing an article series about creating tensors from scratch in Rust. #tensors #machine-learning #ml #ai

huggingface.co/blog/KeighBe...
Building Tensors From Scratch in Rust: Part 1, Core Structure and Indexing
A Blog post by Kyle Birnbaum on Hugging Face
huggingface.co
June 12, 2025 at 11:56 PM
Reposted by Ben Burtenshaw
AI doesn’t get your culture?❌ butchers your language? 😤
With FeeL – you can fix that🛠️🌍

💬 Talk to AI in your language
✏️ Correct its mistakes
👁‍🗨 Watch it improve
The more we use it, the smarter it gets for everyone!

👉 Try it now: huggingface.co/spaces/feel-...

👶🤖📈
#ai #genAI #llm
Feel - a Hugging Face Space by feel-fl
Discover amazing ML apps made by the community
huggingface.co
March 26, 2025 at 11:23 AM
Reposted by Ben Burtenshaw
How should AI tools be designed to support rather than replace workers?

At the Reshaping Work conference, I led a roundtable exploring AI’s impact on labor. We published a blogpost on our key takeaways on responsible AI and the future of work w/ Franco Bastida
🔗 www.rsm.nl/discovery/20...
🧵👇
Start-Up Approaches to Responsible AI: Worker-Centric Innovation
Explore how start-ups are reshaping AI development through transparency, worker inclusivity, and ethical approaches that prioritise human augmentation over replacement.
www.rsm.nl
February 12, 2025 at 3:12 PM
I've put together some of the handier tools for building courses and educational material on the @huggingface hub.

These should bootstrap your projects with quizzes, friendly-sized models, useful datasets, and informative spaces.

Let me know if you use or need more.

https://buff.ly/42qyanw
January 28, 2025 at 7:32 AM
Manic few days in open source AI, with game-changing developments all over the place. Here's a round-up of the resources:

Here's a thread on it all:
January 27, 2025 at 10:00 AM
Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TLDR: it's a quiz that uses a dataset to make questions and save answers.
January 24, 2025 at 11:08 AM
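A rough sketch of the quiz idea from the post above, in plain Python. This is not the app's actual code: the rows, the field names (`question`, `options`, `answer`) and the helpers `make_quiz`, `grade` and `save_answers` are all made up for illustration, and a real app would load its rows from a dataset instead of a hard-coded list.

```python
import json
import random

# Hypothetical rows; the real app would read these from a dataset.
ROWS = [
    {
        "question": "What does RAG stand for?",
        "options": ["Retrieval-augmented generation", "Random agent graph"],
        "answer": "Retrieval-augmented generation",
    },
    {
        "question": "Roughly how many parameters does a '3B' model have?",
        "options": ["About 3 million", "About 3 billion"],
        "answer": "About 3 billion",
    },
]


def make_quiz(rows, n, seed=0):
    """Pick up to n rows at random to use as quiz questions."""
    rng = random.Random(seed)
    return rng.sample(rows, k=min(n, len(rows)))


def grade(row, chosen):
    """Turn one answered question into a record that can be saved."""
    return {
        "question": row["question"],
        "chosen": chosen,
        "correct": chosen == row["answer"],
    }


def save_answers(records, path):
    """Write one JSON object per line so the answers can be reloaded later."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


quiz = make_quiz(ROWS, n=2)
# Pretend the student always picks the first option.
records = [grade(row, row["options"][0]) for row in quiz]
```

Calling `save_answers(records, "answers.jsonl")` would then write the graded answers to a local file; pushing them anywhere else is left out of this sketch.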
If you need long context for RAG, tool use, agents, or just because, Nvidia released a new library to make it super simple.

TLDR: You can get 128k context at 50% less memory 🐳

Here's a blog post on everything:
Mastering Long Contexts in LLMs with KVPress
A Blog post by NVIDIA on Hugging Face
buff.ly
January 23, 2025 at 10:00 AM
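A back-of-the-envelope check on the memory claim in the post above. It assumes the saving refers to the KV cache and that half of the cached tokens are dropped. The model dimensions (32 layers, 8 KV heads, head size 128, 16-bit values) are assumptions chosen to look like a typical 8B Llama-style model, not numbers from the post, and `kv_cache_bytes` is a made-up helper rather than anything from the library.

```python
def kv_cache_bytes(
    seq_len,
    num_layers=32,        # assumed
    num_kv_heads=8,       # assumed
    head_dim=128,         # assumed
    bytes_per_value=2,    # assumed 16-bit values
    compression_ratio=0.0,
):
    """Estimate KV cache size when a fraction of cached tokens is dropped."""
    kept_tokens = int(seq_len * (1.0 - compression_ratio))
    # Each kept token stores one key and one value vector per layer and KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return kept_tokens * per_token


GIB = 1024 ** 3
context = 128 * 1024  # 128k tokens

full = kv_cache_bytes(context)
compressed = kv_cache_bytes(context, compression_ratio=0.5)

print(f"full cache:       {full / GIB:.0f} GiB")
print(f"compressed cache: {compressed / GIB:.0f} GiB")
```

Under these assumptions the cache drops from 16 GiB to 8 GiB. Model weights and activations are not counted, so total memory use would shrink by less than half.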
Reposted by Ben Burtenshaw
What happened yesterday in the Chinese AI community? 🚀
huggingface.co/posts/AdinaY...
January 21, 2025 at 11:51 AM
DeepSeek just dropped a frontier reasoning model on the hub. It's 685 billion parameters of bleeding-edge performance on COMPLEX tasks.

Who's considering this for synthetic datasets, distillation, or pruning?
January 20, 2025 at 8:38 AM
Playing around with AI agents, and I reckon Gradio spaces on the hub make the perfect tools.

- super easy to connect your agents to a bunch of useful tools and apps.
- find a Space you like on Hugging Face Hub or make your own with Gradio.
- link it up with smolagents.

🧵

Gradio and LLM Agents
A Step-by-Step Gradio Tutorial
www.gradio.app
January 17, 2025 at 10:00 AM
Reposted by Ben Burtenshaw
We’re launching a FREE course on LLM Agents 🥳

📖 Learn what Agents are
🕵️ Build your own Agents using the latest libraries and tools.
🎓 Earn a certificate of completion to showcase your achievement.

Enroll now 👉 huggingface.us17.list-manage.com/subscribe?u=...
January 15, 2025 at 3:23 PM
Agents need tools, and the Hugging Face hub is full of them. You can use Gradio spaces on the hub as agent tools. I put together a short list of ones that I tried out and made. Here's an overview:

🧵
January 15, 2025 at 10:00 AM
Great deep dive blog post on Agents, covering all the fundamentals from the ground up.

@mmitchell.bsky.social @sashamtl.bsky.social @giadapistilli.com @evijit.io

huggingface.co/blog/ethics-...
AI Agents Are Here. What Now?
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
January 13, 2025 at 7:28 PM
Free course on Agents by Hugging Face. We just added a chapter to smol course on agents. Naturally, using smolagents! The course covers these topics:

- Code agents
- Retrieval agents
- Custom functional agents

If you're building agent applications, this course should help.
January 13, 2025 at 10:00 AM
If you're looking for real talk from experience, check out this blog post on emissions from generative AI models:

huggingface.co/blog/leaderb...
CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
January 10, 2025 at 8:10 AM
What does o3 mean for small open source models on real domains? I've been asking myself this question a lot the last week, and I think it sets up an interesting path forward.
OpenAI is calling o3 a new paradigm. If that's true, then these two developments illustrate how that paradigm plays out:

🧵
December 27, 2024 at 11:00 AM
Synthetic Datasets are the focus of smol course this week! Synthetic datasets supercharge applying models to your own use case, because you can do stuff like this:

🧵 > >
December 23, 2024 at 4:00 PM
People are flexing their end of year stats, so I made this app to show @hf.co hub stats in a tidy design!

Thanks @jfcalvo.hf.co and @ameeelie.bsky.social for the feature!
December 19, 2024 at 1:28 PM
Argilla annotation is free from code! You can now build and export your annotation projects, right from the comfort of your UI!
🚀 Argilla v2.6.0 is here! 🎉

Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub. 🤩

Take a look at this quick demo 👇

💁‍♂️ More info about the release at github.com/argilla-io/a...

#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla
December 19, 2024 at 12:44 PM
Vision Language Models have been the topic of smol course this week, which is a free and open course!

If you're working with VLMs and/or up-skilling, check it out. There's loads of useful material, discussion, and feedback in the repo.

🧵
December 19, 2024 at 10:00 AM
Reposted by Ben Burtenshaw
🙅‍♀️ No-code end-to-end example to train your model

1️⃣ Use the Synthetic Data Generator to create your custom dataset

2️⃣ Use AutoTrain to use the generated dataset and train your model

Check it here: huggingface.co/blog/synthet...
December 18, 2024 at 11:28 AM
Small open models are on a roll! Hugging Face just released research on getting a 3 billion parameter Llama model to outperform its 70 billion parameter variant.

🧵
December 17, 2024 at 8:27 AM
Lack of data is often the first challenge when you're building real AI systems for specific languages or use cases. If you're dealing with this, try out this dataset generator.

It generates a custom dataset for your use case, which you can use to train a model for classification or chat.
December 16, 2024 at 4:00 PM
[SATURDAY POST] In case you were disorientated by growing extra limbs or defying physics with your movements, here's a round-up of what happened in AI this week.

p.s. sora jokey

🧵
December 14, 2024 at 11:00 AM