Ben Burtenshaw
banner
benburtenshaw.bsky.social
Ben Burtenshaw
@benburtenshaw.bsky.social
Building tools for AI datasets. 😽
Looking in AI datasets. 🙀
Sharing clean open AI datasets. 😻

at https://bsky.app/profile/hf.co
I've put together some of the handier tools for building courses and educational material on the @huggingface hub.

These should bootstrap you projects with quizzes, friendly sized model, usefule datasets, and informative spaces.

Let me know if you use or need more.

https://buff.ly/42qyanw
January 28, 2025 at 7:32 AM
Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TLDR, It's a quiz that uses a dataset to make questions and save answers.
January 24, 2025 at 11:08 AM
Deepseek just dropped a frontier reasoning model on the hub. It's 685 billion parameters of bleeding edge performance on COMPLEX tasks.

Who's considering this for synthetic datasets, distillation, or pruning?
January 20, 2025 at 8:38 AM
Agents need tools and the Hugging Face hub is full of them. You can use Gradio spaces on the hub as agent tools. I created a short list that I tried out and made. Here's an overview

🧵
January 15, 2025 at 10:00 AM
Free course on Agents by Hugging Face. We just added a chapter to smol course on agents. Naturally, using smolagents! The course cover these topics:

- Code agents
- Retrieval agents
- Custom functional

If you're building agent applications, this course should help.
January 13, 2025 at 10:00 AM
People are flexing their end of year stats, so I made this app to show @hf.co hub stats in a tidy design!

Thanks @jfcalvo.hf.co and @ameeelie.bsky.social for the feature!
December 19, 2024 at 1:28 PM
Vision Language Models have been the topic of the smol course this week. Which is a free and open course!

If you're working with VLMs and/or up-skilling, check it out. There's loads of useful material, discussion, and feedback in the repo.

🧵
December 19, 2024 at 10:00 AM
Lack of data is often the first challenge when you're building real AI systems for specific languages or use cases. If you're dealing with this, try out this dataset generator.

It generates a custom dataset for your use case, which you can use to train a model for classification or chat.
December 16, 2024 at 4:00 PM
If you're using LLMs for use cases, this free course just got real! Smol course now has 4 chapters, and the most important was just released.

🌎 Chapter 4 shows you how to evaluate models on custom use cases.

Check out smol course here: https://buff.ly/3ZCMKX2
December 13, 2024 at 10:06 AM
December 11, 2024 at 11:00 AM
came across this example in agent-as-a-judge from Meta. It uses agent-as-a-judge to evaluate the effectiveness of a DevAI app.

- It's based on an open dataset.
- It's more accurate than LLM as a judge
- It explains its evaluation based on preferences, and requirements.

https://buff.ly/49tN6CQ
December 11, 2024 at 11:00 AM
- ongoing translation projects in Korean, Vietnamese, Portuguese, and Spanish
- 3 chapters are ready for students. On: instruction tuning, preference alignment, and parameter efficient fine tuning
- 3 chapters in progress on evaluation, vision language models, and synthetic data.
December 10, 2024 at 10:15 AM
Quick update from week 1 of smol course. The community is taking the driving seat and using the material for their own projects. If you want to do the same, join in!

🧵
December 10, 2024 at 10:15 AM
FishSpeech v1.5 is a multilingual, zero-shot instant voice cloning, low-latency, open text to speech model:

https://buff.ly/3Bs97oR
December 7, 2024 at 10:00 AM
Google DeepMind release a model to generate action-controllable, playable 3D environments.

https://buff.ly/3BczBe1

@jparkerholder.bsky.social @rockt.ai
December 7, 2024 at 10:00 AM
The policy scholars at @hf.co released a practical guide on the EU AI act for devs: https://buff.ly/3ZE6EkM
December 7, 2024 at 10:00 AM
Google dropped PaliGemma2, a vision language model that's perfect for fine-tuning:

https://buff.ly/49qC9Su

@keysers.bsky.social @andreaspsteiner.bsky.social
December 7, 2024 at 10:00 AM
1/n released the smol course on finetuning and aligning llms https://buff.ly/41mTz09
December 7, 2024 at 10:00 AM
⭐️ The stats are the wildest. 1364 github stars in a day. Folk really want their own models.
December 4, 2024 at 9:00 AM
🤗 I'm inspired by the folk contributing. They already know this stuff and just want to lend a hand to others by improving the course.

👩‍🎓 We have 325 students, 7 submissions, and 12 improvements.

Come and join in here:

https://buff.ly/4f0HkJV
December 4, 2024 at 9:00 AM
smol course Day 1 ✅. I learnt that people are hungry for models they can own.

📚 Material focused on instruction tuning. Split into chat templates and supervised fine tuning. There's more to this subject than this, but we're keeping things smol.

⏩ If you haven't already, try out module 1!

🧵
December 4, 2024 at 9:00 AM
Folk asked about the engagement time, so I posted it here

github.com/huggingface/...
December 3, 2024 at 11:46 AM
For anyone interested in fine-tuning or aligning LLMs, I’m running this free and open course called smol course. It’s not a big deal, it’s just smol.

🧵>>
December 3, 2024 at 9:21 AM
I wanted to share an easy to use notebook, that folk could use to explore and analyse LLM benchmark results in lighteval. Here it is:

🧵>>
November 28, 2024 at 11:18 AM
Install the dependencies and application, then run the application.

😊 you have your very own local vision language model.
November 27, 2024 at 11:02 AM