benjamin
@bclavie.bsky.social
doing ML stuff at answer.ai / fast.ai
🇯🇵-based 🇫🇷man
Reposted by benjamin
I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵
December 19, 2024 at 4:45 PM
I wonder if some kind of model could fill this in...
December 11, 2024 at 11:50 AM
Reposted by benjamin
Thank you @bsky.app team for correcting the mistake. Glad to be back!
November 28, 2024 at 8:00 PM
Reposted by benjamin
people on this platform will take your words out of context, twist, not mention your correction, because they just want to hate on what you work on, and insult you comfortably.

I'll keep posting here about my work but will not be interacting with anyone who wants to bash on my company.
November 28, 2024 at 10:22 AM
Reposted by benjamin
i exclusively consent to my tweets being used for training neural networks. if you are not a neural network, stop reading this immediately
November 28, 2024 at 2:59 AM
This might sound obvious, but bullying and threatening people doing perfectly legal things because you morally don't agree with them is wrong.

People stifling any serious discussion by doing this, albeit for another set of morals, is actually the exact reason that made a lot of people migrate here.
A librarian who previously worked at the British Library created a relatively small dataset of bsky posts, hundreds of times smaller than those of previous researchers, to help folks create toxicity filters and stuff.

So people bullied him & posted death threats.

He took it down.

Nice one, folks.
November 28, 2024 at 5:39 AM
Reposted by benjamin
Some days I really like this place, and then there are others in which there's a level of puritanical fervour that permeates a lot of public discourse that I find off-putting. Some of the over-the-top hateful responses wouldn't be out of place in the Hellsite.
November 28, 2024 at 5:16 AM
Reposted by benjamin
We should make sure that only really big companies can afford to pay really big copyright holders to access the data needed to do stuff with AI, and keep everyone else out.

Wouldn’t that be just super?
November 28, 2024 at 5:04 AM
Reposted by benjamin
I'm disheartened by how toxic and violent some responses were here.

There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years, and he is one of the people most preoccupied with the ethical implications of AI. Some replies are Reddit-toxic level. We need empathy.
I've removed the Bluesky data from the repo. While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake.
First dataset for the new @huggingface.bsky.social @bsky.app community organisation: one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗

huggingface.co/datasets/blu...
November 27, 2024 at 11:09 AM
Reposted by benjamin
I'm a TA for the new fast.ai course which starts in less than 30 minutes, and which sold out in <48 hours. It's so cool to see it all coming together
fast.ai—Making neural nets uncool again – fast.ai
November 26, 2024 at 10:35 PM
Reposted by benjamin
OLMo 2 is out 🥳 7B and 13B trained on 5T tokens, and meticulously instruction-tuned using the Tulu 3 recipe.

Simply the best fully open models yet.

Really proud of the work & the amazing team at
@ai2.bsky.social
November 26, 2024 at 9:12 PM
Reposted by benjamin
I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students: go.bsky.app/vju2ux

Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!
November 23, 2024 at 7:54 PM
Reposted by benjamin
Creating a 🦋 starter pack for people working in IR/RAG: go.bsky.app/88ULgwY

I can’t seem to find everyone though, help definitely appreciated to fill this out (DM or comment)!
November 23, 2024 at 9:19 PM
Reposted by benjamin
Here is a list of ML OSS & Open Source / Science enthusiasts I found on Bluesky 🦋

go.bsky.app/8MFcfXd

Let me know if you find such people here!

I'm still new here and the list probably misses many must-add people, so let's build it together 💪
November 21, 2024 at 5:19 AM
Reposted by benjamin
Mat is not on 🦋—posting on his behalf!

It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR.

We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?
November 20, 2024 at 7:47 PM
It amuses me to think that there’s someone out there convinced “the other app” is Threads and is sitting there like
ALT: Patrick Star from SpongeBob SquarePants is standing on a sandy beach next to a black line.
November 20, 2024 at 10:17 AM
Reposted by benjamin
If you're an NLP researcher and haven't made it into either Starter Pack yet, please let me know! We're over halfway full at this point 😧

go.bsky.app/JgneRQk
November 18, 2024 at 7:45 AM
Reposted by benjamin
Me: not yet fully used to mentally mapping the Yen so copy-pasting a subscription amount to get a conversion from Google's advanced AI parsing

Google: don't worry fam I know exactly the unit you're looking for
November 18, 2024 at 3:45 AM
Reposted by benjamin
New here? Interested in AI/ML? Check out these great starter packs!

AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS

You can also search all starter packs here: blueskydirectory.com/starter-pack...
November 9, 2024 at 9:13 AM
Reposted by benjamin
I made this starter pack for AI dataset folk:

go.bsky.app/JxqKW4Q

Let me know if you want in!
November 17, 2024 at 7:09 AM
So we're doing Bluesky again now...

Any recommended way to catch up on all the ML/IR/DL/Memeposters from the other place?
November 17, 2024 at 8:22 AM