Lightnews — Scholar-powered news

benjamin

@bclavie.bsky.social

I know a lot of people are working on ModernBERT-based embedding models, but in the meantime, if you’d like to play around with it (there's no better way to learn than practice), it’s plug-and-play with Sentence Transformers (www.sbert.net), and we have examples in the repo

December 22, 2024 at 1:11 AM

benjamin

@bclavie.bsky.social

Hey! As Jeremy replied, this is fully expected: encoder models aren’t expected to produce well-calibrated semantic similarity scores out of the box, because that’s very far from the base model’s training task!

However, they fine-tune really well into embedding models that are good at this 1/2

December 22, 2024 at 1:10 AM

benjamin

@bclavie.bsky.social

There was one time my flight from Geneva got cancelled and I got a replacement one from Lyon. Still one of my most surreal experiences.

December 9, 2024 at 9:24 AM

benjamin

@bclavie.bsky.social

Won't be at NeurIPS but I'll be at ICLR in April, in case you're planning on being there 😄

December 8, 2024 at 11:59 AM

benjamin

@bclavie.bsky.social

Please do go on about the coffee. Is it a make-you-an-espresso-as-required kind of deal or a big pot? Perhaps a lovingly made 1L chemex?

December 2, 2024 at 12:30 AM

benjamin

@bclavie.bsky.social

I can understand this yeah. I’m generally open to discussion but I’ve seen enough unsavoury behaviour & DMs in the past couple days to want to dial it down a teensy bit at the moment sadly.

November 28, 2024 at 2:55 PM

benjamin

@bclavie.bsky.social

Jokes aside, it does make me kinda sad. ML Bluesky has a lot of the vibes of early Twitter, with interesting discussions, but seeing so many of the people who posted death threats get unbanned while someone was banned for *posting a link to a dataset* is a really bad sign :/

November 28, 2024 at 1:04 PM

benjamin

@bclavie.bsky.social

LLM2Vec is also a nice approach for this -- the only difference is that you'd fine-tune for classification rather than retrieval at the end: github.com/McGill-NLP/l...

GitHub - McGill-NLP/llm2vec: Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'

November 28, 2024 at 1:03 PM

benjamin

@bclavie.bsky.social

It’s only hate if it comes from the Champagne region of X, otherwise it’s just sparkling outrage (I think?)

November 28, 2024 at 10:25 AM

benjamin

@bclavie.bsky.social

(ChromaDB is good too, but IMO it's targeting a different/less AI tinkery audience)

November 28, 2024 at 10:06 AM

benjamin

@bclavie.bsky.social

(they do not employ me, nor pay me in any way, I'm just out there doing unpaid advertising)

November 28, 2024 at 10:06 AM

benjamin

@bclavie.bsky.social

Heartily recommend lancedb for local stuff where you don't want to fuss with things too much -- mostly sane defaults, it has reranking and BM25 support so you can do two-step or hybrid search whenever needed, and the on-disk ANN index is plenty for most people.

November 28, 2024 at 10:05 AM
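
For anyone unsure what "hybrid search" means in the post above, here is a minimal plain-Python sketch. It does not use LanceDB's API. It assumes you already have two ranked lists of document ids, one from BM25 and one from vector search, and merges them with reciprocal rank fusion, which is one common fusion method and not necessarily what any given library does internally. The function name, the document ids, and the constant `k=60` (a conventional default) are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by multiple retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and a vector search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

A two-step setup is the other option the post mentions: retrieve a candidate list with one method, then re-score only those candidates with a reranker.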

benjamin

@bclavie.bsky.social

Note: you can still criticise the way the original dataset was built. Nothing's black and white. I understand why people are upset.
None of this implies there isn't something seriously wrong with sending death threats to someone because they *curated an open dataset from an open protocol*.

November 28, 2024 at 6:10 AM

benjamin

@bclavie.bsky.social

Data gathering on an open platform via an open protocol is only ethical if you're not told about it, silly.

November 28, 2024 at 5:09 AM

benjamin

@bclavie.bsky.social

It’s been absolutely horrible to watch this. Pure “it’s fine to insult, harass and threaten people as long as you are doing it for the right reason” energy.

At least blocklists help, I guess blocking toxicity on sight is the only way.

November 28, 2024 at 1:36 AM
