cojennin.bsky.social
@cojennin.bsky.social
Member of Technical Staff @AnthropicAI / prev @ MosaicML
Reposted
What’s the most effective way to add new domain knowledge into an open LLM? A new blog post from my team covers experiments we did at the beginning of the year to start answering this question. It starts, unsurprisingly, with sweeping your learning rate… www.databricks.com/blog/charact...
Characterizing Datasets and Building Better Models with Continued Pre-Training
www.databricks.com
November 25, 2024 at 11:29 PM
Reposted
When you fail to parse your data that’s a jsonl
November 22, 2024 at 12:47 AM
Reposted
Mat is not on 🦋—posting on his behalf!

It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR.

We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?
November 20, 2024 at 7:47 PM
Reposted
How many documents should you retrieve when using a reranker? The answer might surprise you!

Check out the excellent work from our intern Mathew on this important retrieval question. 👏
November 20, 2024 at 8:15 PM
Reposted
I love the smell of providing executives with actionable insights in the morning
November 8, 2024 at 6:50 PM
"Son, we live in a world that has dashboards, and those dashboards have to be guarded by data engineers with Spark."
Time for some shitposting about dashboards, since that’s in the zeitgeist:

I recall many years ago an eager IT manager telling me we should deprecate all of our traditional SSIS reports. Because all that info could be revealed in a dashboard. And dashboards allow “insight discovery” which is 🔥 …
October 29, 2024 at 8:11 PM
Reposted
Time for some shitposting about dashboards, since that’s in the zeitgeist:

I recall many years ago an eager IT manager telling me we should deprecate all of our traditional SSIS reports. Because all that info could be revealed in a dashboard. And dashboards allow “insight discovery” which is 🔥 …
October 28, 2024 at 12:31 PM
Reposted
Any data people in New York want to grab bagels and talk about AI? We have a group that meets every other Thursday morning. I’d be happy to add people to the google group.
October 28, 2024 at 11:04 AM
text2sql is coming, look busy
October 29, 2024 at 6:12 PM
you either die making good model, or live long enough to see yourself make bad model go fast
October 29, 2024 at 11:47 AM
Reposted
It is surprisingly easy to accidentally dress as Shaggy from Scooby-Doo
October 28, 2024 at 12:56 PM