mlresearch42.bsky.social
@mlresearch42.bsky.social
This is a great example of research with immediate practical utility. Good work.
April 23, 2025 at 10:42 AM
What are you baking up there?
April 22, 2025 at 10:08 PM
Then whose take are you here for?
March 17, 2025 at 2:51 AM
(2) unsanctioned distillation of the leading models (GPT, Claude, etc.) by the fast followers, namely DeepSeek, Qwen, and Grok, and (3) the "vanillization" of LLM outputs driven by RLHF-derived human preference convergence. That said, this is a good empirical question in need of research!
February 24, 2025 at 7:18 PM
Interesting observation. I would guess there are three factors at work: (1) heavily overlapping pre-training datasets, given how much of the scrapable internet is now consumed by LLMs, 1/2
February 24, 2025 at 7:15 PM
Good share. I have been looking for follow-up on Anthropic's work from the spring.
January 10, 2025 at 8:28 PM
Awesome! Love to see these advances in encoder models - a much smaller and less compute-intensive option for text classification tasks
December 31, 2024 at 3:56 PM
This is a great question. At a minimum, one could have three or more experts vote on which questions are incorrect and then omit those from the dataset going forward. This is true of several benchmarks -- what are the barriers to correction? Or are the errors intentionally left as canaries for detecting target leakage?
December 17, 2024 at 10:40 AM
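The expert-voting filter suggested above could be sketched roughly like this (a minimal illustration only; the vote threshold, data layout, and function name are assumptions, not part of any benchmark's actual correction process):

```python
from collections import Counter

def filter_flagged_questions(questions, expert_votes, min_votes=3):
    """Drop benchmark questions that at least `min_votes` experts flagged as incorrect.

    questions: list of question IDs
    expert_votes: list of sets, one per expert, each holding flagged question IDs
    """
    # Count how many experts flagged each question
    flag_counts = Counter(qid for votes in expert_votes for qid in votes)
    # Keep only questions below the flag threshold
    return [q for q in questions if flag_counts[q] < min_votes]

# Example: three experts review a five-question benchmark;
# q2 is flagged by all three (omitted), q4 by only two (kept)
questions = ["q1", "q2", "q3", "q4", "q5"]
votes = [{"q2", "q4"}, {"q2"}, {"q2", "q4"}]
print(filter_flagged_questions(questions, votes))  # ['q1', 'q3', 'q4', 'q5']
```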
While anecdotal, this is the type of side-by-side comparison I find very useful.
December 16, 2024 at 10:38 AM