@joelniklaus.bsky.social
pleias just released 75B tokens of synthetic data upsampled from 50K vital Wikipedia articles!

Some thoughts below:
- Interesting that they use such a deep architecture for such small models (64 layers for the 56M-parameter model and 80 layers for the 321M-parameter one)
November 11, 2025 at 3:59 PM
What does AGI actually mean? A who's who in AI spent 57 pages answering that.

TLDR: AGI is defined through ten measurable cognitive domains using psychometric theory.
November 10, 2025 at 3:56 PM
Need copyright-clean training data at scale? Check out the KL3M Data Project, a gold mine on the Hugging Face Hub! The ALEA Institute provides 132+ million documents from 16 sources with substantial training resources:
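If you want to poke at it, here is a minimal sketch for streaming one of the corpora from the Hub. The dataset ID below is a placeholder; browse the ALEA Institute org on the Hub for the real repository names.

```python
# Minimal sketch: stream a KL3M corpus instead of downloading 132M+ documents.
# NOTE: the dataset ID is a placeholder; see hf.co/alea-institute for real names.
from datasets import load_dataset

ds = load_dataset(
    "alea-institute/kl3m-data",  # hypothetical repository ID
    split="train",
    streaming=True,              # lazy iteration, no full download
)

for doc in ds.take(3):           # peek at a few records
    print(list(doc.keys()))
```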
November 8, 2025 at 3:01 PM
If you're interested in legal retrieval, check out the amazing Massive Legal Embedding Benchmark (MLEB) by Isaacus!

Very cool collection of retrieval datasets all available on the Hugging Face hub!

Great work by Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec!
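As a minimal sketch of what such a retrieval eval boils down to (generic sentence-transformers model and a made-up mini-corpus, not the official MLEB harness):

```python
# Toy legal retrieval: embed a corpus and a query, rank by cosine similarity.
# Model and documents are illustrative; MLEB's real datasets live on the Hub.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # swap in a legal-domain embedder

corpus = [
    "The court held the contract void for lack of consideration.",
    "Section 52 prohibits misleading and deceptive conduct in trade.",
]
query = "When is a contract unenforceable because consideration is missing?"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]  # similarity of query to each doc
best = int(scores.argmax())
print(corpus[best], float(scores[best]))
```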
November 7, 2025 at 4:02 PM
CourtListener supports semantic search now!

Apparently they implemented hybrid search using their own fine-tuned ModernBERT model, publicly available on the Hugging Face Hub!

Congrats to @michaeljaylissner and the Free Law Project for making this happen!
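For context, hybrid search typically fuses a lexical ranking with a semantic one. Here is a generic reciprocal rank fusion (RRF) sketch of that idea, not CourtListener's actual pipeline:

```python
# Generic hybrid-search sketch: fuse a lexical (BM25-style) ranking with a
# semantic (embedding) ranking via reciprocal rank fusion (RRF).
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

lexical_ranking = ["opinion_42", "opinion_7", "opinion_13"]   # from keyword search
semantic_ranking = ["opinion_7", "opinion_99", "opinion_42"]  # from embedding search
print(rrf([lexical_ranking, semantic_ranking]))
```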
November 6, 2025 at 4:04 PM
On-policy distillation matches RL performance at 2-10% of compute cost.

RL gives sparse feedback and burns compute. Off-policy distillation is efficient but learns in the teacher's states, not the student's, causing compounding errors on long sequences.
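A rough sketch of what one on-policy distillation step looks like, assuming HF-style causal LMs; names and hyperparameters are illustrative, not a specific paper's recipe:

```python
# Sketch of one on-policy distillation step with HF-style causal LMs.
# Key idea: sample from the STUDENT (so training happens in its own states),
# then get dense per-token feedback via reverse KL(student || teacher).
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, tokenizer, prompt, optimizer):
    inputs = tokenizer(prompt, return_tensors="pt")
    # 1) Student samples its own continuation (on-policy data).
    gen = student.generate(**inputs, max_new_tokens=64, do_sample=True)
    # 2) Score the sampled tokens under both models.
    student_logits = student(gen).logits[:, :-1]          # predict token t+1
    with torch.no_grad():
        teacher_logits = teacher(gen).logits[:, :-1]      # teacher grades in place
    # 3) Dense per-token loss: reverse KL(student || teacher).
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()  # mask prompt tokens in practice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```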
November 5, 2025 at 3:56 PM
The correlation between number of reads and edits across Wikipedia articles is 0! This means many articles are highly read but almost never edited.
November 4, 2025 at 3:57 PM
If you're exploring computational legal research or building legal AI systems, take a look at Jurisprudence on the Hugging Face Hub by Antoine Jeannot.
November 3, 2025 at 4:04 PM
Reasoning models excel at math but struggle with simple requests like word limits during thinking

TLDR: Models ignore user instructions while reasoning despite following them in final outputs.
November 2, 2025 at 3:04 PM
Cool long-context eval by Artificial Analysis!

AA-LCR is a set of 100 tough questions where you need to piece together answers from several real-world documents—sometimes really big ones—so you can’t just copy and paste the answers.
November 1, 2025 at 3:02 PM
Very cool work by researchers from Massachusetts Institute of Technology.
October 31, 2025 at 3:56 PM
Check out Hugging Face Inference Endpoints: quick and simple inference for many great open models!
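A quick sketch with `huggingface_hub` (the model ID is just an example, and you need a token with access to it):

```python
# Call a hosted open model via huggingface_hub's InferenceClient.
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct")
response = client.chat_completion(
    messages=[{"role": "user", "content": "One-sentence summary of Marbury v. Madison?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```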
October 30, 2025 at 4:03 PM
Hugging Face just got promoted to the BigTech club 😉

Thanks to Gian Sbetta and Edouard Treccani for inviting me to a great first AI Builders event in Zurich this evening!

Had lots of great conversations with super interesting people!
October 29, 2025 at 9:14 PM
Impressive collection of specialized models and datasets for French taxation and legal documents by Louis Brulé Naudet:
October 28, 2025 at 3:58 PM
I just evaluated MiniMax M2 on GPQA-Diamond and LEXam-English in the "I don't know" setup.

TLDR: It is very strong on GPQA, especially for its size, but underperforms on LEXam.
October 27, 2025 at 3:56 PM
Just finished reading "The Ultra-Scale Playbook: Training LLMs on GPU Clusters". Great coverage of the important concepts with good explanations and nice interactive graphics!

Thanks Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf!
October 26, 2025 at 3:56 PM
LEXam Update: GPT-5 Takes the Top Spot

We're excited to share our latest LEXam evaluation results:
- GPT-5 claims the #1 position, outperforming Gemini 2.5 Pro and setting a new state-of-the-art for legal reasoning on LEXam!
October 23, 2025 at 3:01 PM
GPT-5 and Claude can ace GPQA Diamond, but LEXam (a legal reasoning benchmark) exposes a critical flaw: they'd rather be confidently wrong than admit uncertainty.

⚙️ The Setup
I evaluated ten frontier models on LEXam (English MC subset) using an "I don't know" (IDK) protocol.
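A minimal sketch of how such a protocol can be scored; the exact penalty value is my assumption, not necessarily the one used here:

```python
# One way to score an "I don't know" (IDK) protocol: correct answers score +1,
# abstentions 0, and confident wrong answers are penalized (penalty illustrative).
def idk_score(predictions: list[str], answers: list[str], wrong_penalty: float = -1.0) -> float:
    total = 0.0
    for pred, gold in zip(predictions, answers):
        if pred == "IDK":
            total += 0.0            # admitting uncertainty costs nothing
        elif pred == gold:
            total += 1.0            # correct answer
        else:
            total += wrong_penalty  # confidently wrong is worse than abstaining
    return total / len(answers)

print(idk_score(["A", "IDK", "C"], ["A", "B", "D"]))  # (1 + 0 - 1) / 3 = 0.0
```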
October 22, 2025 at 3:04 PM
What's special about October 27th 2025?

Yoshua Bengio, the most-cited computer scientist in the world, is 1 week away from becoming the first ML researcher to hit 1 million citations! 🤯

At his current rate of 366 citations/day, he'll reach this unprecedented milestone around October 27th 🎯
October 20, 2025 at 3:04 PM
Very cool and detailed Stanford University and Carnegie Mellon University study on sycophancy: "Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence"

Sycophancy, the phenomenon of excessively agreeing with or flattering users, is a pervasive issue in current LLMs.

Findings:
October 19, 2025 at 3:00 PM
Context rot is a major problem as tasks grow more complex and context windows expand; this issue is particularly acute for lawyers, who must process lengthy, intricate documents and are especially vulnerable to the loss or distortion of critical information.
October 16, 2025 at 2:58 PM
Very excited to announce that I am officially the second most cited researcher working on #good_shit according to Google Scholar, trailing Lucas Beyer by only 99,062 citations 🎉😂
October 15, 2025 at 3:00 PM
Cool tech report by Chroma: "Context Rot: How Increasing Input Tokens Impacts LLM Performance"

Main Findings:
- LLM performance drops as input length grows, even in simple tasks.
- Semantic ambiguity and distractors accelerate this decline.
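A toy probe in this spirit (not Chroma's actual harness): bury one "needle" fact in growing piles of distractor text and check whether your model still finds it.

```python
# Toy context-rot probe: one needle fact, increasingly long distractor padding.
def build_prompt(needle: str, filler: str, n_fillers: int) -> str:
    haystack = " ".join([filler] * n_fillers)
    return f"{haystack}\n{needle}\n{haystack}\n\nQuestion: What is the access code?"

needle = "The access code is 7421."
filler = "The committee reviewed procedural matters without reaching a decision."

for n in (10, 1_000, 50_000):   # short, medium, very long contexts
    prompt = build_prompt(needle, filler, n)
    print(n, len(prompt))       # send `prompt` to a model and check for "7421"
```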
October 14, 2025 at 2:58 PM
Visiting the Hugging Face HQ in Paris last week was a pleasure.

Many thanks to the amazing LeRobot team for showing me around and letting me play. 😉

Check out their awesome work on the hub!
October 8, 2025 at 3:03 PM
Just read this nice blog post "The Second Half".
October 7, 2025 at 2:57 PM