Lightnews — Scholar-powered news

joelniklaus.bsky.social

@joelniklaus.bsky.social

Blog Post: pleias.fr/blog/blogsy...

Dataset: huggingface.co/datasets/Pl...

Large model: huggingface.co/PleIAs/Bagu...

Small model: huggingface.co/PleIAs/Monad

November 11, 2025 at 3:59 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- Cool to see this being done on the French supercomputer Jean Zay

November 11, 2025 at 3:59 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- They don't release any code, and the method description is quite high-level. For example, I am curious how they finetuned their models and would love to learn more about how they set up their synthetic data pipeline. Looking forward to the full report.

November 11, 2025 at 3:59 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- They only evaluate on MMLU, GSM8K and HotPotQA. This seems cherry-picked; I wonder how models trained on their dataset perform on other standard benchmarks. They say that they basically skip pre-training and go straight to post-training.

November 11, 2025 at 3:59 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- Seems like a cool case study pushing really small models to their limits (30 on MMLU for a 56M-parameter model)

November 11, 2025 at 3:59 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

Paper: arxiv.org/abs/2510.18212

Gary Marcus' comment: garymarcus.substack.com/p/is-agi-th...

Is AGI the right goal for AI?

And also, what the heck is AGI anyway?

garymarcus.substack.com

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- Co-author Gary Marcus notes he doesn't agree with every detail but signed on to support better articulation of what AGI means. The equal 10% weighting across domains is one choice among many reasonable configurations, though the paper argues for prioritizing breadth over depth.
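
The equal 10% weighting described in this post can be illustrated with a minimal Python sketch. The domain names are the ten listed in the thread; the function name `aggregate_score` and the per-domain values in the usage example are hypothetical placeholders, not figures from the paper.

```python
# Ten domains as listed in the thread; by default each gets an equal 10% weight.
DOMAINS = [
    "general knowledge",
    "reading and writing",
    "math",
    "reasoning",
    "working memory",
    "long-term memory storage",
    "memory retrieval",
    "visual processing",
    "auditory processing",
    "speed",
]


def aggregate_score(domain_scores, weights=None):
    """Weighted sum of per-domain scores (each in [0, 1]), returned as a percentage."""
    if weights is None:
        weights = {d: 1.0 / len(DOMAINS) for d in DOMAINS}
    missing = [d for d in DOMAINS if d not in domain_scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    return 100.0 * sum(weights[d] * domain_scores[d] for d in DOMAINS)


# Hypothetical profile: half credit everywhere, zero on long-term memory storage.
example = {d: 0.5 for d in DOMAINS}
example["long-term memory storage"] = 0.0
print(round(aggregate_score(example), 1))
```

Passing a different `weights` dictionary shows how the headline number moves under other configurations, which is the sensitivity the post alludes to.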

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

For instance, GPT-5 reaches 70.8% on visual reasoning tasks where humans average 88.9%, yet scores 0% on adaptation tasks that test flexible rule inference.

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- The framework reveals a "jagged" cognitive profile where models excel in knowledge-intensive domains but have critical deficits in foundational machinery.

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

Models compensate by expanding context windows, but the paper calls this a "capability contortion" that masks the absence of genuine experiential memory.

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- Both GPT-4 and GPT-5 score exactly 0% on long-term memory storage. This isn't a bug but an architectural constraint of transformer models, where attention mechanisms scale quadratically with context length.
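
To make the quadratic scaling concrete, here is a small sketch that counts the query-key pairs in standard dense self-attention for a single head and layer. The function name is hypothetical and the context lengths are illustrative, not those of any specific model.

```python
def attention_score_count(context_length: int) -> int:
    """Number of query-key pairs in dense self-attention (one head, one layer)."""
    return context_length * context_length


# Doubling the context length quadruples the number of pairwise scores.
for n in (1_000, 2_000, 4_000):
    print(n, attention_score_count(n))
```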

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

The framework tests ten core domains: general knowledge, reading and writing, math, reasoning, working memory, long-term memory storage, memory retrieval, visual processing, auditory processing, and speed. Applied to current models, it gives GPT-4 a score of 27% and GPT-5 a score of 58%.

My take:

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

A who's who in AI: 33 researchers from institutions including Berkeley, MIT, Stanford, and Oxford, among them Yoshua Bengio, Eric Schmidt, Gary Marcus, and Max Tegmark, developed a quantifiable framework grounded in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

The term AGI acts as a constantly moving goalpost, with criteria shifting as AI systems master tasks once thought to require human intellect. This ambiguity obscures how far we actually are from human-level cognition.

November 10, 2025 at 3:56 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

Paper: arxiv.org/pdf/2504.07854

Collections: huggingface.co/alea-instit...

Datasets: huggingface.co/alea-instit...

Tokenizers: huggingface.co/collections...

Code: github.com/alea-instit...

Website: aleainstitute.ai/

Data Gallery: gallery.kl3m.ai/document/ra...

November 8, 2025 at 3:01 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

All openly available via Hugging Face and S3 under CC-BY terms.

Interested in improving the data landscape for legal AI?
Join the HuggingLegal community on Discord: discord.gg/Mnn28ak8

November 8, 2025 at 3:01 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- Mid/post-training resources: QA pairs, summarization tasks, classification examples, drafting templates
- Multi-turn conversations from Congressional hearings and rulemaking
- kl3m-004-128k-cased tokenizer (30-40% more efficient than standard tokenizers)

November 8, 2025 at 3:01 PM

joelniklaus.bsky.social

@joelniklaus.bsky.social

- 1.35 trillion tokens across SEC EDGAR, USPTO patents, court opinions, federal regulations, EU materials
- Mean document length of 6,237 tokens; 200K+ documents exceeding 100K tokens
- Diverse domains: legal, regulatory, financial, technical (USDA protocols to NIST standards)

November 8, 2025 at 3:01 PM