joelniklaus.bsky.social
- Cool to see this being done on the French supercomputer Jean Zay
November 11, 2025 at 3:59 PM
- They don't release any code, and the method description is high-level only. For example, I am curious how they fine-tuned their models and would love to learn more about how they set up their synthetic data pipeline. Looking forward to the full report.
November 11, 2025 at 3:59 PM
- They only evaluate on MMLU, GSM8K, and HotpotQA. This seems cherry-picked; I wonder how their dataset performs on other standard benchmarks. They say that they basically skip pre-training and go straight to post-training.
November 11, 2025 at 3:59 PM
- Seems like a cool case study pushing really small models to the limits (a score of 30 on MMLU for a 56M-parameter model)
November 11, 2025 at 3:59 PM
- Co-author Gary Marcus notes he doesn't agree with every detail but signed on to support better articulation of what AGI means. The equal 10% weighting across domains is one choice among many reasonable configurations, though the paper argues for prioritizing breadth over depth.
November 10, 2025 at 3:56 PM
For instance, GPT-5 reaches 70.8% on visual reasoning tasks where humans average 88.9%, yet scores 0% on adaptation tasks that test flexible rule inference.
November 10, 2025 at 3:56 PM
- The framework reveals a "jagged" cognitive profile where models excel in knowledge-intensive domains but have critical deficits in foundational machinery.
November 10, 2025 at 3:56 PM
Models compensate by expanding context windows, but the paper calls this a "capability contortion" that masks the absence of genuine experiential memory.
November 10, 2025 at 3:56 PM
- Both GPT-4 and GPT-5 score exactly 0% on long-term memory storage. This isn't a bug but an architectural constraint of transformer models: they retain nothing beyond their context window, and the attention mechanism's cost scales quadratically with context length.
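To make the quadratic scaling concrete, here is a minimal sketch that builds the scaled dot-product score matrix of a single self-attention head and counts its entries. It assumes, for simplicity, that queries and keys are the raw token vectors (no learned projections), uses random vectors with a hypothetical head dimension of 8, and ignores the softmax and value mixing; only the size of the score matrix matters here.

```python
import math
import random


def attention_scores(q, k):
    """Scaled dot-product scores Q K^T / sqrt(d): one score per (query, key) pair."""
    d = len(q[0])
    return [
        [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        for q_row in q
    ]


def random_tokens(n, d, seed=0):
    """n hypothetical token vectors of dimension d."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def score_count(n, d=8):
    """Number of scores one self-attention head computes over an n-token context."""
    x = random_tokens(n, d)
    scores = attention_scores(x, x)  # simplification: Q = K = X, no learned projections
    return sum(len(row) for row in scores)


for n in (64, 128, 256):
    print(f"{n:>4} tokens -> {score_count(n):>6} scores")
```

Each doubling of the context quadruples the number of scores (4,096, then 16,384, then 65,536), which is why simply widening the context window gets expensive quickly.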
November 10, 2025 at 3:56 PM
The framework tests ten core domains: general knowledge, reading and writing, math, reasoning, working memory, long-term memory storage, memory retrieval, visual processing, auditory processing, and speed. Applying this to current models reveals GPT-4 scores 27% and GPT-5 scores 58%.

My take:
November 10, 2025 at 3:56 PM
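The equal 10% weighting across the ten domains can be sketched as a plain average. This is a minimal sketch, assuming each domain is scored from 0 to 100% and contributes one tenth of the total; the function and the example scores are hypothetical illustrations, not the paper's code or data.

```python
DOMAINS = (
    "general knowledge", "reading and writing", "math", "reasoning",
    "working memory", "long-term memory storage", "memory retrieval",
    "visual processing", "auditory processing", "speed",
)


def agi_score(domain_scores: dict) -> float:
    """Equal-weighted total: each domain's 0-100% score contributes 10% of the total."""
    missing = set(DOMAINS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    return sum(domain_scores[d] for d in DOMAINS) / len(DOMAINS)


# Hypothetical model: perfect everywhere except long-term memory storage.
scores = {d: 100.0 for d in DOMAINS}
scores["long-term memory storage"] = 0.0
print(agi_score(scores))  # 90.0
```

Under this weighting, a model stuck at 0% on long-term memory storage is capped at 90% overall, no matter how strong it is elsewhere.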
A who's who of AI: 33 researchers from institutions such as Berkeley, MIT, Stanford, and Oxford, including Yoshua Bengio, Eric Schmidt, Gary Marcus, and Max Tegmark, developed a quantifiable framework grounded in Cattell-Horn-Carroll (CHC) theory, the most empirically validated model of human cognition.
November 10, 2025 at 3:56 PM
The term AGI acts as a constantly moving goalpost, with criteria shifting as AI systems master tasks once thought to require human intellect. This ambiguity obscures how far we actually are from human-level cognition.
November 10, 2025 at 3:56 PM
All openly available via Hugging Face and S3 under CC-BY terms.

Interested in improving the data landscape for legal AI?
Join the HuggingLegal community on Discord: discord.gg/Mnn28ak8
November 8, 2025 at 3:01 PM
- Mid/post-training resources: QA pairs, summarization tasks, classification examples, drafting templates
- Multi-turn conversations from Congressional hearings and rulemaking
- kl3m-004-128k-cased tokenizer (30-40% more efficient than standard tokenizers)
November 8, 2025 at 3:01 PM
- 1.35 trillion tokens across SEC EDGAR, USPTO patents, court opinions, federal regulations, EU materials
- Mean document length of 6,237 tokens; 200K+ documents exceeding 100K tokens
- Diverse domains: legal, regulatory, financial, technical (USDA protocols to NIST standards)
November 8, 2025 at 3:01 PM
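The corpus figures above allow a couple of back-of-envelope estimates. This sketch only combines the reported numbers; since they are rounded ("1.35 trillion", "200K+"), the results are rough estimates, not document counts reported by the authors.

```python
TOTAL_TOKENS = 1.35e12         # reported corpus size (rounded)
MEAN_DOC_TOKENS = 6_237        # reported mean document length
LONG_DOCS = 200_000            # "200K+" documents, so a lower bound
LONG_DOC_MIN_TOKENS = 100_000  # each of those exceeds 100K tokens

# Total tokens divided by mean length gives the implied document count.
approx_docs = TOTAL_TOKENS / MEAN_DOC_TOKENS

# Both factors are lower bounds, so this share is a lower bound too.
long_doc_share = LONG_DOCS * LONG_DOC_MIN_TOKENS / TOTAL_TOKENS

print(f"~{approx_docs / 1e6:.0f} million documents")
print(f"long documents hold at least {long_doc_share:.1%} of all tokens")
```

This implies roughly 216 million documents, with the very long documents alone accounting for at least about 1.5% of the tokens.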
isaacus (Isaacus) on Hugging Face
November 7, 2025 at 4:02 PM
If you care about this kind of work, join our HuggingLegal community Discord: discord.gg/jwNRGmWY
November 7, 2025 at 4:02 PM
freelawproject/modernbert-embed-base_finetune_512 on Hugging Face
November 6, 2025 at 4:04 PM