Badr AlKhamissi
@bkhmsi.bsky.social
PhD at EPFL 🧠💻
Ex @MetaAI, @SonyAI, @Microsoft
Egyptian 🇪🇬
You can learn more about our work here: language-to-cognition.epfl.ch
Thanks to all my co-authors @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social and my advisors @abosselut.bsky.social and @mschrimpf.bsky.social!
From Language to Cognition
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language underlying this alignment---and how brain-like repr...
language-to-cognition.epfl.ch
November 2, 2025 at 12:06 PM
10/
🙏 Huge thanks to my incredible co-authors @cndesabbata.bsky.social, @gretatuckute.bsky.social, @eric-zemingchen.bsky.social
and my advisors @mschrimpf.bsky.social and @abosselut.bsky.social!
October 20, 2025 at 12:10 PM
9/
🔗 Explore MiCRo:
🌐 Website: cognitive-reasoners.epfl.ch
📄 Paper: arxiv.org/abs/2506.13331
🤗 HF Space (interactive): huggingface.co/spaces/bkhmsi/cognitive-reasoners
🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we pro...
arxiv.org
October 20, 2025 at 12:10 PM
8/
We now have a collection of 10 MiCRo models on HF that you can try out yourself!
🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678
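If you want to poke at one of these checkpoints, something along the lines of the snippet below should work, assuming the repos load through the standard transformers interface. The repo id is a placeholder (check the collection for the actual names), and trust_remote_code is only needed if the MiCRo architecture ships custom modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- pick a real checkpoint from the collection linked above.
repo_id = "bkhmsi/MiCRo-Llama"  # hypothetical name, not necessarily the actual repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Sally puts her keys in the drawer and leaves. Where will she look for them later?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```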
Mixture of Cognitive Reasoners - a bkhmsi Collection
https://arxiv.org/abs/2506.13331
huggingface.co
October 20, 2025 at 12:10 PM
7/
We built an interactive HF Space where you can see how MiCRo routes tokens across specialized experts for any prompt, and even toggle experts on/off to see how behavior changes.
🤗 Try it here: huggingface.co/spaces/bkhms...
(Check the example prompts to get started!)
Mixture of Cognitive Reasoners - a Hugging Face Space by bkhmsi
Enter a prompt and select a model to see how tokens are routed across Language, Logic, Social, and World experts. Optionally, disable experts to see how routing changes.
huggingface.co
October 20, 2025 at 12:10 PM
6/
We also wondered: if neuroscientists use functional localizers to map networks in the brain, could we do the same for MiCRo’s experts?
The answer: yes! The very same localizers successfully recovered the corresponding expert modules in our models!
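For intuition, here is a minimal sketch of the localizer logic, under simplifying assumptions and with illustrative names (not our exact pipeline): record each unit's activation for the localizer's target condition (e.g., sentences) and its control condition (e.g., nonword strings), then take the units with the largest contrast as the "localized" set; in MiCRo one would then check which expert those units belong to.

```python
import numpy as np

def localize_units(act_target, act_control, top_frac=0.01):
    """act_target / act_control: (n_stimuli, n_units) activations for the localizer's
    target condition (e.g., sentences) vs. control condition (e.g., nonwords).
    Returns the indices of the most selective units, mimicking an fMRI localizer contrast."""
    contrast = act_target.mean(axis=0) - act_control.mean(axis=0)  # per-unit selectivity
    k = max(1, int(top_frac * contrast.size))
    return np.argsort(contrast)[-k:]

# Toy example: 1000 units, where the first 100 respond more strongly to sentences.
rng = np.random.default_rng(0)
sentences = rng.normal(size=(50, 1000))
sentences[:, :100] += 2.0
nonwords = rng.normal(size=(50, 1000))
selected = localize_units(sentences, nonwords, top_frac=0.1)
print((selected < 100).mean())  # ~1.0: the contrast recovers the "language" units
```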
October 20, 2025 at 12:10 PM
5/
One result I was particularly excited about is the emergent hierarchy we found across MiCRo layers:
🔺Earlier layers route tokens to Language experts.
🔻Deeper layers shift toward domain-relevant experts.
This emergent hierarchy mirrors patterns observed in the human brain 🧠
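As a toy illustration of how this can be quantified (illustrative names, simplified from our analysis): collect each layer's per-token expert assignments during a forward pass and compute the fraction of tokens each expert receives; the Language share dominates early and shrinks with depth.

```python
from collections import Counter

EXPERTS = ["language", "logic", "social", "world"]

def layerwise_expert_shares(routing_per_layer):
    """routing_per_layer: one list of per-token expert indices per layer
    (e.g., collected from each layer's router during a forward pass).
    Returns the fraction of tokens routed to each expert, per layer."""
    shares = []
    for choices in routing_per_layer:
        counts = Counter(choices)
        total = max(len(choices), 1)
        shares.append({name: counts.get(i, 0) / total for i, name in enumerate(EXPERTS)})
    return shares

# Toy routing decisions: early layers mostly Language (0), deeper layers shift to Logic (1).
toy_routing = [
    [0, 0, 0, 0, 0, 1],  # layer 0
    [0, 0, 1, 0, 1, 0],  # layer 1
    [1, 1, 1, 0, 1, 3],  # layer 2
]
for layer, share in enumerate(layerwise_expert_shares(toy_routing)):
    print(f"layer {layer}: {share}")
```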
October 20, 2025 at 12:10 PM
4/
We find that MiCRo matches or outperforms baselines on reasoning tasks (e.g., GSM8K, BBH) and aligns better with human behavior (CogBench), while maintaining interpretability!!
October 20, 2025 at 12:10 PM
3/
✨ Why it matters:
MiCRo bridges AI and neuroscience:
🤖 ML side: Modular architectures make LLMs more interpretable and controllable.
🧠 Cognitive side: Provides a testbed for probing the relative contributions of different brain networks to complex behavior.
October 20, 2025 at 12:10 PM
2/
🧩 Recap:
MiCRo takes a pretrained language model and post-trains it to develop distinct, brain-inspired modules aligned with four cognitive networks:
🗣️ Language
🔢 Logic / Multiple Demand
🧍♂️ Social / Theory of Mind
🌍 World / Default Mode Network
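To make the recipe concrete, here is a minimal sketch of what one MiCRo-style layer could look like: a transformer block whose feed-forward path is split into four expert MLPs with a learned router assigning each token to one of them. Names and sizes are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

EXPERTS = ["language", "logic", "social", "world"]

class CognitiveExpertLayer(nn.Module):
    """Illustrative MiCRo-style block: one feed-forward expert per cognitive
    network plus a router that assigns every token to a single expert."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for name in EXPERTS
        })
        self.router = nn.Linear(d_model, len(EXPERTS))  # one logit per expert

    def forward(self, hidden):
        # hidden: (batch, seq, d_model)
        logits = self.router(hidden)          # (batch, seq, 4)
        choice = logits.argmax(dim=-1)        # hard top-1 routing per token
        out = torch.zeros_like(hidden)
        for idx, name in enumerate(EXPERTS):
            mask = (choice == idx).unsqueeze(-1)
            # For clarity every expert sees all tokens; a real implementation dispatches only assigned tokens.
            out = torch.where(mask, self.experts[name](hidden), out)
        return hidden + out, choice           # residual stream + routing decisions

layer = CognitiveExpertLayer()
h, routed = layer(torch.randn(2, 10, 512))
print(h.shape, routed.shape)  # torch.Size([2, 10, 512]) torch.Size([2, 10])
```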
October 20, 2025 at 12:10 PM
Huge thanks to my amazing collaborators: @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social & advisors @abosselut.bsky.social and @mschrimpf.bsky.social!
You can find more details about our paper on the project's website: language-to-cognition.epfl.ch
Paper: arxiv.org/abs/2503.01830
From Language to Cognition: How LLMs Outgrow the Human Language Network
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolu...
arxiv.org
September 25, 2025 at 2:56 PM
11/ 🌐 Links
Paper: arxiv.org/abs/2506.13331
Project Page: bkhmsi.github.io/mixture-of-c...
Code: github.com/bkhmsi/mixtu...
Models: huggingface.co/collections/...
In collaboration with: @cndesabbata.bsky.social, @eric-zemingchen.bsky.social, @mschrimpf.bsky.social, & @abosselut.bsky.social
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Human intelligence emerges from the interaction of specialized brain networks, each dedicated to distinct cognitive functions such as language processing, logical reasoning, social understanding, and ...
arxiv.org
June 17, 2025 at 3:07 PM
10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.
June 17, 2025 at 3:07 PM
9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top-2 routing can boost performance; see the sketch below)
3. Approach generalizes across domains & base models
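On insight 2, a minimal sketch of what top-2 routing means here (illustrative shapes and names, not the released code): keep the two highest-scoring experts per token and blend their outputs with renormalized softmax weights instead of committing to a single expert.

```python
import torch
import torch.nn.functional as F

def top2_mixture(router_logits, expert_outputs):
    """router_logits: (tokens, n_experts); expert_outputs: (tokens, n_experts, d_model).
    Illustrative top-2 routing: select the two best experts per token and
    combine their outputs with softmax weights renormalized over the two winners."""
    top_vals, top_idx = router_logits.topk(2, dim=-1)                 # (tokens, 2)
    weights = F.softmax(top_vals, dim=-1)                             # renormalized gate weights
    gathered = torch.gather(
        expert_outputs, 1,
        top_idx.unsqueeze(-1).expand(-1, -1, expert_outputs.size(-1))
    )                                                                 # (tokens, 2, d_model)
    return (weights.unsqueeze(-1) * gathered).sum(dim=1)              # weighted sum of the two experts

# Toy check: 3 tokens, 4 experts, d_model = 8
out = top2_mixture(torch.randn(3, 4), torch.randn(3, 4, 8))
print(out.shape)  # torch.Size([3, 8])
```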
June 17, 2025 at 3:07 PM
8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language and multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.
Figures for MiCRo-Llama & MiCRo-OLMo.
June 17, 2025 at 3:07 PM
7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves math, showcasing fine-grained control.
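A minimal sketch of one way such an ablation can be implemented, assuming access to each layer's router logits (illustrative names, not the released code): force the targeted expert's logit to -inf so the router can never select it; a positive bias on the logit would do the opposite and emphasize that expert.

```python
import torch

EXPERTS = ["language", "logic", "social", "world"]

def ablate_expert(router_logits, expert_name):
    """Disable one expert by setting its routing logit to -inf for every token,
    so it is never selected; adding a positive bias instead would emphasize it."""
    masked = router_logits.clone()
    masked[..., EXPERTS.index(expert_name)] = float("-inf")
    return masked

# Example: with the logic expert ablated, no token can be routed to index 1.
logits = torch.randn(6, len(EXPERTS))              # 6 tokens, 4 experts
print(ablate_expert(logits, "logic").argmax(-1))   # never prints 1
```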
June 17, 2025 at 3:07 PM
6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.
June 17, 2025 at 3:07 PM
5/ 📈 Performance gains:
We evaluate on six reasoning benchmarks (MATH, GSM8K, MMLU, BBH, and others); MiCRo outperforms both dense baselines and "general-expert" baselines, i.e., modular models with random specialist assignment in Stage 1.
June 17, 2025 at 3:07 PM