Badr AlKhamissi
@bkhmsi.bsky.social
PhD at EPFL 🧠💻

Ex @MetaAI, @SonyAI, @Microsoft

Egyptian 🇪🇬
8/
We now have a collection of 10 MiCRo models on HF that you can try out yourself!

🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678
📄 Paper: https://arxiv.org/abs/2506.13331
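A minimal sketch of pulling one of these checkpoints with transformers (the repo id below is a placeholder and trust_remote_code is an assumption; check the collection page for the exact model names):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: pick an actual model name from the HF collection linked above.
model_id = "bkhmsi/MiCRo-Llama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # assumption: may be required if MiCRo ships custom architecture code
)

prompt = "If Sara has 3 apples and buys 2 more, how many does she have?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```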
October 20, 2025 at 12:10 PM
7/
We built an interactive HF Space where you can see how MiCRo routes tokens across specialized experts for any prompt, and even toggle experts on/off to see how behavior changes.

🤗 Try it here: huggingface.co/spaces/bkhms...
(Check the example prompts to get started!)
October 20, 2025 at 12:10 PM
6/
We also wondered: if neuroscientists use functional localizers to map networks in the brain, could we do the same for MiCRo’s experts?

The answer: yes! The very same localizers successfully recovered the corresponding expert modules in our models!
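For readers unfamiliar with localizers: the language localizer contrasts responses to sentences against non-word strings and keeps the most selective units. A rough sketch of that contrast applied to per-unit model activations (stimuli, array shapes, and threshold are illustrative, not the paper's exact protocol):

```python
import numpy as np
from scipy import stats

# Illustrative activations: [n_stimuli, n_units] for sentence vs. non-word stimuli.
# In practice these come from forward passes over the localizer stimuli.
rng = np.random.default_rng(0)
acts_sentences = rng.normal(size=(120, 4096))
acts_nonwords = rng.normal(size=(120, 4096))

# A unit is "language-selective" if it responds significantly more to sentences.
t_vals, p_vals = stats.ttest_ind(acts_sentences, acts_nonwords, axis=0)
selective_units = np.where((t_vals > 0) & (p_vals < 0.01))[0]

# Hypothetical unit-to-expert assignment (0=Language, 1=Logic, 2=Social, 3=World):
# count where the selective units concentrate.
unit_to_expert = rng.integers(0, 4, size=4096)
print("Selective units per expert:", np.bincount(unit_to_expert[selective_units], minlength=4))
```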
October 20, 2025 at 12:10 PM
5/
One result I was particularly excited about is the emergent hierarchy we found across MiCRo layers:

🔺Earlier layers route tokens to Language experts.
🔻Deeper layers shift toward domain-relevant experts.

This emergent hierarchy mirrors patterns observed in the human brain 🧠
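One way to check this yourself, assuming you can log which expert each token is routed to at every layer, is to tally routing decisions per layer; the routing array below is a random placeholder for those logged indices:

```python
import numpy as np

N_LAYERS, N_TOKENS, N_EXPERTS = 16, 512, 4  # illustrative sizes
EXPERTS = ["Language", "Logic", "Social", "World"]

# routes[l, t] = index of the expert that token t was sent to at layer l
# (random placeholder; in practice, record these from the router during a forward pass).
routes = np.random.default_rng(0).integers(0, N_EXPERTS, size=(N_LAYERS, N_TOKENS))

for layer in range(N_LAYERS):
    frac = np.bincount(routes[layer], minlength=N_EXPERTS) / N_TOKENS
    summary = ", ".join(f"{name}: {f:.0%}" for name, f in zip(EXPERTS, frac))
    print(f"layer {layer:2d} -> {summary}")
```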
October 20, 2025 at 12:10 PM
4/
We find that MiCRo matches or outperforms baselines on reasoning tasks (e.g., GSM8K, BBH) and aligns better with human behavior (CogBench), while maintaining interpretability!!
October 20, 2025 at 12:10 PM
3/
✨ Why it matters:

MiCRo bridges AI and neuroscience:

🤖 ML side: Modular architectures make LLMs more interpretable and controllable.
🧠 Cognitive side: Provides a testbed for probing how the relative contributions of different brain networks support complex behavior.
October 20, 2025 at 12:10 PM
2/
🧩 Recap:
MiCRo takes a pretrained language model and post-trains it to develop distinct, brain-inspired modules aligned with four cognitive networks:

🗣️ Language
🔢 Logic / Multiple Demand
🧍‍♂️ Social / Theory of Mind
🌍 World / Default Mode Network
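To make the recap concrete, here is a toy sketch of the routing idea, not MiCRo's actual implementation: a small router picks one of four named expert blocks per token, and only that expert processes the token.

```python
import torch
import torch.nn as nn

class ToyCognitiveLayer(nn.Module):
    """Toy illustration: four named experts with a per-token top-1 router."""

    EXPERTS = ["language", "logic", "social", "world"]

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.router = nn.Linear(d_model, len(self.EXPERTS))
        # Each "expert" is a small MLP here; in MiCRo they are full transformer blocks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in self.EXPERTS
        )

    def forward(self, x):  # x: [batch, seq, d_model]
        choice = self.router(x).argmax(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out, choice

layer = ToyCognitiveLayer()
_, routed_to = layer(torch.randn(1, 8, 256))
print([ToyCognitiveLayer.EXPERTS[i] for i in routed_to[0].tolist()])
```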
October 20, 2025 at 12:10 PM
10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.
June 17, 2025 at 3:07 PM
9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top‑2 routing can boost performance)
3. Approach generalizes across domains & base models
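On the top-2 point in (2): the usual trick is to keep the two highest-scoring experts per token and mix their outputs with renormalized router weights. A rough sketch with illustrative shapes (real implementations dispatch tokens sparsely instead of running every expert):

```python
import torch

def top2_mix(router_logits: torch.Tensor, expert_outputs: torch.Tensor) -> torch.Tensor:
    """Illustrative top-2 routing.
    router_logits:  [tokens, n_experts]
    expert_outputs: [tokens, n_experts, d_model] (every expert applied to every token, for simplicity)."""
    top_vals, top_idx = router_logits.topk(k=2, dim=-1)      # [tokens, 2]
    weights = torch.softmax(top_vals, dim=-1)                 # renormalize over the top 2
    idx = top_idx.unsqueeze(-1).expand(-1, -1, expert_outputs.size(-1))
    chosen = torch.gather(expert_outputs, 1, idx)             # [tokens, 2, d_model]
    return (weights.unsqueeze(-1) * chosen).sum(dim=1)        # [tokens, d_model]

print(top2_mix(torch.randn(8, 4), torch.randn(8, 4, 64)).shape)  # torch.Size([8, 64])
```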
June 17, 2025 at 3:07 PM
8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language, multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.

Figures for MiCRo-Llama & MiCRo-OLMo.
June 17, 2025 at 3:07 PM
7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves it, showcasing fine-grained control.
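Mechanically, ablating an expert can be as simple as forcing its router score to -inf before the routing decision, so no token is ever sent there. A hedged sketch of that idea (not the repo's actual interface):

```python
import torch

EXPERTS = ["language", "logic", "social", "world"]

def ablate_expert(router_logits: torch.Tensor, name: str) -> torch.Tensor:
    """Make one expert unselectable in the routing argmax/softmax."""
    masked = router_logits.clone()
    masked[..., EXPERTS.index(name)] = float("-inf")
    return masked

logits = torch.randn(5, len(EXPERTS))
print(logits.argmax(dim=-1))                          # original routing choices
print(ablate_expert(logits, "logic").argmax(dim=-1))  # "logic" is never chosen
```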
June 17, 2025 at 3:07 PM
6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.
June 17, 2025 at 3:07 PM
5/ 📈 Performance gains:
We evaluate on 6 reasoning benchmarks (MATH, GSM8K, MMLU, BBH…). MiCRo outperforms both dense baselines and “general‑expert” baselines: modular models with random specialist assignment in Stage 1.
June 17, 2025 at 3:07 PM