Badr AlKhamissi
@bkhmsi.bsky.social
PhD at EPFL 🧠💻
Ex @MetaAI, @SonyAI, @Microsoft
Egyptian 🇪🇬
You can learn more about our work here: language-to-cognition.epfl.ch
Thanks to all my co-authors @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social and my advisors @abosselut.bsky.social and @mschrimpf.bsky.social!
From Language to Cognition
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language underlying this alignment---and how brain-like repr...
language-to-cognition.epfl.ch
November 2, 2025 at 12:06 PM
10/
🙏 Huge thanks to my incredible co-authors @cndesabbata.bsky.social, @gretatuckute.bsky.social, @eric-zemingchen.bsky.social
and my advisors @mschrimpf.bsky.social and @abosselut.bsky.social!
October 20, 2025 at 12:10 PM
9/
🔗 Explore MiCRo:
🌐 Website: cognitive-reasoners.epfl.ch
📄 Paper: arxiv.org/abs/2506.13331
🤗 HF Space (interactive): huggingface.co/spaces/bkhmsi/cognitive-reasoners
🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we pro...
arxiv.org
October 20, 2025 at 12:10 PM
8/
We now have a collection of 10 MiCRo models on HF that you can try out yourself!
🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678
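If you want to poke at one of these checkpoints, something along the lines of the snippet below should work, assuming the repos load through the standard transformers interface. The repo id is a placeholder (check the collection for the actual names), and trust_remote_code is only needed if the MiCRo architecture ships custom modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- pick a real checkpoint from the collection linked above.
repo_id = "bkhmsi/MiCRo-Llama"  # hypothetical name, not necessarily the actual repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Sally puts her keys in the drawer and leaves. Where will she look for them later?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```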
Mixture of Cognitive Reasoners - a bkhmsi Collection
https://arxiv.org/abs/2506.13331
huggingface.co
October 20, 2025 at 12:10 PM
7/
We built an interactive HF Space where you can see how MiCRo routes tokens across specialized experts for any prompt, and even toggle experts on/off to see how behavior changes.
🤗 Try it here: huggingface.co/spaces/bkhms...
(Check the example prompts to get started!)
Mixture of Cognitive Reasoners - a Hugging Face Space by bkhmsi
Enter a prompt and select a model to see how tokens are routed across Language, Logic, Social, and World experts. Optionally, disable experts to see how routing changes.
huggingface.co
October 20, 2025 at 12:10 PM
6/
We also wondered: if neuroscientists use functional localizers to map networks in the brain, could we do the same for MiCRo’s experts?
The answer: yes! The very same localizers successfully recovered the corresponding expert modules in our models!
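For intuition, here is a minimal sketch of the localizer logic, under simplifying assumptions and with illustrative names (not our exact pipeline): record each unit's activation for the localizer's target condition (e.g., sentences) and its control condition (e.g., nonword strings), then take the units with the largest contrast as the "localized" set; in MiCRo one would then check which expert those units belong to.

```python
import numpy as np

def localize_units(act_target, act_control, top_frac=0.01):
    """act_target / act_control: (n_stimuli, n_units) activations for the localizer's
    target condition (e.g., sentences) vs. control condition (e.g., nonwords).
    Returns the indices of the most selective units, mimicking an fMRI localizer contrast."""
    contrast = act_target.mean(axis=0) - act_control.mean(axis=0)  # per-unit selectivity
    k = max(1, int(top_frac * contrast.size))
    return np.argsort(contrast)[-k:]

# Toy example: 1000 units, where the first 100 respond more strongly to sentences.
rng = np.random.default_rng(0)
sentences = rng.normal(size=(50, 1000))
sentences[:, :100] += 2.0
nonwords = rng.normal(size=(50, 1000))
selected = localize_units(sentences, nonwords, top_frac=0.1)
print((selected < 100).mean())  # ~1.0: the contrast recovers the "language" units
```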
October 20, 2025 at 12:10 PM
5/
One result I was particularly excited about is the emergent hierarchy we found across MiCRo layers:
🔺Earlier layers route tokens to Language experts.
🔻Deeper layers shift toward domain-relevant experts.
This emergent hierarchy mirrors patterns observed in the human brain 🧠
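As a toy illustration of how this can be quantified (illustrative names, simplified from our analysis): collect each layer's per-token expert assignments during a forward pass and compute the fraction of tokens each expert receives; the Language share dominates early and shrinks with depth.

```python
from collections import Counter

EXPERTS = ["language", "logic", "social", "world"]

def layerwise_expert_shares(routing_per_layer):
    """routing_per_layer: one list of per-token expert indices per layer
    (e.g., collected from each layer's router during a forward pass).
    Returns the fraction of tokens routed to each expert, per layer."""
    shares = []
    for choices in routing_per_layer:
        counts = Counter(choices)
        total = max(len(choices), 1)
        shares.append({name: counts.get(i, 0) / total for i, name in enumerate(EXPERTS)})
    return shares

# Toy routing decisions: early layers mostly Language (0), deeper layers shift to Logic (1).
toy_routing = [
    [0, 0, 0, 0, 0, 1],  # layer 0
    [0, 0, 1, 0, 1, 0],  # layer 1
    [1, 1, 1, 0, 1, 3],  # layer 2
]
for layer, share in enumerate(layerwise_expert_shares(toy_routing)):
    print(f"layer {layer}: {share}")
```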
October 20, 2025 at 12:10 PM
4/
We find that MiCRo matches or outperforms baselines on reasoning tasks (e.g., GSM8K, BBH) and aligns better with human behavior (CogBench), while maintaining interpretability!!
October 20, 2025 at 12:10 PM
3/
✨ Why it matters:
MiCRo bridges AI and neuroscience:
🤖 ML side: Modular architectures make LLMs more interpretable and controllable.
🧠 Cognitive side: Provides a testbed for probing the relative contributions of different brain networks to complex behavior.
October 20, 2025 at 12:10 PM
2/
🧩 Recap:
MiCRo takes a pretrained language model and post-trains it to develop distinct, brain-inspired modules aligned with four cognitive networks:
🗣️ Language
🔢 Logic / Multiple Demand
🧍♂️ Social / Theory of Mind
🌍 World / Default Mode Network
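To make the recipe concrete, here is a minimal sketch of what one MiCRo-style layer could look like: a transformer block whose feed-forward path is split into four expert MLPs with a learned router assigning each token to one of them. Names and sizes are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

EXPERTS = ["language", "logic", "social", "world"]

class CognitiveExpertLayer(nn.Module):
    """Illustrative MiCRo-style block: one feed-forward expert per cognitive
    network plus a router that assigns every token to a single expert."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for name in EXPERTS
        })
        self.router = nn.Linear(d_model, len(EXPERTS))  # one logit per expert

    def forward(self, hidden):
        # hidden: (batch, seq, d_model)
        logits = self.router(hidden)          # (batch, seq, 4)
        choice = logits.argmax(dim=-1)        # hard top-1 routing per token
        out = torch.zeros_like(hidden)
        for idx, name in enumerate(EXPERTS):
            mask = (choice == idx).unsqueeze(-1)
            # For clarity every expert sees all tokens; a real implementation dispatches only assigned tokens.
            out = torch.where(mask, self.experts[name](hidden), out)
        return hidden + out, choice           # residual stream + routing decisions

layer = CognitiveExpertLayer()
h, routed = layer(torch.randn(2, 10, 512))
print(h.shape, routed.shape)  # torch.Size([2, 10, 512]) torch.Size([2, 10])
```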
October 20, 2025 at 12:10 PM
Huge thanks to my amazing collaborators: @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social & advisors @abosselut.bsky.social and @mschrimpf.bsky.social!
You can find more details about our paper on the project's website: language-to-cognition.epfl.ch
Paper: arxiv.org/abs/2503.01830
From Language to Cognition: How LLMs Outgrow the Human Language Network
Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolu...
arxiv.org
September 25, 2025 at 2:56 PM
11/ 🌐 Links
Paper: arxiv.org/abs/2506.13331
Project Page: bkhmsi.github.io/mixture-of-c...
Code: github.com/bkhmsi/mixtu...
Models: huggingface.co/collections/...
In collaboration with: @cndesabbata.bsky.social, @eric-zemingchen.bsky.social, @mschrimpf.bsky.social, & @abosselut.bsky.social
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Human intelligence emerges from the interaction of specialized brain networks, each dedicated to distinct cognitive functions such as language processing, logical reasoning, social understanding, and ...
arxiv.org
June 17, 2025 at 3:07 PM
10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.
June 17, 2025 at 3:07 PM
9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top-2 routing can boost performance; see the sketch below)
3. Approach generalizes across domains & base models
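On insight 2, a minimal sketch of what top-2 routing means here (illustrative shapes and names, not the released code): keep the two highest-scoring experts per token and blend their outputs with renormalized softmax weights instead of committing to a single expert.

```python
import torch
import torch.nn.functional as F

def top2_mixture(router_logits, expert_outputs):
    """router_logits: (tokens, n_experts); expert_outputs: (tokens, n_experts, d_model).
    Illustrative top-2 routing: select the two best experts per token and
    combine their outputs with softmax weights renormalized over the two winners."""
    top_vals, top_idx = router_logits.topk(2, dim=-1)                 # (tokens, 2)
    weights = F.softmax(top_vals, dim=-1)                             # renormalized gate weights
    gathered = torch.gather(
        expert_outputs, 1,
        top_idx.unsqueeze(-1).expand(-1, -1, expert_outputs.size(-1))
    )                                                                 # (tokens, 2, d_model)
    return (weights.unsqueeze(-1) * gathered).sum(dim=1)              # weighted sum of the two experts

# Toy check: 3 tokens, 4 experts, d_model = 8
out = top2_mixture(torch.randn(3, 4), torch.randn(3, 4, 8))
print(out.shape)  # torch.Size([3, 8])
```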
June 17, 2025 at 3:07 PM
8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language and multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.
Figures for MiCRo-Llama & MiCRo-OLMo.
June 17, 2025 at 3:07 PM
7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves math, showcasing fine-grained control.
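A minimal sketch of one way such an ablation can be implemented, assuming access to each layer's router logits (illustrative names, not the released code): force the targeted expert's logit to -inf so the router can never select it; a positive bias on the logit would do the opposite and emphasize that expert.

```python
import torch

EXPERTS = ["language", "logic", "social", "world"]

def ablate_expert(router_logits, expert_name):
    """Disable one expert by setting its routing logit to -inf for every token,
    so it is never selected; adding a positive bias instead would emphasize it."""
    masked = router_logits.clone()
    masked[..., EXPERTS.index(expert_name)] = float("-inf")
    return masked

# Example: with the logic expert ablated, no token can be routed to index 1.
logits = torch.randn(6, len(EXPERTS))              # 6 tokens, 4 experts
print(ablate_expert(logits, "logic").argmax(-1))   # never prints 1
```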
June 17, 2025 at 3:07 PM
6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.
June 17, 2025 at 3:07 PM
5/ 📈 Performance gains:
We evaluate on six reasoning benchmarks (MATH, GSM8K, MMLU, BBH, and others); MiCRo outperforms both dense baselines and "general-expert" baselines, i.e., modular models with random specialist assignment in Stage 1.
June 17, 2025 at 3:07 PM