Simon Schug
@smonsays.bsky.social
postdoc @princeton
computational cognitive science ∪ machine learning
https://smn.one
But not all training distributions enable compositional generalization -- even with scale.
Strategically choosing the training data matters a lot.
November 4, 2025 at 2:33 PM
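[As a toy illustration of what "strategically choosing the training data" could mean, here is a hypothetical sketch (not the paper's actual criterion): a training support where every module co-occurs with several different partners, versus one where some modules only ever appear together.]

```python
# Toy comparison of two training supports over pairs of modules.
# Hypothetical illustration only; the paper's actual conditions differ.
from itertools import combinations

modules = list(range(6))
all_pairs = list(combinations(modules, 2))  # 15 possible module pairs

# "Connected" support: each module is seen with several different partners.
connected_support = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2), (1, 3)]

# "Isolated" support: modules 4 and 5 only ever co-occur with each other.
isolated_support = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]

def partners(support):
    """Count how many distinct partners each module is seen with during training."""
    counts = {m: set() for m in modules}
    for a, b in support:
        counts[a].add(b)
        counts[b].add(a)
    return {m: len(p) for m, p in counts.items()}

print("connected:", partners(connected_support))
print("isolated: ", partners(isolated_support))  # modules 4 and 5 have only one partner each
```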
We prove that MLPs can implement a general class of compositional tasks ("hyperteachers") using a number of neurons that scales only linearly with the number of modules, beating the exponential!
November 4, 2025 at 2:33 PM
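[A hedged counting sketch of why "linear in the number of modules" matters -- my own illustration, not the paper's construction: a lookup-style network that reserves hidden units for every possible module combination needs combinatorially many units, whereas the claimed bound grows only linearly with the module pool.]

```python
# Counting sketch: lookup-per-composition vs. the claimed linear scaling.
# Illustrative numbers only; the actual construction and constants are in the paper.
import math

num_modules = 16   # size of the module pool (hypothetical)
k_composed = 4     # modules combined per task instance (hypothetical)

# Naive approach: dedicate hidden units to every possible combination of modules.
naive_units = math.comb(num_modules, k_composed)  # grows combinatorially

# Claimed scaling: number of neurons grows linearly in the number of modules.
linear_units = num_modules  # O(M), constants omitted

print(f"per-composition lookup: {naive_units:,} units")   # 1,820
print(f"linear-in-modules:      {linear_units} units (up to constants)")
```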
It turns out that simply scaling multilayer perceptrons / transformers can lead to compositional generalization.
November 4, 2025 at 2:33 PM
Most natural data has compositional structure. This leads to a combinatorial explosion that is impossible to fully cover in the training data.

It might be tempting to think that we need to equip neural network architectures with stronger symbolic priors to capture this compositionality, but do we?
November 4, 2025 at 2:33 PM
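[A quick back-of-the-envelope illustration of that combinatorial explosion; the numbers below are made up for illustration, not taken from the paper: with M compositional slots and V choices per slot, there are V^M distinct combinations, so even a large training set covers only a vanishing fraction.]

```python
# Toy illustration of the combinatorial explosion in compositional data.
# Numbers are illustrative only, not from the paper.

num_slots = 8          # compositional "slots" (e.g. attributes, modules)
values_per_slot = 10   # choices available for each slot

total_combinations = values_per_slot ** num_slots  # 10^8 distinct combinations
train_set_size = 100_000                           # a generously sized training set

coverage = train_set_size / total_combinations
print(f"{total_combinations:,} possible combinations")
print(f"training set covers at most {coverage:.2%} of them")
# -> 100,000,000 possible combinations; coverage at most 0.10%
```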
Would love to be added as well :)
November 20, 2024 at 8:50 PM