Ian Berlot-Attwell
ianberlot.bsky.social
ML/NLP PhD Student. Interested in compositional generalization!
https://www.cs.toronto.edu/~ianberlot/
Combined with similar observations of the lack of reuse in other library learning systems (arxiv.org/abs/2411.01747), it’s clear we need a better understanding of the limitations of current library learning systems, and improved evaluation.
See more at arxiv.org/abs/2410.20274
December 11, 2024 at 3:55 PM
Running an ablation on a subset of miniF2F, we find that a model ablated to prevent the sharing of lemmas across tasks also exhibits strong performance.
December 11, 2024 at 3:55 PM
Studying the LEGO-Prover (a system for formalizing natural language proofs by learning reusable lemmas), we find that lemma reuse is very uncommon, and no lemma is reused twice.
December 11, 2024 at 3:55 PM
Studying TroVE (a system that learns reusable Python functions), we find only 3 instances of a learned function being reused correctly, out of 3,201 test questions in the MATH dataset. Furthermore, our libraryless ablation outperforms the original on 3 of 4 MATH splits tested.
December 11, 2024 at 3:55 PM
Combined with similar seq2seq work (dx.doi.org/10.18653/v1/...) and concurrent VQA work looking at productivity (doi.org/10.48550/arX...), we see a close relationship between train-time diversity and compositionality in general. See more at www.cs.toronto.edu/~ianberlot/d...
November 15, 2023 at 11:06 PM
Same findings hold on a neurosymbolic NMN model, even though these models are specifically designed to be compositional!
November 15, 2023 at 11:05 PM
We stratify value pairs (e.g., blue + sphere) by attribute diversity, i.e., the number of possible train-time alternative values for each attribute. Low diversity combinations have a larger systematicity gap (difference in accuracy between seen and unseen combinations)!
November 15, 2023 at 11:05 PM
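A minimal sketch of the systematicity gap as defined in the post above (accuracy on seen combinations minus accuracy on unseen ones). The function names and the toy outcomes are illustrative assumptions, not results or code from the paper.

```python
def accuracy(results):
    """Fraction of correct answers; `results` is a non-empty list of booleans."""
    return sum(results) / len(results)


def systematicity_gap(seen_results, unseen_results):
    """Accuracy on seen combinations minus accuracy on unseen combinations."""
    return accuracy(seen_results) - accuracy(unseen_results)


# Made-up per-question outcomes, for illustration only.
seen = [True, True, True, False]       # 3 of 4 correct
unseen = [True, False, False, False]   # 1 of 4 correct
gap = systematicity_gap(seen, unseen)
print(gap)
```

A larger positive gap means the model does worse on combinations it never saw at train time.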
For 29 different pairs of held-out object attributes (e.g., rubber cylinders), we create separate train and test splits in a modified CLEVR setting. Combinations of certain values for this attribute pair will be present at test time, but not at train time.
November 15, 2023 at 11:04 PM
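A rough sketch of the held-out idea in the post above: examples containing a held-out value combination are kept out of training. The dictionary keys (`material`, `shape`), the function name, and the choice to send every held-out example to a separate pool are assumptions for illustration; the actual split construction may differ.

```python
def split_by_held_out_pairs(examples, held_out):
    """Keep held-out value combinations out of the training pool.

    `examples` is a list of dicts with hypothetical keys "material" and "shape";
    `held_out` is a set of (material, shape) tuples.
    """
    train_pool, held_out_pool = [], []
    for example in examples:
        pair = (example["material"], example["shape"])
        if pair in held_out:
            held_out_pool.append(example)  # only available at test time
        else:
            train_pool.append(example)     # safe to train on
    return train_pool, held_out_pool


# Toy objects, for illustration only.
examples = [
    {"material": "rubber", "shape": "cylinder"},
    {"material": "rubber", "shape": "sphere"},
    {"material": "metal", "shape": "cylinder"},
]
train_pool, held_out_pool = split_by_held_out_pairs(
    examples, held_out={("rubber", "cylinder")}
)
```

Here the model still sees "rubber" and "cylinder" separately during training, just never together.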