Thom Lake
@thomlake.bsky.social
Principal Scientist at Indeed. PhD Student at UT Austin. AI, Deep Learning, PGMs, and NLP.
Due to the split between the input statements and the query, the resulting model isn't a generic sequence processor like RNNs or transformers. However, if you were to process a sequence by treating each element as a new query, you'd get something that looks a lot like a transformer.
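A rough sketch of what I mean, in plain PyTorch (function names and toy shapes are mine, not from any paper): if you treat every position as its own query, the memory read collapses into self-attention, which is the core of a transformer layer (minus projections, heads, and the MLP).

```python
import torch
import torch.nn.functional as F

def memnet_read(query, memories):
    # single MemNet-style read: one query attends over the memories
    # query: (d,), memories: (n, d)
    scores = memories @ query            # (n,)
    probs = F.softmax(scores, dim=0)     # attention weights over memories
    return probs @ memories              # weighted sum, (d,)

def as_sequence_processor(x):
    # treat each element of the sequence as its own query:
    # every position attends over all positions -> self-attention
    return torch.stack([memnet_read(x[t], x) for t in range(x.size(0))])

x = torch.randn(5, 16)        # toy sequence: 5 tokens, dim 16
out = as_sequence_processor(x)
print(out.shape)              # torch.Size([5, 16])
```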
December 2, 2024 at 4:37 PM
MemNets first encode each input sentence/statement independently with a position embedding. These are the "memories". Then you encode the query and apply cross-attention between it and the memories. Rinse and repeat for some fixed depth. No for-loop over time here.
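A minimal sketch of that forward pass (simplified: one shared embedding matrix rather than the paper's separate input/output matrices, and toy shapes of my own choosing):

```python
import torch
import torch.nn.functional as F

def encode(sentence_ids, word_emb, pos_emb):
    # encode one sentence as a position-weighted bag of word embeddings
    # sentence_ids: (L,) token ids; result: (d,)
    words = word_emb[sentence_ids]               # (L, d)
    return (pos_emb[: len(sentence_ids)] * words).sum(0)

def memnet(story, query_ids, word_emb, pos_emb, hops=3):
    # story: list of sentences, each a tensor of token ids
    # every sentence is encoded independently -> the "memories"
    memories = torch.stack([encode(s, word_emb, pos_emb) for s in story])  # (n, d)
    u = encode(query_ids, word_emb, pos_emb)     # query encoding, (d,)
    for _ in range(hops):                        # fixed depth, no loop over time
        probs = F.softmax(memories @ u, dim=0)   # cross-attention: query vs. memories
        u = u + probs @ memories                 # update the query state
    return u

vocab, d, max_len = 50, 16, 10
word_emb = torch.randn(vocab, d)
pos_emb = torch.randn(max_len, d)
story = [torch.randint(0, vocab, (7,)) for _ in range(4)]
query = torch.randint(0, vocab, (5,))
print(memnet(story, query, word_emb, pos_emb).shape)  # torch.Size([16])
```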
December 2, 2024 at 4:35 PM
The recurrence there refers to depth-wise weight tying (see Section 2.2); a rough sketch follows the quote below.

> Layer-wise (RNN-like): the input and output embeddings are the same across different layers
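A sketch of that layer-wise scheme, assuming I'm reading Section 2.2 right: the same input/output embeddings (plus a linear state map) get reused at every hop, so depth unrolls like an RNN over hops rather than over time. Bag-of-words encoding here just to keep it short; the shapes are toy values of my own.

```python
import torch
import torch.nn as nn

class LayerWiseTiedMemNet(nn.Module):
    # layer-wise (RNN-like) weight tying: the same input embedding A and
    # output embedding C are shared across all hops
    def __init__(self, vocab, d, hops=3):
        super().__init__()
        self.A = nn.Embedding(vocab, d)   # input (memory) embedding, shared across hops
        self.C = nn.Embedding(vocab, d)   # output embedding, shared across hops
        self.H = nn.Linear(d, d)          # linear state map used by the layer-wise scheme
        self.hops = hops

    def forward(self, story, query):
        # story: (n, L) token ids, query: (L,) token ids; bag-of-words encoding
        m = self.A(story).sum(1)          # memories, (n, d)
        c = self.C(story).sum(1)          # output representations, (n, d)
        u = self.A(query).sum(0)          # query state, (d,)
        for _ in range(self.hops):        # same A, C, H at every hop
            p = torch.softmax(m @ u, dim=0)
            u = self.H(u) + p @ c
        return u

net = LayerWiseTiedMemNet(vocab=50, d=16)
out = net(torch.randint(0, 50, (4, 7)), torch.randint(0, 50, (5,)))
print(out.shape)                          # torch.Size([16])
```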
December 2, 2024 at 3:49 PM
Memory networks were earlier, attention only, and had position embeddings, but were not word/token level: arxiv.org/abs/1503.08895

They were later elaborated with the key-value distinction, which is, AFAIK, where this terminology originates: arxiv.org/abs/1606.03126
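The key-value split in a few lines (toy tensors and names are mine): address the memory with the keys, read out the values.

```python
import torch
import torch.nn.functional as F

def key_value_read(query, keys, values):
    # key-value memory read: address memories with the keys,
    # but return a weighted sum of the separately encoded values
    # query: (d,), keys: (n, d), values: (n, d)
    probs = F.softmax(keys @ query, dim=0)   # addressing
    return probs @ values                    # reading

q = torch.randn(16)
K = torch.randn(8, 16)   # e.g. encoded questions
V = torch.randn(8, 16)   # e.g. encoded answers
print(key_value_read(q, K, V).shape)         # torch.Size([16])
```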
December 2, 2024 at 6:32 AM
👋
November 25, 2024 at 2:44 PM