Thom Lake
@thomlake.bsky.social
Principal Scientist at Indeed. PhD Student at UT Austin. AI, Deep Learning, PGMs, and NLP.
Due to the split between the input statements and the query, the resulting model isn't a generic sequence processor like RNNs or transformers. However, if you were to process a sequence by treating each element as a new query, you'd get something that looks a lot like a transformer.
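A rough sketch of what I mean, in plain PyTorch (function names and toy shapes are mine, not from any paper): if you treat every position as its own query, the memory read collapses into self-attention, which is the core of a transformer layer (minus projections, heads, and the MLP).

```python
import torch
import torch.nn.functional as F

def memnet_read(query, memories):
    # single MemNet-style read: one query attends over the memories
    # query: (d,), memories: (n, d)
    scores = memories @ query            # (n,)
    probs = F.softmax(scores, dim=0)     # attention weights over memories
    return probs @ memories              # weighted sum, (d,)

def as_sequence_processor(x):
    # treat each element of the sequence as its own query:
    # every position attends over all positions -> self-attention
    return torch.stack([memnet_read(x[t], x) for t in range(x.size(0))])

x = torch.randn(5, 16)        # toy sequence: 5 tokens, dim 16
out = as_sequence_processor(x)
print(out.shape)              # torch.Size([5, 16])
```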
December 2, 2024 at 4:37 PM
MemNets first encode each input sentence/statement independently with a position embedding. These are the "memories". Then you encode the query and apply cross-attention between it and the memories. Rinse and repeat for some fixed depth. No for-loop over time here.
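A minimal sketch of that forward pass (simplified: one shared embedding matrix rather than the paper's separate input/output matrices, and toy shapes of my own choosing):

```python
import torch
import torch.nn.functional as F

def encode(sentence_ids, word_emb, pos_emb):
    # encode one sentence as a position-weighted bag of word embeddings
    # sentence_ids: (L,) token ids; result: (d,)
    words = word_emb[sentence_ids]               # (L, d)
    return (pos_emb[: len(sentence_ids)] * words).sum(0)

def memnet(story, query_ids, word_emb, pos_emb, hops=3):
    # story: list of sentences, each a tensor of token ids
    # every sentence is encoded independently -> the "memories"
    memories = torch.stack([encode(s, word_emb, pos_emb) for s in story])  # (n, d)
    u = encode(query_ids, word_emb, pos_emb)     # query encoding, (d,)
    for _ in range(hops):                        # fixed depth, no loop over time
        probs = F.softmax(memories @ u, dim=0)   # cross-attention: query vs. memories
        u = u + probs @ memories                 # update the query state
    return u

vocab, d, max_len = 50, 16, 10
word_emb = torch.randn(vocab, d)
pos_emb = torch.randn(max_len, d)
story = [torch.randint(0, vocab, (7,)) for _ in range(4)]
query = torch.randint(0, vocab, (5,))
print(memnet(story, query, word_emb, pos_emb).shape)  # torch.Size([16])
```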
December 2, 2024 at 4:35 PM
The recurrence there refers to depth-wise weight tying (see Section 2.2); a rough sketch follows the quote below.

> Layer-wise (RNN-like): the input and output embeddings are the same across different layers
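A sketch of that layer-wise scheme, assuming I'm reading Section 2.2 right: the same input/output embeddings (plus a linear state map) get reused at every hop, so depth unrolls like an RNN over hops rather than over time. Bag-of-words encoding here just to keep it short; the shapes are toy values of my own.

```python
import torch
import torch.nn as nn

class LayerWiseTiedMemNet(nn.Module):
    # layer-wise (RNN-like) weight tying: the same input embedding A and
    # output embedding C are shared across all hops
    def __init__(self, vocab, d, hops=3):
        super().__init__()
        self.A = nn.Embedding(vocab, d)   # input (memory) embedding, shared across hops
        self.C = nn.Embedding(vocab, d)   # output embedding, shared across hops
        self.H = nn.Linear(d, d)          # linear state map used by the layer-wise scheme
        self.hops = hops

    def forward(self, story, query):
        # story: (n, L) token ids, query: (L,) token ids; bag-of-words encoding
        m = self.A(story).sum(1)          # memories, (n, d)
        c = self.C(story).sum(1)          # output representations, (n, d)
        u = self.A(query).sum(0)          # query state, (d,)
        for _ in range(self.hops):        # same A, C, H at every hop
            p = torch.softmax(m @ u, dim=0)
            u = self.H(u) + p @ c
        return u

net = LayerWiseTiedMemNet(vocab=50, d=16)
out = net(torch.randint(0, 50, (4, 7)), torch.randint(0, 50, (5,)))
print(out.shape)                          # torch.Size([16])
```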
December 2, 2024 at 3:49 PM
Memory networks were earlier, attention only, and had position embeddings, but were not word/token level: arxiv.org/abs/1503.08895

They were later elaborated with the key-value distinction, which is, AFAIK, where this terminology originates: arxiv.org/abs/1606.03126
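The key-value split in a few lines (toy tensors and names are mine): address the memory with the keys, read out the values.

```python
import torch
import torch.nn.functional as F

def key_value_read(query, keys, values):
    # key-value memory read: address memories with the keys,
    # but return a weighted sum of the separately encoded values
    # query: (d,), keys: (n, d), values: (n, d)
    probs = F.softmax(keys @ query, dim=0)   # addressing
    return probs @ values                    # reading

q = torch.randn(16)
K = torch.randn(8, 16)   # e.g. encoded questions
V = torch.randn(8, 16)   # e.g. encoded answers
print(key_value_read(q, K, V).shape)         # torch.Size([16])
```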
December 2, 2024 at 6:32 AM
👋
November 25, 2024 at 2:44 PM