> Layer-wise (RNN-like): the input and output embeddings are the same across different layers
> Layer-wise (RNN-like): the input and output embeddings are the same across different layers
They were later elaborated with the key-value distinction which is, AFAIK, where this terminology arises: arxiv.org/abs/1606.03126
They were later elaborated with the key-value distinction which is, AFAIK, where this terminology arises: arxiv.org/abs/1606.03126