#data #machinelearning #datascience #llm
After multiplication some hidden state will amplify and some will drown based on score.
After multiplication some hidden state will amplify and some will drown based on score.
1. Instead of passing only the last hidden state to decoder, now encoder passes all the hidden states.
2. Decoder does an extra step in order to focus on the part of the input that are relevent.
1. Instead of passing only the last hidden state to decoder, now encoder passes all the hidden states.
2. Decoder does an extra step in order to focus on the part of the input that are relevent.