- Characterization of DeltaProduct’s state-tracking ability
- Analysis of the hidden state’s effective rank, which helps explain why DeltaProduct extrapolates to longer sequences better than DeltaNet
- Improved scaling analysis
And more!