@julien-siems.bsky.social
⚡DeltaProduct update with new results:
- Characterization of DeltaProduct’s state-tracking ability
- Inspection of the hidden state’s effective rank sheds light on why DeltaProduct extrapolates better to longer sequences than DeltaNet.
- Improved scaling analysis
And more!
June 14, 2025 at 8:02 AM
Reposted
DeltaProduct is here! Achieve better state tracking through highly parallel execution. Explore more!🚀
1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
April 9, 2025 at 10:11 AM
1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
March 28, 2025 at 2:39 PM