Strong OOD length generalization with sharp attention maps even in several algorithmic tasks, including the notorious Quickselect algorithm (Another settlement for the challenge identified by @mgalkin.bsky.social )
Strong OOD length generalization with sharp attention maps even in several algorithmic tasks, including the notorious Quickselect algorithm (Another settlement for the challenge identified by @mgalkin.bsky.social )
Let's ditch softmax, embrace the tropical semiring 🤯🍹.
Let's ditch softmax, embrace the tropical semiring 🤯🍹.
1/ 🔥Ever experinced softmax attention fade as sequences grow?
That blur is why many attention mechanisms stumble on algorithmic and reasoning tasks. Well, we have a Algebraic Geometric Tropical solution 🌴
1/ 🔥Ever experinced softmax attention fade as sequences grow?
That blur is why many attention mechanisms stumble on algorithmic and reasoning tasks. Well, we have a Algebraic Geometric Tropical solution 🌴