Baran Hashemi
rythian47.bsky.social
AI for Mathematics
Another new result from the #NeurIPS rebuttal/discussion phase: our Tropical Transformer achieves much better length-OOD performance across all algorithmic tasks, while being 3x-9x faster at inference and using 20% fewer parameters than Universal Transformer (UT) models.
August 4, 2025 at 8:47 PM
6/ We also show that each Tropical attention head can function as a tropical gate in a tropical circuit, simulating any max-plus circuit.
May 26, 2025 at 1:08 PM
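The two tropical gates are just max (tropical addition) and + (tropical multiplication). A toy sketch of evaluating a tiny max-plus circuit from these gates (purely illustrative; the names and the circuit are made up, not the paper's construction):

```python
def trop_add(x, y):
    # Tropical addition is ordinary max.
    return max(x, y)

def trop_mul(x, y):
    # Tropical multiplication is ordinary addition.
    return x + y

def circuit(a, b, c):
    # A two-level max-plus circuit computing max(a + c, b + c),
    # i.e. the tropical distributive identity max(a, b) + c.
    return trop_add(trop_mul(a, c), trop_mul(b, c))

result = circuit(1.0, 4.0, 2.0)  # max(1 + 2, 4 + 2) = 6.0
```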
5/ We benchmarked on 11 canonical combinatorial tasks. Tropical attention beat vanilla and adaptive softmax attention on all three OOD axes: length, value, and adversarial-attack generalization.
May 26, 2025 at 1:08 PM
4/ Tropical Attention runs each head natively in max-plus. Result:
Strong OOD length generalization with sharp attention maps across several algorithmic tasks, including the notorious Quickselect algorithm (another settlement of the challenge identified by @mgalkin.bsky.social)
May 26, 2025 at 1:08 PM
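A minimal sketch of what running a head "natively in max-plus" could look like: scores from a tropical inner product, and hard argmax selection in place of a softmax average. This is a hypothetical illustration of the idea, not the paper's exact head:

```python
import numpy as np

def tropical_attention(Q, K, V):
    # Tropical inner product of queries and keys:
    # S[i, j] = max_d (Q[i, d] + K[j, d]).
    S = np.max(Q[:, None, :] + K[None, :, :], axis=-1)
    # No softmax: each query selects the single key with the
    # maximal tropical score, giving sharp attention maps.
    idx = np.argmax(S, axis=-1)
    return V[idx]

Q = np.array([[0.0, 1.0], [1.0, 0.0]])
K = np.array([[2.0, 0.0], [0.0, 2.0]])
V = np.array([[10.0, 0.0], [0.0, 10.0]])
out = tropical_attention(Q, K, V)
```

Because selection is an argmax rather than a probability-weighted sum, the attention pattern cannot blur as the sequence grows, which is the intuition behind the length-generalization claim.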
3/ In tropical (max-plus) geometry, “addition” is max and “multiplication” is +. Many algorithms already live here, carving exact polyhedral decision boundaries --> so why force them through exponential probabilities?
Let's ditch softmax, embrace the tropical semiring 🤯🍹.
May 26, 2025 at 1:08 PM
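Concretely, swapping (+, x) for (max, +) turns ordinary matrix multiplication into the max-plus product. A small self-contained sketch (illustrative only, using NumPy broadcasting):

```python
import numpy as np

def tropical_matmul(A, B):
    # Max-plus matrix product:
    # (A ⊗ B)[i, j] = max_k (A[i, k] + B[k, j]).
    # Tropical "sum" is max; tropical "product" is ordinary +.
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
C = tropical_matmul(A, B)
```

Iterating this product over a graph's weight matrix is exactly how classic longest/shortest-path dynamic programs work, which is why so many combinatorial algorithms "already live" in this semiring.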
🧵 Tropical Attention --> Softmax is out, Tropical max-plus is in 🦾
1/ 🔥Ever experienced softmax attention fading as sequences grow?
That blur is why many attention mechanisms stumble on algorithmic and reasoning tasks. Well, we have an Algebraic Geometric Tropical solution 🌴
May 26, 2025 at 1:08 PM
I'm speaking about AI for enumerative geometry at the CMSA New Technologies in Mathematics seminar, on Wednesday.
April 7, 2025 at 6:40 PM