Raphael Pisoni
@4rtemi5.bsky.social
Unsupervised multimodal representation of a learning researcher.
https://www.rpisoni.dev/
You've been researching for a while!
Time to have some SOTA!

#aislop
July 26, 2025 at 12:51 PM
Is there a reason why none of the recent models use RBF-kernel attention to get rid of the softmax bottleneck for long context?
I tried replacing dot-product attention with the negative squared Q-K distance and was able to remove the softmax without issues or loss in performance!
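For concreteness, a minimal JAX sketch of distance-based attention as I read it; the bandwidth `gamma` and the sum-normalization are my assumptions, not details from the post (note that sum-normalizing RBF kernel values is algebraically a softmax over the negative squared distances, so the actual softmax-free variant may normalize differently):

```python
import jax
import jax.numpy as jnp

def rbf_attention(q, k, v, gamma=1.0):
    # Pairwise squared distances via ||q - k||^2 = ||q||^2 - 2 q.k + ||k||^2
    sq_dists = (
        (q ** 2).sum(-1, keepdims=True)   # (n, 1)
        - 2.0 * q @ k.T                   # (n, m)
        + (k ** 2).sum(-1)                # (m,)
    )
    weights = jnp.exp(-gamma * sq_dists)  # RBF kernel values, all >= 0
    weights = weights / weights.sum(-1, keepdims=True)
    return weights @ v

# Toy usage: 4 queries, 6 keys/values, head dim 8
q, k, v = (jax.random.normal(jax.random.PRNGKey(i), (n, 8))
           for i, n in enumerate((4, 6, 6)))
print(rbf_attention(q, k, v).shape)  # (4, 8)
```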
July 23, 2025 at 8:14 PM
Grok this! What a roller-coaster of emotions...🤪
April 16, 2025 at 7:01 PM
x'' = 0
March 24, 2025 at 6:55 AM
Intuitively, since all cross-similarities are used for training, a single bad sample can have a large impact on all the other samples in the batch. I've been using a loss with a similar intuition for some years, and while this one is more elegant, it probably reacts in a similar way.
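As a generic illustration (a CLIP-style symmetric InfoNCE sketch in JAX, not the specific loss from the quoted post), every embedding enters every other row's softmax denominator, so one corrupted sample perturbs the whole batch's loss:

```python
import jax
import jax.numpy as jnp

def clip_style_loss(za, zb, temp=0.07):
    # Normalize, then compare every sample in za against every sample in zb.
    za = za / jnp.linalg.norm(za, axis=-1, keepdims=True)
    zb = zb / jnp.linalg.norm(zb, axis=-1, keepdims=True)
    logits = za @ zb.T / temp            # (n, n): all cross-similarities
    diag = jnp.arange(za.shape[0])
    # One bad row of za or zb shifts every log-softmax denominator below.
    loss_a = -jax.nn.log_softmax(logits, axis=-1)[diag, diag].mean()
    loss_b = -jax.nn.log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_a + loss_b) / 2
```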
January 21, 2025 at 7:28 PM
She said YES!🥰
December 29, 2024 at 4:48 PM
Semantics are getting better and better!🤩
November 23, 2024 at 3:25 PM
And this is the current sorry state of the "DINO-dog".
As you can see, the model really cares about the semantics of the background too. Hope this gets better in the second half of the training.🤞
November 22, 2024 at 9:24 PM
Failures? None! (😉)
I thought the flowers were important too, but the model doesn't think so.
November 22, 2024 at 9:24 PM
Another tough one. Below you can see the similarity map for one pixel on the skateboard. Whether the model actually "knows" what a skateboard is or goes purely by color/texture is really hard to judge, but I want to believe.😅
November 22, 2024 at 9:24 PM
This sample might seem simple, but long, thin features on a "large" 512x512 image are not at all easy for the model to get right.
November 22, 2024 at 9:24 PM
Over the course of the training, the model seems to shift attention from pure color and spatial similarity toward more semantic similarity. Interestingly, one downside is that it cares less about sharp object borders, but let's see how that develops.
November 22, 2024 at 9:24 PM
For those waiting for news on the training of the single-GPU "DINO-like" model: About half the training is now over, so let me pick some cherries for you: 🍒
🧵
November 22, 2024 at 9:24 PM
After some discussions, a few flaws of the cited approach came up.
But I think being able to explicitly balance losses with changing magnitudes has value, so I came up with a variant that leaves the grads unaffected while fixing the loss magnitude to 1.0.
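A minimal sketch of how such a variant could look; the stop-gradient identity `loss - sg(loss) + 1.0` is my assumption, since the post doesn't spell out the construction:

```python
import jax
import jax.numpy as jnp

def fix_magnitude(loss):
    # Value is exactly 1.0; the gradient is the unmodified d(loss).
    return loss - jax.lax.stop_gradient(loss) + 1.0

def l2(p):
    return jnp.mean((p * jnp.ones(4)) ** 2)

p = 2.0
val, grad = jax.value_and_grad(lambda q: fix_magnitude(l2(q)))(p)
print(val)                    # 1.0, whatever the raw loss was
print(grad, jax.grad(l2)(p))  # identical gradients: 4.0 and 4.0
```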
WDYT now?
bsky.app/profile/4rte...
November 22, 2024 at 9:49 AM
Good point! But since loss functions have different shapes, should the one with the currently larger value always be preferred? L1 and L2 are great examples.
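To make the shape point concrete (my example, not from the thread): for the same residual e, the L1 value |e| and the L2 value e² order differently on either side of |e| = 1, so the raw magnitude alone says little about which loss matters more.

```python
for e in (0.1, 2.0):
    print(e, abs(e), e ** 2)
# e = 0.1: L1 = 0.1 > L2 = 0.01
# e = 2.0: L1 = 2.0 < L2 = 4.0
```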
November 22, 2024 at 3:41 AM
One iteration over the training data is complete, and we see some really nice properties emerge. One of them is that the similarities seem to get more semantic now and rely less on color and locality.
BTW thanks to @merve.bsky.social and Niels for the pic!🤗
November 20, 2024 at 7:19 PM
This is what our doggie looks like after one iteration over the training data! 🤩
November 20, 2024 at 2:54 PM
Training is going great BTW! Also great to see the changing similarities when you select the heads of the different horses! 🤩 Love to see it grow! I hope you do too! 😉
November 19, 2024 at 3:45 PM
Let me tell you about some other random shit while we wait for the model to train: Did you know that almost all convolutional architectures have a flaw that makes them suboptimal for segmentation or other 2d tasks?
It's the padding! Let me show you how to fix it!🧵 #mlsky
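The thread's actual fix isn't captured here, so as a stand-in, here is a JAX sketch of one known remedy, partial-convolution-based padding (Liu et al., 2018): zero-pad as usual, then rescale border outputs by the fraction of the kernel window that overlapped real pixels.

```python
import jax.numpy as jnp
from jax import lax

def conv2d(x, w):
    # x: (H, W) image, w: (kh, kw) kernel, zero "SAME" padding.
    return lax.conv_general_dilated(
        x[None, None], w[None, None],
        window_strides=(1, 1), padding="SAME",
    )[0, 0]

def padding_fix(x, w):
    # Rescale each output by the fraction of the kernel window that
    # actually covered the image, undoing the zero-padding dilution.
    coverage = conv2d(jnp.ones_like(x), w) / w.sum()
    return conv2d(x, w) / coverage

x = jnp.ones((5, 5))
w = jnp.ones((3, 3)) / 9.0
print(conv2d(x, w))       # border rows/cols < 1.0: the padding bias
print(padding_fix(x, w))  # ~1.0 everywhere again
```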
November 19, 2024 at 12:26 PM
Found a bug and had to restart the training, but the model picked up strong. One more sample with high-res PCA, and similarity with a pixel on the left ear. Nice to see the model picking up some structural and semantic(?) similarity.
November 19, 2024 at 11:20 AM
Sneak peek after a couple of hours of training! More details coming soon!
November 18, 2024 at 6:33 PM
Too close to home these days?
November 3, 2024 at 9:01 AM
👋
November 3, 2024 at 12:33 AM