https://www.rpisoni.dev/
I tried replacing dot-product attention with the negative squared KQ-distance and was able to remove the softmax without issues or loss in performance!
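A minimal PyTorch sketch of the score computation described above (my reading, not code from the thread): the dot-product logits are swapped for -||q - k||², expanded so the pairwise difference tensor is never materialised. How the scores become weights once the softmax is gone isn't spelled out in the post, so the normalisation in the usage snippet is only a placeholder assumption.

```python
import torch

def neg_sq_dist_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Attention scores as the negative squared query-key distance.

    Uses -||q - k||^2 = 2*q.k - ||q||^2 - ||k||^2 so the
    (len_q, len_k, head_dim) difference tensor is never built.
    """
    qk = q @ k.transpose(-2, -1)                                  # (..., len_q, len_k)
    q_sq = q.pow(2).sum(dim=-1, keepdim=True)                     # (..., len_q, 1)
    k_sq = k.pow(2).sum(dim=-1, keepdim=True).transpose(-2, -1)   # (..., 1, len_k)
    return 2.0 * qk - q_sq - k_sq

# Example shapes: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))
scores = neg_sq_dist_scores(q, k)
# No softmax here; dividing by the number of keys is just a placeholder
# for whatever normalisation (if any) the original experiment used.
out = (scores / k.shape[-2]) @ v
```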
As you can see, the model really cares about the semantics of the background too. Hope this gets better in the second half of training.🤞
I thought the flowers were important too, but the model doesn't think so.
🧵
But I think it has value to be able to balance losses with changing magnitudes explicitly, so I came up with a variant that leaves the grads unaffected while fixing the loss magnitude to 1.0.
WDYT now?
bsky.app/profile/4rte...
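A minimal PyTorch sketch of what "grads unaffected, magnitude fixed to 1.0" could look like (one reading of the post above, not its actual code): the detached term cancels the raw loss value in the forward pass, so the reported loss is always 1.0, while the backward pass only sees the raw loss.

```python
import torch

def unit_magnitude(loss: torch.Tensor) -> torch.Tensor:
    # Forward: loss - loss + 1.0 == 1.0, so the reported magnitude is constant.
    # Backward: the detached term carries no gradient, so the grads are
    # exactly those of the raw loss.
    return loss - loss.detach() + 1.0

x = torch.tensor(2.0, requires_grad=True)
raw = x ** 2                  # raw loss value: 4.0
out = unit_magnitude(raw)     # reported value: 1.0
out.backward()
print(out.item(), x.grad)     # 1.0 tensor(4.)  -> gradient of x**2 at x=2
```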
BTW thanks to @merve.bsky.social and Niels for the pic!🤗
It's the padding! Let me show you how to fix it!🧵 #mlsky