Nicolas Keriven
@nicolaskeriven.bsky.social
CNRS researcher. Hates genAI with a passion. Food and music 🎹 https://linktr.ee/bluecurlmusic
Here you go good sir :)

distrokid.com/hyperfollow/...
Horizon by Blue Curl
August 6, 2025 at 11:39 AM
Colon, question mark, ticking all the boxes
May 23, 2025 at 8:32 AM
Also note that we limit ourselves to vanilla GNNs: no skip connections, normalization, or other classical anti-oversmoothing strategies. That's for the future!

7/7
May 23, 2025 at 7:59 AM
Note that, while many optimization results are adapted from MLPs to GNNs, this one is *specific* to GNNs: we show that it does not happen for MLPs.

6/7
May 23, 2025 at 7:59 AM
...and that's bad! As soon as the last layer is trained (that happens immediately!), this average vanishes. The entire GNN is quickly stuck in a terrible near-critical point!
(*at every layer*, due to further mechanisms).

That's the main result of this paper.

5/7
May 23, 2025 at 7:59 AM
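[On the post above, a rough back-of-the-envelope sketch of why training the last layer kills that average, assuming a linear readout and a squared loss; the paper's setting may be more general.]

```latex
% Rough sketch, assuming a linear readout and a squared loss
% (the paper's setting may be more general).
% Oversmoothed last hidden layer: every node carries the same vector c,
\[
H^{(L)} \;\approx\; \mathbf{1}\, c^{\top},
\qquad
\hat{Y} \;=\; H^{(L)} W_{\mathrm{out}},
\qquad
E \;:=\; \hat{Y} - Y,
\]
\[
\nabla_{W_{\mathrm{out}}} \mathcal{L}
   \;=\; H^{(L)\top} E
   \;\approx\; c\,\big(\mathbf{1}^{\top} E\big).
\]
% Training the output layer to (near) stationarity thus drives the node-average
% of the error, (1/n) \mathbf{1}^{\top} E, towards zero. But that average is what
% the oversmoothed backward signal carries, so every earlier layer receives a
% (near) zero gradient: the network sits at a near-critical point with poor loss.
```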
...the "oversmoothing" of the backward signal is very particular: *when the forward is oversmoothed*, the updates to the backward are almost "linear-GNN" with no non-linearity.

So we *can* compute its oversmoothed limit! It is the **average** of the prediction error...

4/7
May 23, 2025 at 7:59 AM
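[A small numeric sketch of the mechanism in the post above. This is an assumed toy setup, not the paper's code: once the forward is oversmoothed, the ReLU mask is the same at every node, so each backward update mixes nodes purely linearly through the graph matrix, and the backward signal collapses to a node-average of the initial prediction error.]

```python
# Hedged toy sketch (assumed setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n, d, L = 16, 8, 40

# 4-regular ring graph with self-loops: symmetric, doubly stochastic P,
# so repeated multiplication converges to the uniform node-average.
A = np.zeros((n, n))
for i in range(n):
    for s in (-2, -1, 0, 1, 2):
        A[i, (i + s) % n] = 1.0
P = A / A.sum(axis=1, keepdims=True)

B = rng.normal(size=(n, d))                   # prediction error at the output
m = (rng.random(d) > 0.3).astype(float)       # node-constant ReLU mask
W, _ = np.linalg.qr(rng.normal(size=(d, d)))  # orthogonal weights (keeps scale)

def spread(X):
    # Largest deviation of any node's row from the node-average row.
    return np.abs(X - X.mean(axis=0)).max()

print("initial node-to-node spread:", spread(B))

for _ in range(L):            # backpropagate through L oversmoothed layers
    B = m * (P @ B @ W.T)     # linear in B: the mask is node-constant

print("final node-to-node spread:  ", spread(B))
# Every row of B is now (almost) identical: the node-average of the
# initial error, pushed through the accumulated feature-space maps.
```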
... but at the *middle* layers! Due to the non-linearity, it oversmooths *when the forward is also oversmoothed* 😲

Why is it bad for training? Well...

3/7
May 23, 2025 at 7:59 AM
The gradients are formed of two parts: the forward signal, and the backward "signal", initialized to the prediction error at the output and then backpropagated.

Like the forward, the backward is multiplied by the graph matrix, so it will oversmooth...

2/7
May 23, 2025 at 7:59 AM
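[For concreteness, a minimal sketch of the decomposition in the post above, assuming a GCN-style layer H^(l) = σ(A H^(l-1) W^(l)) with graph matrix A; the paper's exact conventions may differ.]

```latex
% Minimal sketch, assuming a GCN-style layer (the paper's conventions may differ):
%   H^{(l)} = \sigma(Z^{(l)}),   Z^{(l)} = A H^{(l-1)} W^{(l)},   A = graph matrix.
\[
\nabla_{W^{(\ell)}} \mathcal{L}
   \;=\; \underbrace{\big(A H^{(\ell-1)}\big)^{\!\top}}_{\text{forward signal}}
         \;\underbrace{B^{(\ell)}}_{\text{backward signal}},
\qquad
B^{(\ell)} \;:=\; \frac{\partial \mathcal{L}}{\partial Z^{(\ell)}},
\]
\[
B^{(\ell-1)} \;=\; \sigma'\!\big(Z^{(\ell-1)}\big) \odot
                   \Big(A^{\!\top} B^{(\ell)}\, W^{(\ell)\top}\Big),
\]
% so the backward signal is multiplied by the graph matrix at every layer,
% just like the forward signal: this is why it can oversmooth too.
```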