Andrea Perin
@zazzarazzaz.bsky.social
Finally, our theory recapitulates the behavior of finite-width networks trained on a large subset of MNIST, where specific rotations are held out during training for specific digit classes.
January 14, 2025 at 1:05 PM
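A sketch of the kind of held-out-rotation split described here; the function name, the choice of digit 3, and the held-out angles are illustrative, not the paper's protocol:

```python
import numpy as np
from scipy.ndimage import rotate

def rotated_split(images, labels, n_rot=16, held_out=None):
    # Hypothetical split builder (not the paper's code): every image is
    # replicated at n_rot evenly spaced rotations, but for the classes in
    # `held_out` the listed angles are excluded from training and kept
    # for evaluation.
    if held_out is None:
        held_out = {3: (90.0, 112.5)}          # e.g. hide two poses of digit 3
    angles = 360.0 * np.arange(n_rot) / n_rot  # multiples of 22.5 degrees
    train, test = [], []
    for img, lab in zip(images, labels):
        for ang in angles:
            sample = (rotate(img, ang, reshape=False, order=1), lab)
            (test if ang in held_out.get(lab, ()) else train).append(sample)
    return train, test
```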
Regarding equivariant architectures, we recover the classical result that if the network architecture is invariant to the symmetry of interest, it generalizes correctly to the missing poses (because the orbits collapse in kernel space).
January 14, 2025 at 1:05 PM
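For intuition, a minimal numpy sketch of that collapse, using a group-averaged RBF kernel as a stand-in for an invariant architecture (my construction for illustration):

```python
import numpy as np

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

G = [rot(2 * np.pi * k / 12) for k in range(12)]       # cyclic group C_12
orbit = np.stack([g @ np.array([1.0, 0.0]) for g in G])

def rbf(x, y, sigma=0.7):
    return np.exp(-((x - y) ** 2).sum() / (2 * sigma**2))

def k_inv(x, y):
    # Group-averaged kernel: invariant by construction, like an
    # architecture that is invariant to the symmetry.
    return np.mean([rbf(g @ x, y) for g in G])

row = lambda x: np.array([k_inv(x, p) for p in orbit])

# Every pose in the orbit has the same kernel row: the orbit collapses to
# a single point in kernel space, so a held-out pose is indistinguishable
# from the training poses and inherits their label.
print(np.abs(row(orbit[0]) - row(orbit[5])).max())     # ~0, up to float error
```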
Our theory also applies, in the kernel regime, to neural networks of various architectures (MLPs and CNNs) trained on pairs of rotation orbits of MNIST. Here too, correct generalization can be seen as a simple SNR: how distinct the classes are vs. how non-local the symmetric structure is.
January 14, 2025 at 1:05 PM
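One way to see the connection in code: the empirical NTK of a toy two-layer ReLU MLP (made-up sizes; a stand-in for the architectures in the paper) is approximately circulant on a rotation orbit, which is what lets the Fourier analysis of the following posts apply:

```python
import numpy as np

rng = np.random.default_rng(0)
h, d = 4096, 2                        # wide hidden layer, 2-D inputs
W = rng.standard_normal((h, d))       # first-layer weights
a = rng.standard_normal(h)            # readout weights

def empirical_ntk(X):
    # NTK of f(x) = a @ relu(W @ x) / sqrt(h) at initialization:
    # Theta(x, x') = grad_theta f(x) . grad_theta f(x').
    Z = X @ W.T
    phi = np.maximum(Z, 0.0)          # relu(Wx): gradient w.r.t. a
    da = (Z > 0.0) * a                # enters the gradient w.r.t. W
    return (phi @ phi.T + (X @ X.T) * (da @ da.T)) / h

t = 2 * np.pi * np.arange(12) / 12
orbit = np.stack([np.cos(t), np.sin(t)], axis=1)       # one rotation orbit
K = empirical_ntk(orbit)

# On the orbit, K[i, j] depends (up to finite-width noise) only on
# (i - j) mod N, i.e. K is approximately circulant.
C = np.stack([np.roll(K[0], k) for k in range(len(K))])
print(np.abs(K - C).max())            # small, and shrinking as h grows
```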
We derive geometric insights from the spectral formula, allowing us to understand when and why kernel methods fail to generalize on symmetric datasets. We find that, in the simplest cases, the numerator is proportional to class separation, while the denominator is proportional to the point density in an orbit.
January 14, 2025 at 1:05 PM
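The tradeoff can be illustrated with a toy two-orbit experiment (my construction, not the paper's): hold out one pose, fit kernel regression on the rest, and watch the held-out error shrink as the classes separate and the orbits get denser:

```python
import numpy as np

def orbit(center_x, N, radius=1.0):
    t = 2 * np.pi * np.arange(N) / N
    return np.stack([center_x + radius * np.cos(t), radius * np.sin(t)], axis=1)

def heldout_error(sep, N, sigma=0.5, ridge=1e-8):
    # Two classes = two cyclic orbits of 2-D seed points, `sep` apart.
    X = np.vstack([orbit(-sep / 2, N), orbit(+sep / 2, N)])
    y = np.array([-1.0] * N + [1.0] * N)
    hold = N                                   # hold out one pose of class +1
    tr = np.delete(np.arange(2 * N), hold)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))           # RBF kernel on the dataset
    alpha = np.linalg.solve(K[np.ix_(tr, tr)] + ridge * np.eye(2 * N - 1), y[tr])
    pred = K[hold, tr] @ alpha                 # prediction at the missing pose
    return abs(pred - y[hold])

# Rows: growing class separation; columns: growing orbit density.
for sep in (1.0, 2.0, 4.0):
    print([round(heldout_error(sep, N), 2) for N in (4, 8, 16, 32)])
```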
We derive a simple formula for the prediction error on a missing point, as a function of the (reciprocal) Fourier spectrum of the kernel matrix. We call it the spectral error: the ratio between the last reciprocal frequency and the average one.
January 14, 2025 at 1:05 PM
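Going by this description alone (the paper has the exact definition), a sketch of the quantity for the circulant RBF kernel of a toy orbit; treating "last" as the highest (Nyquist) frequency is my assumption:

```python
import numpy as np

def circulant_rbf(N=12, sigma=0.7):
    # RBF kernel on a cyclic rotation orbit of a 2-D seed point (a toy
    # stand-in for a rotation orbit of a digit).
    t = 2 * np.pi * np.arange(N) / N
    X = np.stack([np.cos(t), np.sin(t)], axis=1)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def spectral_error(K):
    lam = np.fft.fft(K[0]).real      # circulant eigenvalues = FFT of row 0
    recip = 1.0 / lam                # the reciprocal Fourier spectrum
    # "Last" reciprocal frequency read as the highest-frequency (Nyquist)
    # component -- an assumption; the paper gives the exact convention.
    return recip[len(K) // 2] / recip.mean()

print(spectral_error(circulant_rbf()))   # large => poor generalization
```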
More precisely, our datasets are generated by the action of a cyclic group on two seed samples, forming two distinct classes. The kernel matrix on such datasets is circulant, and thus we can study the problem in Fourier space (here illustrated for the RBF kernel).
January 14, 2025 at 1:05 PM
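A minimal numpy sketch of this structure, with a rotating 2-D point standing in for a rotating digit: the RBF kernel on a cyclic orbit is circulant, so its spectrum is just the FFT of its first row:

```python
import numpy as np

N = 12
t = 2 * np.pi * np.arange(N) / N
X = np.stack([np.cos(t), np.sin(t)], axis=1)   # one cyclic orbit in 2-D

d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 0.7**2))                 # RBF kernel on the orbit

# The group action is a distance-preserving cyclic shift, so K[i, j]
# depends only on (i - j) mod N: K is circulant.
C = np.stack([np.roll(K[0], k) for k in range(N)])
assert np.allclose(K, C)

# Circulant matrices are diagonalized by the Fourier basis, so the
# eigenvalues of K are the FFT of its first row.
assert np.allclose(np.sort(np.fft.fft(K[0]).real), np.linalg.eigvalsh(K))
```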
We empirically find that such generalization does not take place in general: for instance, networks cannot extrapolate to unseen image rotations of MNIST digits, unless the number of sampled rotations is large. 𝘉𝘶𝘵 𝘸𝘩𝘺 𝘥𝘰 𝘯𝘦𝘵𝘸𝘰𝘳𝘬𝘴 𝘧𝘢𝘪𝘭?
January 14, 2025 at 1:05 PM
Deep networks tend to fare poorly on rare symmetric transformations, for instance objects seen in unusual poses. This has been observed empirically many times (see, for instance, Abbas and Deny, 2023: arxiv.org/abs/2207.08034).
January 14, 2025 at 1:05 PM