Lightnews — Scholar-powered news

Caio

@caiocorro.bsky.social

You need this fixed logsumexp function, otherwise you will get NaN gradients in the neural network. I personally came across this bug when building a CRF with the following transition structure for discontinuous named entity recognition, see here: aclanthology.org/2024.emnlp-m...

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

But now the backward pass works as expected, and you get zero gradients for w1.

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

Yes: implement your own logsumexp function that fixes this bug. I found the workaround in this GitHub issue: github.com/pytorch/pyto...
The forward pass is basically the same, but uses the custom logsumexp function.
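
A minimal sketch of what such a custom logsumexp could look like. This is neither the author's code nor the exact snippet from the GitHub issue: the name `safe_logsumexp`, the additive -inf masks, and the toy weights are assumptions made for illustration. The idea is to detect slices where every entry is -inf and route them around the operations whose backward pass would produce NaN.

```python
import torch

NEG_INF = float("-inf")


def safe_logsumexp(x: torch.Tensor, dim: int) -> torch.Tensor:
    """logsumexp that returns -inf (with zero gradient) for fully masked slices."""
    # The shift is detached: any constant shift gives the same value and gradient.
    m = x.detach().amax(dim=dim, keepdim=True)
    all_masked = m == NEG_INF                      # slices where every entry is -inf
    m_safe = m.masked_fill(all_masked, 0.0)        # avoid computing -inf - (-inf)
    s = torch.exp(x - m_safe).sum(dim=dim, keepdim=True)
    s_safe = s.masked_fill(all_masked, 1.0)        # avoid log(0) in masked slices
    out = torch.log(s_safe) + m_safe
    # Restore -inf in the forward pass; masked_fill blocks the gradient there.
    return out.masked_fill(all_masked, NEG_INF).squeeze(dim)


# Toy check with hypothetical weights named after the thread's w1, w2, w3.
w1 = torch.zeros(3, requires_grad=True)
w2 = torch.zeros(3, requires_grad=True)
w3 = torch.zeros(3, requires_grad=True)
mask1 = torch.full((3,), NEG_INF)                  # completely masked
mask2 = torch.tensor([0.0, NEG_INF, 0.0])          # partially masked

logits = torch.stack([
    safe_logsumexp(w1 + mask1, dim=0),
    safe_logsumexp(w2 + mask2, dim=0),
    safe_logsumexp(w3, dim=0),
])
loss = -torch.log_softmax(logits, dim=0)[1]        # arbitrary target: class 1
loss.backward()

print(logits)   # first logit is still -inf
print(w1.grad)  # zeros instead of NaNs
```

This sketch only handles -inf masking; inputs containing +inf or NaN are not treated specially.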

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

But this will give you a NaN gradient for w1! If you look at the gradient of w2, masked values have a zero gradient, as expected. But for w1, instead of a vector of zero gradients, we get a vector of NaNs. This completely breaks backpropagation and gradient descent. So, is there a nice solution?
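
A small reproduction sketch of the behaviour described here, assuming the masks are applied by adding -inf to the weights (the toy values and the target class are also assumptions). The last lines hint at where the NaN comes from: the derivative of logsumexp with respect to each input is exp(x_i - logsumexp(x)), and when both terms are -inf the difference is NaN, which survives multiplication by a zero upstream gradient.

```python
import torch

NEG_INF = float("-inf")

# Hypothetical weights named after the thread's w1, w2, w3.
w1 = torch.zeros(3, requires_grad=True)
w2 = torch.zeros(3, requires_grad=True)
w3 = torch.zeros(3, requires_grad=True)
mask1 = torch.full((3,), NEG_INF)            # completely masked
mask2 = torch.tensor([0.0, NEG_INF, 0.0])    # partially masked

logits = torch.stack([
    torch.logsumexp(w1 + mask1, dim=0),
    torch.logsumexp(w2 + mask2, dim=0),
    torch.logsumexp(w3, dim=0),
])
loss = -torch.log_softmax(logits, dim=0)[1]  # arbitrary target: class 1
loss.backward()

print(w1.grad)  # reported in the thread as a vector of NaNs
print(w2.grad)  # the masked entry has a zero gradient
print(w3.grad)  # finite everywhere

# The derivative term exp(x_i - logsumexp(x)) for the fully masked input:
x = (w1 + mask1).detach()
deriv = torch.exp(x - torch.logsumexp(x, dim=0))
print(deriv)    # -inf - (-inf) is NaN, so every entry is NaN
```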

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

The input of the first logsumexp is completely masked, the second one is partially masked, and the last one has no mask applied to it. Obviously, the first logit is equal to -inf, which means a "masked" (zero) output probability after softmax. We can then just compute a loss and backpropagate the gradient.
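
A minimal sketch of this forward pass, since the original code is not included in the post. The weight values, the additive -inf masks, and the choice of target class are assumptions; only the names w1, w2, w3 come from the thread.

```python
import torch

NEG_INF = float("-inf")

# Hypothetical weights; one vector per logit.
w1 = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
w2 = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
w3 = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Additive masks: 0 keeps an entry, -inf removes it.
mask1 = torch.full((3,), NEG_INF)            # completely masked
mask2 = torch.tensor([0.0, NEG_INF, 0.0])    # partially masked

# Each logit is the result of a logsumexp.
logits = torch.stack([
    torch.logsumexp(w1 + mask1, dim=0),
    torch.logsumexp(w2 + mask2, dim=0),
    torch.logsumexp(w3, dim=0),              # no mask
])
probs = torch.softmax(logits, dim=0)
loss = -torch.log_softmax(logits, dim=0)[1]  # negative log-likelihood of class 1

print(logits)  # the first logit is -inf
print(probs)   # the first probability is exactly 0
print(loss)    # finite, so the forward pass looks healthy
```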

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

One of the hardest PyTorch bugs I have had to debug is due to how logsumexp behaves with -inf masked inputs. Consider the following example. I build a vector of 3 logits, and each logit is the result of a logsumexp.

November 4, 2025 at 9:12 AM

Caio

@caiocorro.bsky.social

On top of that, there was a great party in Paris tonight, and I'm gutted not to be there.

June 21, 2025 at 9:00 PM

Caio

@caiocorro.bsky.social

Forget about Viterbi and forward algorithms! In this paper (accepted @ ACL 2025), we introduce Bregman CRFs for sequence labeling. We propose a novel (approximate) inference algorithm based on iterative Bregman projections which can take full advantage of modern GPUs.

caio-corro.fr/pdf/bregman_...

Bregman Conditional Random Fields: Sequence Labeling with Parallelizable Inference Algorithms (Caio Corro, Mathieu Lacroix, Joseph Le Roux)

June 2, 2025 at 10:34 AM

Caio

@caiocorro.bsky.social

Computer Science is not welcome at ACL. Best they can do is "engineering experiments". If you do ML research, you are probably not welcome either.

February 14, 2025 at 5:26 PM
