Caio
caiocorro.bsky.social
NLP researcher
You need this fixed logsumexp function; otherwise you will get NaN gradients in your neural network. I personally ran into this bug when building a CRF with the following transition structure for discontinuous named entity recognition, see here: aclanthology.org/2024.emnlp-m...
November 4, 2025 at 9:12 AM
But now the backward pass works as expected, and you get zero gradients for w1.
November 4, 2025 at 9:12 AM
Yes: implement your own logsumexp function that fixes this bug. I found the workaround when I came across this GitHub issue: github.com/pytorch/pyto...
The forward pass is basically the same, but using the custom logsumexp function.
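A minimal plain-Python sketch of the idea behind such a custom logsumexp (not the PyTorch code from the post or from the GitHub issue): the forward pass is unchanged, and the backward pass returns zeros when every input is -inf instead of evaluating exp(-inf - (-inf)). The function names are hypothetical, and inputs are assumed to be finite or -inf.

```python
import math


def safe_logsumexp_forward(xs):
    """log(sum(exp(x))) over a list of floats, stabilised by the max."""
    m = max(xs)
    if m == -math.inf:
        # Every input is masked: the sum of exponentials is 0, so the log is -inf.
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def safe_logsumexp_backward(xs, lse, upstream):
    """Gradient w.r.t. each input: upstream * exp(x_i - lse), guarded."""
    if lse == -math.inf:
        # Fully masked input: the output does not depend on any finite value,
        # so return zeros rather than computing exp(-inf - (-inf)) = NaN.
        return [0.0 for _ in xs]
    return [upstream * math.exp(x - lse) for x in xs]


fully_masked = [-math.inf, -math.inf, -math.inf]
lse = safe_logsumexp_forward(fully_masked)
print(lse, safe_logsumexp_backward(fully_masked, lse, upstream=0.0))
```

In an autograd framework the same guard would live in the backward rule of the custom function; the sketch only shows the arithmetic.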
November 4, 2025 at 9:12 AM
But this will give you a NaN gradient for w1! If you look at the gradient of w2, masked values have a zero gradient, as expected. But for w1, instead of a vector of zeros, we get a vector of NaNs. This completely breaks backpropagation and gradient descent. So, is there a nice solution?
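A plain-Python sketch of where the NaN comes from, assuming the backward rule is the usual one, grad_i = upstream * exp(x_i - logsumexp(x)). It is not the PyTorch code from the post, and the function names are hypothetical. With a fully masked input, both x_i and the output are -inf, so the exponent is -inf - (-inf) = NaN, and even a zero upstream gradient cannot cancel it.

```python
import math


def logsumexp(xs):
    """log(sum(exp(x))) over a list of floats, stabilised by the max."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf  # all inputs masked
    return m + math.log(sum(math.exp(x - m) for x in xs))


def naive_logsumexp_grad(xs, upstream=1.0):
    """Unguarded gradient: upstream * exp(x_i - lse) for each input."""
    lse = logsumexp(xs)
    return [upstream * math.exp(x - lse) for x in xs]


# Partially masked input: the masked entry gets a zero gradient.
print(naive_logsumexp_grad([0.0, -math.inf, 0.0]))

# Fully masked input: every entry is NaN, even with a zero upstream gradient.
print(naive_logsumexp_grad([-math.inf] * 3, upstream=0.0))
```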
November 4, 2025 at 9:12 AM
The input of the first logsumexp is completely masked, the second one is partially masked, and the last one has no mask applied to it. Obviously, the first logit is equal to -inf, which corresponds to a "masked output probability" (a probability of zero) after softmax. We can then just compute a loss and backpropagate the gradient.
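A plain-Python sketch of this forward pass, assuming additive masks (0 to keep a value, -inf to mask it). It is not the code from the post. The weight values and the name w3 are made up for illustration.

```python
import math

NEG_INF = -math.inf


def logsumexp(xs):
    """log(sum(exp(x))) over a list of floats, stabilised by the max."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF  # all inputs masked
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax(logits):
    """Softmax over a list that contains at least one finite logit."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights; each vector feeds one logsumexp.
w1, w2, w3 = [0.3, -1.2, 0.8], [0.5, 0.1, -0.4], [1.0, 0.0, -1.0]
masks = [
    [NEG_INF, NEG_INF, NEG_INF],  # completely masked
    [0.0, NEG_INF, 0.0],          # partially masked
    [0.0, 0.0, 0.0],              # no mask
]

logits = [
    logsumexp([wi + mi for wi, mi in zip(w, mask)])
    for w, mask in zip((w1, w2, w3), masks)
]
probs = softmax(logits)
print(logits)  # first logit is -inf
print(probs)   # first probability is 0.0
```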
November 4, 2025 at 9:12 AM
One of the hardest PyTorch bugs I have had to debug is due to how logsumexp behaves with -inf masked inputs. Consider the following example. I build a vector of 3 logits, and each logit is the result of a logsumexp.
November 4, 2025 at 9:12 AM
On top of that, there was a great party in Paris tonight, and I'm gutted not to be there.
June 21, 2025 at 9:00 PM
Forget about Viterbi and forward algorithms! In this paper (accepted @ ACL 2025), we introduce Bregman CRFs for sequence labeling. We propose a novel (approximate) inference algorithm based on iterative Bregman projections which can take full advantage of modern GPUs.

caio-corro.fr/pdf/bregman_...
June 2, 2025 at 10:34 AM
Computer Science is not welcome at ACL. The best they can do is "engineering experiments". If you do ML research, you are probably not welcome either.
February 14, 2025 at 5:26 PM