Peter Bloem
@pbloem.sigmoid.social.ap.brid.gy
Assistant prof. at the Learning and Reasoning group, Vrije Universiteit Amsterdam (searchable).

[bridged from https://sigmoid.social/@pbloem on the fediverse by https://fed.brid.gy/ ]
Uh oh.
November 22, 2025 at 9:31 AM
I don't post my runs very often, but this one was pretty nice. 20k on the Vliehors, the largest sand plain in Europe. I was a little late, so I spent most of the time running in the sunset.

Met some seals on the way.
November 9, 2025 at 5:27 PM
November 7, 2025 at 11:17 AM
As an epilogue, here is the proof of the main theorem with my annotations.

As proofs go it's pretty simple, mostly building on set theory and some juggling of inequalities.

The key structure is given above the heading: start with the statement of the […]

October 28, 2025 at 5:32 PM
We don't need to change the benchmarks, we simply need to change the grading, and adapt the prompt.

First, we pick some confidence level t (the probability the model assigns to its answer being correct). Then, we say: answer the question or abstain from answering […]
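
Roughly, in code (a toy sketch of one grading scheme with this property; the post is cut off, so the exact penalty the paper suggests may differ): score 1 for a correct answer, 0 for abstaining, and -t/(1-t) for a wrong answer, so that answering only pays off in expectation when the model's confidence exceeds t.

def grade(answer, truth, t):
    # Score one response under confidence-threshold grading:
    # 1 for correct, 0 for abstaining, -t/(1-t) for a wrong answer.
    if answer is None:                       # the model abstained ("I don't know")
        return 0.0
    return 1.0 if answer == truth else -t / (1 - t)

def expected_score_if_answering(confidence, t):
    # Expected score of answering when the model is `confidence` sure:
    # positive exactly when confidence > t.
    return confidence - (1 - confidence) * t / (1 - t)

print(expected_score_if_answering(0.8, t=0.75))   # ~ +0.2: answer
print(expected_score_if_answering(0.6, t=0.75))   # ~ -0.6: better to abstain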

October 28, 2025 at 5:22 PM
The argument is simply that all evaluations we use to rank LLMs—whether different versions of our own or top models from different labs—use binary grading.

Like in an exam, you're either right or wrong, and if you're wrong you get zero points. And in an exam […]
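
To see the incentive problem concretely (my arithmetic, not a quote from the paper): under binary grading, a guess that is right with probability p scores p in expectation, while abstaining scores 0, so even a wild guess never does worse than saying "I don't know".

def expected_binary_score(p_correct):
    # Binary grading: 1 point if right, 0 if wrong or if you abstain.
    # Guessing therefore never scores worse than abstaining in expectation.
    return p_correct * 1.0

print(expected_binary_score(0.1))   # 0.1 > 0: even a 10%-confident guess "pays"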

October 28, 2025 at 5:16 PM
Why does this happen? The proof doesn't give me much intuition despite its simplicity. But the discussion on calibration elucidated a lot for me.

"Calibration" refer to the ability of a network to correctly represent its own uncertainty. A well calibrated […]

October 28, 2025 at 4:51 PM
The general theorem extends this slightly. In this setting, we allow X to consist of prompts c with a set of valid and erroneous responses r. The instances in E and V are now pairs (c, r). Filtering the sets E and V by prompt gives us the subsets E_c and V_c […]
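
In code, the filtering step is just (my own illustration of the definition, nothing more):

def responses_for_prompt(S, c):
    # E_c (resp. V_c): the responses r such that the pair (c, r) is in S.
    return {r for (c_i, r) in S if c_i == c}

# e.g. if E = {("capital of NL?", "Paris"), ("capital of NL?", "Berlin"), ("2+2?", "5")},
# then responses_for_prompt(E, "capital of NL?") == {"Paris", "Berlin"}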

October 28, 2025 at 4:38 PM
Here's how the classifier (f hat) is defined. They just need something that does worse than the optimal classifier for the argument, but it's actually a pretty intuitive approach.

The classifier looks at the probability that our language model (p hat) […]
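
In pseudocode (a hedged sketch; the excerpt cuts off before the actual threshold, so tau is a placeholder here): label a candidate string as valid whenever the language model p hat assigns it probability above some threshold, and as an error otherwise.

def f_hat(x, p_hat, tau):
    # Classify a string by thresholding the probability the LM assigns to it.
    # tau is a placeholder: the paper's specific choice isn't in this excerpt.
    return "valid" if p_hat(x) > tau else "error"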

October 28, 2025 at 4:33 PM
The argument goes like this. Let p be some language model pre-trained just on V. It has seen only valid examples and is thus minimally likely to generate things from E.

Call the probability that it generates something from E "err". This is roughly our […]
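
In symbols, err is just the probability mass that p places on the error set (my phrasing of what the excerpt describes):

def err(p, E):
    # err = p(E) = sum of p(x) over x in E, assuming E is finite and
    # p(x) gives the probability of generating the string x.
    return sum(p(x) for x in E)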

October 28, 2025 at 4:25 PM
Why are we bothering with classification, when we are worried about generative models? Because the presented result shows that we can lower-bound the probability of hallucination (generating examples from E) by the probability that some classifier […]

October 28, 2025 at 4:20 PM
At this point you may be looking for a strict definition of "hallucination". As it turns out, for the argument of the paper, we don't need anything very precise. We just need to assume that someone like OpenAI has collected a large dataset of desirable and […]

October 28, 2025 at 4:05 PM
This doesn't just apply to failures to retrieve information the model doesn't know about. For tasks like letter counting, which are difficult for LLMs, incorrect answers are also considered potential hallucinations.

This is not something the model is expected to […]

October 28, 2025 at 4:00 PM
With that, let's start at the beginning. They open with a simple way to elicit a hallucination: ask the model for your birthday, and ask it to reply with just the date, but only if it knows.

If you try this on a relatively raw, open model like DeepSeek-V3 […]

October 28, 2025 at 3:53 PM
Let's do a deep dive into this paper: "Why Language Models Hallucinate."

When this came out, many people's summary was "even OpenAI admits that hallucinations are a fundamental problem of transformers/autoregressive models/LLMs."

I've seen many people […]

October 28, 2025 at 3:41 PM
The patient man's loss curve.
September 3, 2025 at 7:16 PM
Well done AI for bagsying humans with the Chinese room.
August 21, 2025 at 3:11 PM
Whoever is responsible for this should not have chosen a career in IT.

Whoever is responsible for this should have had a career in staying out of the way.
August 19, 2025 at 6:32 PM
Two official heatwaves per year is rare, according to the Dutch news. It's only happened before in 1941, 2006, 2018 and 2019.

Call me pessimistic, but looking at that sequence, I'd say it _used to_ be rare.
August 15, 2025 at 1:08 PM
Here's an odd effect (stumbled on by accident). The blue loss curve is from a well-tuned BERT baseline (from the "cramming" paper).

The only thing I changed for the orange is to put a residual connection around each transformer block and to multiply the […]
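
For concreteness, the kind of change described, as a PyTorch sketch (my own; the "multiply the …" part is cut off above, so whatever scaling was applied is left out):

import torch.nn as nn

class ExtraResidual(nn.Module):
    # Wrap an existing transformer block in one more residual connection.
    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x):
        return x + self.block(x)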

July 27, 2025 at 3:10 PM
Oh, come on...

Can we please not make our cyclist-ridden country full of strange and untypical streets the testing ground for a manchild's misguided attempts at creating a technology he doesn't understand with a vast societal risk he doesn't respect.
July 24, 2025 at 2:42 PM
Now out in TMLR:

🍇 GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks 🍇

There's lots of work on sampling subgraphs for GNNs, but relatively little on making this sampling process _adaptive_. That is, learning to select the data from the […]

July 18, 2025 at 9:26 AM
I have long mentally muted any hype about new optimizers, but this Muon/MuonClip seems to be the real deal...

I'll have to dig into the details at some point. It seems that the ideas are a bit more complex than AdamW's, which is a shame. Still, the performance […]

July 12, 2025 at 11:41 AM
July 11, 2025 at 12:47 PM
When duckduckgo really comes through for you and your deeply unreliable memory...
July 9, 2025 at 6:06 PM