Andrés Corrada
@andrescorrada.bsky.social
Scientist, Inventor, author of the NTQR Python package for AI safety through formal verification of unsupervised evaluations. On a mission to eliminate Majority Voting from AI systems. E Pluribus Unum.
By this paper's argument, no machine can be bigger than, say, ants. It reads as one extended philosophical joke by an academic arm of The Onion.
October 24, 2025 at 5:47 PM
Sometimes they follow the format, sometimes not. Practical result: automatic parsing of LLM responses is almost impossible using regular expressions. As a consequence, it throttles the number of questions in an experiment to what you can tolerate processing manually!
October 17, 2025 at 11:27 AM
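(A minimal sketch of the parsing problem with hypothetical responses; the regex and the "Answer:" convention are my illustration, not taken from the experiments themselves.)

```python
import re

# Hypothetical illustration of the brittleness: a parser that expects the answer
# on a final line of the form "Answer: <a|b|c>", as a prompt might instruct.
ANSWER_RE = re.compile(r"Answer:\s*([abc])\s*$", re.IGNORECASE | re.MULTILINE)

def parse_final_answer(llm_response: str):
    """Return the parsed label, or None when the format was not followed."""
    match = ANSWER_RE.search(llm_response)
    return match.group(1).lower() if match else None

# One response follows the requested format, the other does not.
follows_format = "The reasons are X, Y, Z.\nAnswer: b"
ignores_format = "Weighing X against Y, I would lean towards (b), probably."

print(parse_final_answer(follows_format))   # 'b'
print(parse_final_answer(ignores_format))   # None -> back to manual processing
```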
"With the right guardrails", @andytseng.bsky.social , agreed. How do you see this playing out when currently LLMs have a hard time following formatting instructions for their answers? I'm running "AI debate" experiments asking LLMs to put their answer at the end of the reasons for it. LLMs say: Meh.
October 17, 2025 at 11:27 AM
assuming the conclusion. At best, you are proposing there is an "inside joke" that is not expressed. Sorry. This is sloppy science by people who have not bothered to study the hard work of psychologists and neuroscientists.
October 13, 2025 at 1:10 PM
"Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities or repurpose pre-existing base model ones." The logical fallacy here is ...
October 13, 2025 at 1:10 PM
count given true label. Yes, there are; algebraically, they are in the appendix of my paper. The illustration at the top of the post is the 2nd term in these equations, Q_label_true. So the full visual proof is going to need illustrations for every one of those sums in the very same three squares.
October 13, 2025 at 12:26 PM
In general, the R-label classification test would have R^2 squares. Here we have 3^2=9 cells for 3-label classification. The existence of M=2 axioms for any number of labels, R, then comes down to asserting that for any R, there is a set of finite geometric operations that give you the pair correct
October 13, 2025 at 12:26 PM
If we are at (Q_a=8,Q_b=9,Q_c=8) for a 3-label test, and observe a classifier saying (R_a=3,R_b=10,R_c=12), you can start parsing out the logical consequences. For example, there is no way they can be better than 75% on the c-label. All of those bespoke deductions are the M=1 axioms. arxiv.org/abs/2510.00821
October 13, 2025 at 11:56 AM
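A minimal sketch of that kind of M=1 deduction, using the hypothetical numbers above; the bounds are my paraphrase of the logic, not code from the paper or the NTQR package:

```python
# Sketch of the M=1 style deduction, assuming the Q-point (Q_a=8, Q_b=9, Q_c=8)
# and one classifier's observed responses (R_a=3, R_b=10, R_c=12). For each label,
# the classifier can be correct at most min(Q_label, R_label) times, which bounds
# how good it can possibly be on the labels it over-uses.
Q = {"a": 8, "b": 9, "c": 8}    # assumed statistics of the unknown answer key
R = {"a": 3, "b": 10, "c": 12}  # observed response counts of one classifier

for label in Q:
    max_correct = min(Q[label], R[label])
    bound = max_correct / R[label]
    print(f"label {label}: at most {max_correct}/{R[label]} "
          f"= {bound:.0%} of its '{label}' responses can be correct")
```

At this Q-point the c-label bound is 8/12, about 67%, consistent with the "no better than 75%" deduction in the post.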
Given an assumed Q-point, what are the evaluations possible for each classifier? All we have in an unlabeled setting are their observed response counts. So for every classifier, we have, if you will, their own estimate of the true Q-point: (R_a_i, R_b_i, R_c_i). Logical consistency now comes in.
October 13, 2025 at 11:56 AM
So all computations of possible evaluations given observations occur at the same fixed Q-point. Note that the logic cannot tell us anything about what this Q-point could be. Other knowledge or assumptions must be invoked to do that - science and engineering.
The M=1 axioms then come into play.
October 13, 2025 at 11:56 AM
The plot also illustrates the "atomic" logical operations in unsupervised evaluation. We cannot know anything about the answer key. Hence, all possible values of a statistic of the answer key must be possible. For three labels, these are the points (Q_a, Q_b, Q_c). Q_a = number of 'a' questions, etc
October 13, 2025 at 11:56 AM
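A small sketch of that "atomic" space, enumerating every possible Q-point for a toy test size (my illustration, not NTQR code):

```python
from itertools import product

def q_points(Q: int, num_labels: int = 3):
    """All possible answer-key statistics (Q_a, Q_b, Q_c, ...) for a test of size Q."""
    return [pt for pt in product(range(Q + 1), repeat=num_labels) if sum(pt) == Q]

# Illustration for a small 3-label test: the finite set the logic must consider.
points = q_points(5)
print(len(points), "possible Q-points for Q=5, R=3")   # 21 of them
print(points[:4])                                      # (0, 0, 5), (0, 1, 4), ...
```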
This diagram also reveals that the logic is not trivial. It reveals structure about possible evaluations given observed counts of how classifiers agree/disagree. This is, in effect, a purely logical density of states plot! And there is no probability involved. Only algebra.
October 13, 2025 at 11:56 AM
I've come to accept that both statements could be true. It is amazing what we can learn when we set out to do it, but our ignorance remains vast after our best efforts.
October 13, 2025 at 11:29 AM
I would include Plato's Ship of Fools Allegory as an example of the wisdom of the crowd and its critics. But also about the principal/agent problem in AI safety when we delegate to agents tasks that we are ignorant about or don't want to do. medium.com/@andrescorra...
Plato, robots and the democratic mob
The attacks on democracy started as soon as it was born in Ancient Greece. The most famous attack remains Plato’s “The Republic”. In it he…
medium.com
October 11, 2025 at 10:36 PM
The possible set of evaluations starts with this space that summarizes the answer key -- (Q_a, Q_b, Q_c,..) of dimension equal to the number of labels. But it is a finite set inside that space. Crucially, all the statistics are integers between zero and some observable count for the classifiers as well.
October 10, 2025 at 6:00 PM
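A sketch of how that finiteness plays out for a single classifier at a fixed Q-point, reusing the hypothetical counts from the October 13 thread; the bounds shown are only the obvious necessary ones, not the full axioms:

```python
from itertools import product

# The per-label correct counts of one classifier are integers bounded by both the
# answer-key statistic Q_label and its observed response count R_label, so the set
# of candidate evaluations at a fixed Q-point is finite and enumerable.
Q = {"a": 8, "b": 9, "c": 8}    # assumed answer-key statistics
R = {"a": 3, "b": 10, "c": 12}  # observed responses of one classifier

ranges = [range(min(Q[l], R[l]) + 1) for l in Q]
candidates = list(product(*ranges))
print(len(candidates), "candidate (correct_a, correct_b, correct_c) tuples")  # 360
```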
If there is an answer key, it maps to a point in the space defined by (Q_a, Q_b,...) for the R labels in classification. Nothing in the logic can tell us what that point is just from observing test responses from experts. However, we have digitized our uncertainty about the answer key to a finite set.
October 10, 2025 at 6:00 PM
Given counts of how classifiers agree/disagree on a finite test of size Q, you can express those observations as linear transformations from a space of the unknown statistics of the answer key and the correctness/errors of the classifiers.
Let's start with the answer key.
October 10, 2025 at 6:00 PM
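A sketch of the linear-transformation view for a single classifier, with a hypothetical confusion matrix and notation of my own choosing:

```python
import numpy as np

# C[i, j] = number of questions with true label j that the classifier answered
# with label i; these counts are unknown in an unlabeled setting. The observables
# are linear in them: column sums give the answer-key statistics Q_j, and row
# sums give the observed response counts R_i.
C = np.array([[3, 0, 0],
              [2, 7, 1],
              [3, 2, 7]])      # a hypothetical confusion matrix

Q = C.sum(axis=0)   # [8 9 8]   -> the Q-point of the answer key
R = C.sum(axis=1)   # [3 10 12] -> what we can actually observe
print("Q-point:", Q, " observed responses:", R)
```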
In general, knowing and applying the M=1 axioms for R-label classification gives you a reduction in uncertainty that goes as (1 - 1 / Q^(R-1)). The M=1 axioms are explained in my latest paper arxiv.org/abs/2510.00821
October 10, 2025 at 6:00 PM
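Plugging a few values into that formula, purely to give a sense of scale; the numbers only restate the quoted expression, see the paper for its derivation:

```python
# Numeric illustration of the reduction formula quoted above, 1 - 1/Q**(R-1),
# for a few test sizes Q and label counts R.
for R in (2, 3, 4):
    for Q in (10, 100, 1000):
        print(f"R={R}, Q={Q}: reduction = {1 - 1 / Q ** (R - 1):.6f}")
```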
If you listen to Carlini's earlier talks on adversarial images this is not his tone. He adamantly stated that no safety was possible given his work and that of others. He never acknowledges that it isn't applied. His warnings of doom now ring equally hollow to me. youtu.be/umfeF0Dx-r4?...
Nicholas Carlini – Some Lessons from Adversarial Machine Learning
YouTube video by FAR.AI
youtu.be
October 10, 2025 at 12:37 PM
In any debate, it would help if the participants had some measure of the reliability of the opinions of others and of their own. In addition, logical consistency can act as a way to warn us that at least one member of the debate is actually violating our accuracy specification.
October 10, 2025 at 12:29 PM