Andrés Corrada
@andrescorrada.bsky.social
Scientist, Inventor, author of the NTQR Python package for AI safety through formal verification of unsupervised evaluations. On a mission to eliminate Majority Voting from AI systems. E Pluribus Unum.
Pinned
Plato famously critiqued democracy as a way to run the Ship of State. His critique is relevant to AI safety. Can we be safe with ensembles of AI agents? This NotebookLM conversation discusses how the crowd can be wiser by first evaluating itself, then deciding. youtu.be/oDJdpTvhY_4
Plato's Ship of Fools Allegory and AI Safety: A conversation
YouTube video by Andrés Corrada-Emmanuel
youtu.be
Not true. Building models of what you call "reasoning" and then showing that LLMs correlate with your ad-hoc definition is not, in any way, proof that these models "reason" as humans do. Computational neuroscience by newbies is not science.
if this is true, i think you should expect Gemini to soon dominate

we keep getting more and more confirmation that reasoning begins in pre-training

today’s evidence: arxiv.org/abs/2510.07364

maybe Gemini 3 is the tidal shift where Google gains a permanent lead
October 13, 2025 at 12:40 PM
I'm working on visual demonstrations of the logical computations for the M=2 axioms of unsupervised evaluation for classifiers. This example is for 3 labels (a, b, c). Each illustration shows the decomposition that must be true for any pair of classifiers, given an assumed number of questions for each label.
October 13, 2025 at 12:26 PM
For three or more labels, there are more error modes than correct responses, so there are many ways to be X% correct on any given label. In binary classification, there is only one. This is shown here for three-label classification by three LLMs-as-Judges grading the output of two other LLMs.
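A toy count makes the point. This is my own illustrative sketch (the helper `ways_to_be_correct` is hypothetical, not part of the NTQR package): with R labels, the errors on items of one true label can be distributed among the R-1 wrong labels in more than one way, whereas in binary classification there is exactly one wrong label.

```python
from itertools import product

def ways_to_be_correct(num_items, num_correct, num_labels):
    """Count the response patterns on items of one true label that
    achieve a fixed number of correct answers: the errors can be
    distributed among the (num_labels - 1) wrong labels."""
    errors = num_items - num_correct
    wrong_labels = num_labels - 1
    # enumerate compositions of `errors` into `wrong_labels` parts
    return sum(
        1
        for split in product(range(errors + 1), repeat=wrong_labels)
        if sum(split) == errors
    )

print(ways_to_be_correct(4, 2, 2))  # binary: 1 way to be 50% correct
print(ways_to_be_correct(4, 2, 3))  # three labels: 3 ways
```

The count is just stars-and-bars, C(errors + R - 2, R - 2); the enumeration makes the error modes explicit.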
October 13, 2025 at 11:56 AM
Reposted by Andrés Corrada
October 10, 2025 at 10:00 PM
The central question in any logic of unsupervised evaluation is -- what are the group evaluations logically consistent with how we observe experts agreeing/disagreeing on a test for which we have no answer key? This series of plots shows the answer for a single binary classifier that labeled Q=10 items.
October 10, 2025 at 6:00 PM
Any dialogue framework would benefit from understanding the logic of unsupervised evaluation - what are the group evaluations logically consistent with how we observe experts agreeing/disagreeing in their decisions? Evaluate, then collectively decide.
October 10, 2025 at 12:29 PM
Reposted by Andrés Corrada
Gonna pump my own paper here which is basically saying it’s the wrong problem to focus on for making good decisions in medicine 😅

www.sciencedirect.com/science/arti...
Explaining decisions without explainability? Artificial intelligence and medicolegal accountability
www.sciencedirect.com
October 10, 2025 at 1:34 AM
Someday it will seem strange that seminars on the theoretical foundations of machine learning did not discuss the logic of unsupervised evaluation and its axioms. Simple linear relations that anyone with knowledge of high-school math can comprehend. arxiv.org/abs/2510.00821
October 9, 2025 at 5:57 PM
Evaluation is not the same as decision. This is demonstrated here in two ways.
1. "And you can never test whether your data was generated in the iid model, nor can you test if it will be generated in the iid model tomorrow."
You can actually do the first part with evaluations but not the second.
Almost a decade ago, I coauthored a paper asking us to rethink our theory of generalization in machine learning. Today, I’m fine putting the theory back on the shelf.
Reshelving generalization
You don't need a theorem to argue more data is better than less data
www.argmin.net
October 9, 2025 at 5:00 PM
Logical consistency between disagreeing experts can be used to keep us safer when we use their noisy decisions. My latest on this overlooked tool to tame AIs -- submitted to IEEE SaTML 2026 in Berlin. arxiv.org/abs/2510.00821
October 2, 2025 at 12:08 PM
Formalization is considered the sine qua non of maturity in many fields. How far could you go with your work, @far.ai, if you understood and used the logic of unsupervised evaluation I have developed? How far could you inspire others? Anybody can try this logic. ntqr.readthedocs.io/en/latest
October 1, 2025 at 9:12 PM
And we need @far.ai to understand that the verification of evaluations of classifiers exists. This is my work on the logic of unsupervised evaluation for classifiers. So simple anybody who understands high-school math will get it. ntqr.readthedocs.io/en/latest
October 1, 2025 at 9:03 PM
Reposted by Andrés Corrada
Live from "war ravaged" Portland
September 27, 2025 at 9:13 PM
Reposted by Andrés Corrada
September 28, 2025 at 3:00 PM
Reposted by Andrés Corrada
@adolfont.github.io, it may interest you to know that we can take concepts from theorem provers to formalize the computation of the group evaluations that are logically consistent with how we observe experts, human or robotic, agreeing/disagreeing on a test. ntqr.readthedocs.io/en/latest
September 28, 2025 at 2:30 PM
Reposted by Andrés Corrada
Speaking of boxes of numbers. Here are the possible evaluations for a single binary classifier that labeled Q=10 items (left). If you understand that evaluations of classifiers obey linear axioms, you can restrict this space of 285 possible evaluations to just 35 after seeing a summary of its labels.
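A back-of-the-envelope enumeration of this filtering, under an evaluation-tuple convention I am assuming here (this is not the NTQR package's API, and the exact counts depend on the convention, so they may differ slightly from the figures above):

```python
Q = 10  # number of labeled items

# All candidate evaluations (Q_a, R_aa, R_bb): Q_a true 'a' items
# (so Q - Q_a true 'b' items), with R_aa of the 'a' items and
# R_bb of the 'b' items labeled correctly.
all_evals = [
    (q_a, r_aa, r_bb)
    for q_a in range(Q + 1)
    for r_aa in range(q_a + 1)
    for r_bb in range(Q - q_a + 1)
]

# Single-classifier axiom: the items the classifier labeled 'a' are
# its correct 'a' labels plus its mistakes on the true 'b' items.
def consistent(ev, labeled_a):
    q_a, r_aa, r_bb = ev
    return r_aa + ((Q - q_a) - r_bb) == labeled_a

labeled_a = 5  # observed summary: the classifier said 'a' five times
feasible = [ev for ev in all_evals if consistent(ev, labeled_a)]
print(len(all_evals), len(feasible))
```

Under this convention the counts come out as 286 and 36; the point is the order-of-magnitude collapse from a single linear axiom and one observed label summary.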
September 25, 2025 at 3:56 PM
Reposted by Andrés Corrada
I spent the week at the police surveillance convention and let me tell you my biggest observation: The name of the game now is consolidating as much information as humanly possible from surveillance devices, the internet, other governmental data, and literally a million other places. 🧵
September 26, 2025 at 4:47 PM
Although I claim to have discovered that there is such a thing as a "logic of unsupervised evaluation for classifiers", I am not the first to use logic to derive equations that must be universally true for all classifiers in all domains. That was done by Platanios, Blum, and Mitchell.
September 26, 2025 at 6:01 PM
I had a very small role to play in the initial spread of Python when I worked at Dragon Systems in the 1990s, along with Tim Peters, in Newton, MA.
Tim was a software engineer; I was a scientist in the research group, where most code was written in Perl. I loved Perl; I quickly learned to use it and still do.
September 26, 2025 at 5:15 PM
Reposted by Andrés Corrada
🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers.

This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.

ai-scientific-discovery.github.io
September 25, 2025 at 6:28 PM
Reposted by Andrés Corrada
My personal belief is that marketing in the aggregate, via statistical measures of groups, that respects the privacy of individuals via anonymous algorithms can allow us to have our cake and eat it too.
September 26, 2025 at 1:10 PM
Logical consistency between disagreeing experts can be a powerful tool for understanding how correct they could possibly be.
September 26, 2025 at 11:27 AM
Reposted by Andrés Corrada
It is astonishing to me that the AI research community is ignorant of even the axioms of evaluation for single classifiers, never mind for pairs, trios, etc.
September 25, 2025 at 4:06 PM
I'm adding an appendix to my recent paper discussing the M=2 axioms for pairs of classifiers for R labels. The paper focuses on the M=1 axioms, the single R-labels classifier.
But as the paper points out, the M=2 logically possible evaluations must be a subset of the product of their M=1 evaluations!
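A sketch of that subset relation for the binary (R=2) case, under an evaluation-tuple convention I am assuming (my own illustrative code, not the NTQR package). The full M=2 axioms add agreement/disagreement constraints that shrink the set further; here I only show that pair evaluations must project into each classifier's M=1 set and share the same unknown true label counts:

```python
Q = 10  # test size

def m1_evals(labeled_a):
    """Evaluations (Q_a, R_aa, R_bb) consistent with the M=1 axiom
    for a classifier that labeled `labeled_a` of the Q items 'a'."""
    return {
        (q_a, r_aa, r_bb)
        for q_a in range(Q + 1)
        for r_aa in range(q_a + 1)
        for r_bb in range(Q - q_a + 1)
        if r_aa + ((Q - q_a) - r_bb) == labeled_a
    }

# Candidate pair evaluations: each classifier satisfies its own M=1
# axiom, and both must refer to the same true label counts Q_a.
m1_first, m1_second = m1_evals(5), m1_evals(7)
pairs = {
    (ev1, ev2)
    for ev1 in m1_first
    for ev2 in m1_second
    if ev1[0] == ev2[0]  # same unknown number of true 'a' items
}

# Every pair evaluation projects back into the M=1 sets.
assert all(e1 in m1_first and e2 in m1_second for e1, e2 in pairs)
```

So any algorithm computing M=2 evaluations can start from the product of the M=1 sets and only ever discard candidates, never add them.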
September 26, 2025 at 10:58 AM
Reposted by Andrés Corrada
Why is autism really on the rise? What the science says - As Trump blames Tylenol, Nature looks into the decades of research on the causes of autism. www.nature.com/articles/d41...
Why is autism really on the rise? What the science says
As Trump blames Tylenol, Nature looks into the decades of research on the causes of autism.
www.nature.com
September 25, 2025 at 3:23 PM