Andrés Corrada
@andrescorrada.bsky.social
Scientist, Inventor, author of the NTQR Python package for AI safety through formal verification of unsupervised evaluations. On a mission to eliminate Majority Voting from AI systems. E Pluribus Unum.
Pinned
Plato famously critiqued democracy as a way to run the Ship of State. His critique is relevant to AI safety. Can we be safe with ensembles of AI agents? This NotebookLM conversation discusses how the crowd can be wiser by first evaluating itself, then deciding. youtu.be/oDJdpTvhY_4
Plato's Ship of Fools Allegory and AI Safety: A conversation
YouTube video by Andrés Corrada-Emmanuel
youtu.be
Not true. Building models of what you call "reasoning" and then showing that LLMs correlate with your ad-hoc definition is not, in any way, proof that these models "reason" as humans do. Computational neuroscience by newbies is not science.
if this is true, i think you should expect Gemini to soon dominate

we keep getting more and more confirmation that reasoning begins in pre-training

today’s evidence: arxiv.org/abs/2510.07364

maybe Gemini 3 is the tidal shift where Google gains a permanent lead
October 13, 2025 at 12:40 PM
I'm working on visual demonstrations of the logical computations for the M=2 axioms of unsupervised evaluation for classifiers. This example is for 3 labels (a, b, c). Each illustration shows the decomposition that must be true for any pair of classifiers, given an assumed number of questions for each label.
October 13, 2025 at 12:26 PM
For three or more labels, there are more error modes than correct responses, so there are many ways to be X% correct on any given label. In binary classification, there is only one. This is shown here for three-label classification by three LLMs-as-Judges grading the output of two other LLMs.
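A toy count makes the point. This is my own illustrative sketch (the helper `ways_to_be_correct` is hypothetical, not part of the NTQR package): with R labels, the errors on items of one true label can be distributed among the R-1 wrong labels in more than one way, whereas in binary classification there is exactly one wrong label.

```python
from itertools import product

def ways_to_be_correct(num_items, num_correct, num_labels):
    """Count the response patterns on items of one true label that
    achieve a fixed number of correct answers: the errors can be
    distributed among the (num_labels - 1) wrong labels."""
    errors = num_items - num_correct
    wrong_labels = num_labels - 1
    # enumerate compositions of `errors` into `wrong_labels` parts
    return sum(
        1
        for split in product(range(errors + 1), repeat=wrong_labels)
        if sum(split) == errors
    )

print(ways_to_be_correct(4, 2, 2))  # binary: 1 way to be 50% correct
print(ways_to_be_correct(4, 2, 3))  # three labels: 3 ways
```

The count is just stars-and-bars, C(errors + R - 2, R - 2); the enumeration makes the error modes explicit.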
October 13, 2025 at 11:56 AM
Reposted by Andrés Corrada
October 10, 2025 at 10:00 PM
The central question in any logic of unsupervised evaluation is -- what are the group evaluations logically consistent with how we observe experts agreeing/disagreeing on a test for which we have no answer key? This series of plots shows the answer for a single binary classifier that labeled Q=10 items.
October 10, 2025 at 6:00 PM
Any dialogue framework would benefit from understanding the logic of unsupervised evaluation - what are the group evaluations logically consistent with how we observe experts agreeing/disagreeing in their decisions? Evaluate, then collectively decide.
October 10, 2025 at 12:29 PM
Reposted by Andrés Corrada
Gonna pump my own paper here which is basically saying it’s the wrong problem to focus on for making good decisions in medicine 😅

www.sciencedirect.com/science/arti...
Explaining decisions without explainability? Artificial intelligence and medicolegal accountability
www.sciencedirect.com
October 10, 2025 at 1:34 AM
Someday it will seem strange that seminars on the theoretical foundations of machine learning did not discuss the logic of unsupervised evaluation and its axioms. Simple linear relations that anyone with knowledge of high-school math can comprehend. arxiv.org/abs/2510.00821
October 9, 2025 at 5:57 PM
Evaluation is not the same as decision. This is demonstrated here in two ways.
1. "And you can never test whether your data was generated in the iid model, nor can you test if it will be generated in the iid model tomorrow."
You can actually do the first part with evaluations but not the second.
Almost a decade ago, I coauthored a paper asking us to rethink our theory of generalization in machine learning. Today, I’m fine putting the theory back on the shelf.
Reshelving generalization
You don't need a theorem to argue more data is better than less data
www.argmin.net
October 9, 2025 at 5:00 PM
Logical consistency between disagreeing experts can be used to keep us safer when we use their noisy decisions. My latest on this overlooked tool to tame AIs -- submitted to IEEE SaTML 2026 in Berlin. arxiv.org/abs/2510.00821
October 2, 2025 at 12:08 PM
Formalization is considered the sine qua non of maturity in many fields. How far could you go with your work, @far.ai, if you understood and used the logic of unsupervised evaluation I have developed? How far could you inspire others? Anybody can try this logic. ntqr.readthedocs.io/en/latest
October 1, 2025 at 9:12 PM
And we need @far.ai to understand that the verification of evaluations of classifiers exists. This is my work on the logic of unsupervised evaluation for classifiers. So simple anybody who understands high-school math will get it. ntqr.readthedocs.io/en/latest
October 1, 2025 at 9:03 PM
Reposted by Andrés Corrada
Live from "war ravaged" Portland
September 27, 2025 at 9:13 PM
Reposted by Andrés Corrada
September 28, 2025 at 3:00 PM
Reposted by Andrés Corrada
@adolfont.github.io, it may interest you to know that we can take concepts from theorem provers to formalize the computation of the group evaluations that are logically consistent with how we observe experts, human or robotic, agreeing/disagreeing on a test. ntqr.readthedocs.io/en/latest
September 28, 2025 at 2:30 PM
Reposted by Andrés Corrada
Speaking of boxes of numbers. Here are the possible evaluations for a single binary classifier that labeled Q=10 items (left). If you understand that evaluations of classifiers obey linear axioms, you can restrict this space of 285 possible evaluations to just 35 after seeing a summary of its labels.
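A back-of-the-envelope enumeration of this filtering, under an evaluation-tuple convention I am assuming here (this is not the NTQR package's API, and the exact counts depend on the convention, so they may differ slightly from the figures above):

```python
Q = 10  # number of labeled items

# All candidate evaluations (Q_a, R_aa, R_bb): Q_a true 'a' items
# (so Q - Q_a true 'b' items), with R_aa of the 'a' items and
# R_bb of the 'b' items labeled correctly.
all_evals = [
    (q_a, r_aa, r_bb)
    for q_a in range(Q + 1)
    for r_aa in range(q_a + 1)
    for r_bb in range(Q - q_a + 1)
]

# Single-classifier axiom: the items the classifier labeled 'a' are
# its correct 'a' labels plus its mistakes on the true 'b' items.
def consistent(ev, labeled_a):
    q_a, r_aa, r_bb = ev
    return r_aa + ((Q - q_a) - r_bb) == labeled_a

labeled_a = 5  # observed summary: the classifier said 'a' five times
feasible = [ev for ev in all_evals if consistent(ev, labeled_a)]
print(len(all_evals), len(feasible))
```

Under this convention the counts come out as 286 and 36; the point is the order-of-magnitude collapse from a single linear axiom and one observed label summary.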
September 25, 2025 at 3:56 PM
Reposted by Andrés Corrada
I spent the week at the police surveillance convention and let me tell you my biggest observation: The name of the game now is consolidating as much information as humanly possible from surveillance devices, the internet, other governmental data, and literally a million other places. 🧵
September 26, 2025 at 4:47 PM
Although I claim to have discovered that there is such a thing as a "logic of unsupervised evaluation for classifiers", I am not the first to use logic to derive equations that must be universally true for all classifiers in all domains. That was done by Platanios, Blum, and Mitchell.
September 26, 2025 at 6:01 PM
I had a very small role to play in the initial spread of Python when I worked at Dragon Systems in the 1990s, along with Tim Peters, in Newton, MA.
Tim was a software engineer; I was a scientist in the research group, where most code was written in Perl. I loved Perl; I quickly learned to use it and still do.
September 26, 2025 at 5:15 PM
Reposted by Andrés Corrada
🚀 We’re thrilled to announce the upcoming AI & Scientific Discovery online seminar! We have an amazing lineup of speakers.

This series will dive into how AI is accelerating research, enabling breakthroughs, and shaping the future of research across disciplines.

ai-scientific-discovery.github.io
September 25, 2025 at 6:28 PM
Reposted by Andrés Corrada
My personal belief is that marketing in the aggregate, via statistical measures of groups, that respects the privacy of individuals via anonymous algorithms can allow us to have our cake and eat it too.
September 26, 2025 at 1:10 PM
Logical consistency between disagreeing experts can be a powerful tool for understanding how correct they could possibly be.
September 26, 2025 at 11:27 AM
Reposted by Andrés Corrada
It is astonishing to me that the AI research community is ignorant of even the axioms of evaluation for single classifiers, never mind for pairs, trios, etc.
September 25, 2025 at 4:06 PM
I'm adding an appendix to my recent paper discussing the M=2 axioms for pairs of classifiers for R labels. The paper focuses on the M=1 axioms, the single R-labels classifier.
But as the paper points out, the M=2 logically possible evaluations must be a subset of the product of their M=1 evaluations!
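A sketch of that subset relation for the binary (R=2) case, under an evaluation-tuple convention I am assuming (my own illustrative code, not the NTQR package). The full M=2 axioms add agreement/disagreement constraints that shrink the set further; here I only show that pair evaluations must project into each classifier's M=1 set and share the same unknown true label counts:

```python
Q = 10  # test size

def m1_evals(labeled_a):
    """Evaluations (Q_a, R_aa, R_bb) consistent with the M=1 axiom
    for a classifier that labeled `labeled_a` of the Q items 'a'."""
    return {
        (q_a, r_aa, r_bb)
        for q_a in range(Q + 1)
        for r_aa in range(q_a + 1)
        for r_bb in range(Q - q_a + 1)
        if r_aa + ((Q - q_a) - r_bb) == labeled_a
    }

# Candidate pair evaluations: each classifier satisfies its own M=1
# axiom, and both must refer to the same true label counts Q_a.
m1_first, m1_second = m1_evals(5), m1_evals(7)
pairs = {
    (ev1, ev2)
    for ev1 in m1_first
    for ev2 in m1_second
    if ev1[0] == ev2[0]  # same unknown number of true 'a' items
}

# Every pair evaluation projects back into the M=1 sets.
assert all(e1 in m1_first and e2 in m1_second for e1, e2 in pairs)
```

So any algorithm computing M=2 evaluations can start from the product of the M=1 sets and only ever discard candidates, never add them.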
September 26, 2025 at 10:58 AM
Reposted by Andrés Corrada
Why is autism really on the rise? What the science says - As Trump blames Tylenol, Nature looks into the decades of research on the causes of autism. www.nature.com/articles/d41...
Why is autism really on the rise? What the science says
As Trump blames Tylenol, Nature looks into the decades of research on the causes of autism.
www.nature.com
September 25, 2025 at 3:23 PM