Gaël Varoquaux
gaelvaroquaux.bsky.social
Research & code: Research director @inria
►Data, Health, & Computer science
►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social
►Sometimes does art photography
►Physics PhD
One of my collaborators sent me a @skrub-data.bsky.social TableReport as an HTML file, which I can interact with and use to explore the data, so that I can give him feedback.
Ideal workflow, as far as I am concerned: async, yet interactive, and not needing any infrastructure
October 23, 2025 at 8:10 PM
The full text is here: I kept it short, but it is deeply meaningful to me
gael-varoquaux.info/personnal/a-...
October 10, 2025 at 11:37 AM
Let's keep in mind that we can redefine what is cool, and not play others' game.

Define what we're proud of:

Bigger is not better
Simplicity is a virtue
Tech for the many
October 1, 2025 at 4:42 PM
Clouds naturally bring improved infrastructure. But they also enable spying and control.

We need to be careful whom we platform. Tech lords sometimes have the wrong political connections.
October 1, 2025 at 4:42 PM
More efficient computing won't suffice.

Efficiency improvements are super useful. But demand will grow even faster, and catch up. Such a rebound effect is very classic with technology, eg with transportation or energy.

It's really the behaviors that condition resource usage (eg bike > SUV)
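A worked example of the rebound effect, with purely hypothetical numbers: write total resource use R as demand D times resource per unit of demand e. Halving the per-unit cost does not help if demand triples:

$$R = D \times e, \qquad R' = (3D) \times \frac{e}{2} = 1.5\,D\,e = 1.5\,R$$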
October 1, 2025 at 4:42 PM
The good news is: tech keeps improving.
Software and algorithms keep getting better, as well as large compute and data infrastructures.
October 1, 2025 at 4:42 PM
There are financial considerations indeed. AI actors are burning a crazy amount of money.

But high costs are not always a bad thing (if you own Nvidia stock)
October 1, 2025 at 4:42 PM
It's thanks to Moore's law, right? Computation is getting more efficient...

Well, the cost has been exploding (exponentially indeed).

So it's really about pouring in more and more money
October 1, 2025 at 4:42 PM
The story is that we're getting there by waiting for faster GPUs and bigger datasets

and indeed, the compute used has exploded, with super-exponential growth, going way beyond the daily compute of the biggest computers
October 1, 2025 at 4:42 PM
It's cool because it promises really amazing, very great, awesome productivity gains

(look at those studies by Microsoft, IBM, Google...)
October 1, 2025 at 4:42 PM
So, what's cool in tech?

Well, AI is cool...

it's all over the news, the people in the pictures look healthy and happy (and also white and male), and there is always a big dollar amount attached
October 1, 2025 at 4:42 PM
A normative framework is the set of implicit rules and values that define the normal

What is "normal" is cultural by nature
October 1, 2025 at 4:42 PM
Come to my lightning talk
At @pydataparis.bsky.social in a few minutes
October 1, 2025 at 3:06 PM
Plot (a reasonable measure of) performance as a function of size/cost.

It helps move the discussion away from a single number
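A minimal sketch of the idea in Python, assuming each model is summarized by a single cost (size, compute, or dollars) and a single performance score; the `Result` type and `pareto_front` helper are hypothetical names for illustration, not an existing API. Rather than ranking on one number, it keeps the models that no other model beats on both axes at once, which are the points worth highlighting on a performance-vs-cost plot.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One model's measured cost (e.g. parameters, FLOPs, dollars) and score."""
    name: str
    cost: float
    performance: float


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results not dominated by any other, sorted by cost.

    A result is dominated if another one is at most as costly and at
    least as good, and strictly better on at least one of the two axes.
    """
    front = []
    for r in results:
        dominated = any(
            o.cost <= r.cost
            and o.performance >= r.performance
            and (o.cost < r.cost or o.performance > r.performance)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.cost)


if __name__ == "__main__":
    # Hypothetical toy numbers, only to exercise the function.
    toy = [
        Result("a", 1, 0.70),
        Result("b", 10, 0.80),
        Result("c", 5, 0.65),
        Result("d", 100, 0.81),
    ]
    for r in pareto_front(toy):
        print(r.name, r.cost, r.performance)
```

The front can then be plotted with cost on a (typically logarithmic) x axis. The quadratic scan is fine for a handful of models; sorting by cost first would give an O(n log n) version.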
September 18, 2025 at 6:51 PM
The references that underpin this vision
September 2, 2025 at 1:04 PM
We need new relational tools, using continuous representations, and modeling multiple tables jointly.

LLMs don't suffice: treating numbers as tokens is an insult to their topology.

Many questions are statistical at heart, and not about finding the "correct" match in a database.
September 2, 2025 at 12:14 PM
But relational data calls for entity matching, or joins and aggregations across multiple tables.

Optimizing such a pipeline for a statistical goal leads to discrete combinatorial optimization, which is intractable.
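A toy count of why this blows up, as a sketch under simplifying assumptions: each auxiliary table is either left out or joined after aggregating its one-to-many rows with one of a few functions. The table and aggregation names are hypothetical, and real pipelines also choose keys, filters, and features, so the actual search space is larger.

```python
from itertools import product
from typing import Iterator


def candidate_pipelines(
    tables: list[str], aggregations: list[str]
) -> Iterator[dict[str, str]]:
    """Yield every way to assemble auxiliary tables onto a main table.

    Each auxiliary table is either skipped (None) or joined after its
    one-to-many rows are aggregated with one of `aggregations`.
    Simplified model: join keys, filters, and feature choices are ignored.
    """
    options = [None, *aggregations]
    for choice in product(options, repeat=len(tables)):
        # Keep only the tables that are actually joined in this pipeline.
        yield {t: agg for t, agg in zip(tables, choice) if agg is not None}


def n_pipelines(n_tables: int, n_aggregations: int) -> int:
    """Size of the search space: (skip + each aggregation) per table."""
    return (n_aggregations + 1) ** n_tables


if __name__ == "__main__":
    tables = ["past_sales", "schools", "transport"]  # hypothetical tables
    aggs = ["mean", "max"]
    pipelines = list(candidate_pipelines(tables, aggs))
    print(len(pipelines), n_pipelines(len(tables), len(aggs)))
    print(n_pipelines(10, 4))
```

With 10 auxiliary tables and 4 aggregations, that is already 5**10 = 9,765,625 discrete candidates, each of which would need a model fit to be scored against the statistical goal.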
September 2, 2025 at 9:28 AM
We need tools that are given a statistical goal, and optimize/learn the data assembly
September 2, 2025 at 8:58 AM
Speaking on a panel @VLDB at 10:45

My take: we need to rethink relational tools for analysis, for instance, estimating typical property price from rich data.

The challenge is specifying the formula and not executing a query
September 2, 2025 at 8:57 AM
Hum, I got a somewhat similar one, though in French.
Sharing so that you can check how similar the content is.
August 23, 2025 at 1:35 PM
I really hope that this piece can be helpful as a solid introduction to people with causal questions.

Review co-authored with Judith Abecassis, Julie Alberge, and Elise Dumas.
August 20, 2025 at 7:12 PM
We give not only the theory of the machine-learning estimators, but also practical details
August 20, 2025 at 7:12 PM
We give both rigorous mathematical explanations and intuitive ones, for instance here around identifiability: how causal effects can, under certain conditions, be recovered from data
August 20, 2025 at 7:12 PM
Our didactic review on machine learning for causal inference, now open access:
• identifiability (theory of when the data can answer a causal question)
• machine-learning estimators
• study design (asking well-framed questions + loopholes, eg with timewise data)
www.annualreviews.org/content/jour...
August 20, 2025 at 7:12 PM
Thank you, Indonesia, for a wonderful holiday.

Generous and kind people, amazing heritage, gorgeous landscapes
August 20, 2025 at 1:02 PM