Thomas Steinke
@stein.ke
Computer science, math, machine learning, (differential) privacy
Researcher at Google DeepMind
Kiwi🇳🇿 in California🇺🇸
http://stein.ke/
This is crazy, but I like it.
November 6, 2025 at 4:31 AM
Was this near San Francisco? I saw something from Menlo Park.
I'm searching bsky to see if anyone else saw what I saw.
November 4, 2025 at 2:10 AM
I totally empathize with the frustration of AI not working for a lot of the use cases where it is being touted. But I've seen it work often enough that I just don't think it's tenable to say it has no clear use.
November 3, 2025 at 9:01 PM
In each of the aforementioned use cases (except maybe the last one), I'm ultimately able to validate the output. I think such use cases are fine. But I'm not yet willing to trust AI to do things autonomously.
November 3, 2025 at 8:53 PM
To be concrete, I've personally found AI useful for proofreading my writing, writing/debugging code, editing photos, suggesting proof approaches, and language translation. It's clearly not useless. But I also see it generate nonsense half the time, so I'm aware of its limitations.
November 3, 2025 at 8:48 PM
Sorry, I don't want to pick a fight here, but the lack of nuance in the discourse around AI is really disappointing.
Yes, it's overhyped. No, it's not useless.
November 3, 2025 at 8:41 PM
This is what \nocite{} is for.
October 20, 2025 at 4:19 PM
This is the "don't bring your kids along" policy.
October 15, 2025 at 11:18 PM
IMHO, the best analog to the AI bubble is the dotcom bubble. Yes, the internet proved to be economically transformative, but there was still a bubble. Companies made a lot of money in the end, but not necessarily the ones people expected -- e.g., see Cisco:
October 9, 2025 at 1:17 AM
The final piece of the puzzle is how do you choose the subsamples? Here's where having a relative who knows combinatorics was helpful. 😁 Basically, the subsets should form a covering design. And the minimal size of a covering design parameterizes the tradeoff.
October 4, 2025 at 4:50 PM
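A covering design C(v, k, t) is a family of k-element subsets ("blocks") of a v-element ground set such that every t-element subset lies inside at least one block. A tiny checker (not from the paper; the `is_covering_design` helper and the Fano-plane example are just illustrations):

```python
from itertools import combinations

def is_covering_design(v, k, t, blocks):
    """Check that every t-subset of {0,...,v-1} is contained
    in at least one block (each block a k-subset)."""
    sets = [set(b) for b in blocks]
    return all(any(set(T) <= B for B in sets)
               for T in combinations(range(v), t))

# The Fano plane covers all C(7,2) = 21 pairs with just 7 blocks,
# far fewer than the 21 blocks a naive cover might use.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
        (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_covering_design(7, 3, 2, fano))  # True
```

The point of the tradeoff: the smaller a covering design you can find, the fewer subsamples you need to evaluate the black box on.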
OK, so can we get the best of both worlds? Or at least trade off between the cost of privacy (in terms of accuracy/data) and the number of subsamples we need to evaluate on?
That's what we address in our new paper. The answer is yes, but the tradeoff is quite steep (and we have a lower bound).
Privately Estimating Black-Box Statistics
Günter F. Steinke, Thomas Steinke
http://arxiv.org/abs/2510.00322
October 4, 2025 at 4:28 PM
In this paper we showed that we can do this kind of versatile black-box estimation with only an *additive* cost of privacy, where sample-and-aggregate suffers a *multiplicative* cost. But this uses exponentially many subsamples - i.e., all subsets of size n-t instead of a partition into t parts.
Privately Evaluating Untrusted Black-Box Functions
Ephraim Linder, Sofya Raskhodnikova, Adam Smith, Thomas Steinke
http://arxiv.org/abs/2503.19268
October 4, 2025 at 4:19 PM
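To see how steep "exponentially many" is, compare the two evaluation counts for illustrative parameters (the numbers n = 100, t = 5 are just an example, not from the paper):

```python
from math import comb

n, t = 100, 5
# Sample-and-aggregate: partition into t parts -> t evaluations.
partition_evals = t
# Evaluating on every subset of size n - t instead:
all_subset_evals = comb(n, n - t)
print(partition_evals, all_subset_evals)  # 5 vs 75287520
```

So the partition needs 5 black-box calls while the all-subsets approach needs about 75 million, which is what motivates interpolating between the two.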
The missing ingredient is a better aggregation method.
Enter the shifted inverse mechanism cse.hkust.edu.hk/~yike/Shifte...
Beyond Local Sensitivity via Down Sensitivity
In our previous post, we discussed local sensitivity and how we can get accuracy guarantees that scale with local sensitivity, which can be much better than the global sensitivity guarantees attained ...
differentialprivacy.org
October 4, 2025 at 4:05 PM
So the obvious question is: *can sample-and-aggregate be made more data-efficient?* 🤔
E.g., instead of partitioning the dataset, can we use overlapping subsamples?
Unfortunately, using standard aggregation methods, this doesn't work (because overlapping subsamples means higher sensitivity).
October 4, 2025 at 4:05 PM
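For context, a minimal sketch of plain sample-and-aggregate with disjoint chunks (my own toy version, not code from either paper; the clamped-mean aggregator and parameter names are illustrative). Because the chunks are disjoint, changing one record changes one chunk's value, so the mean has sensitivity (hi - lo)/t:

```python
import math
import random

def sample_and_aggregate(data, statistic, t, epsilon, lo, hi):
    """Toy sample-and-aggregate: partition the data into t disjoint
    chunks, evaluate the black-box statistic on each, then release a
    clamped mean of the t values with Laplace noise of scale
    (hi - lo) / (t * epsilon)."""
    data = list(data)
    random.shuffle(data)
    chunks = [data[i::t] for i in range(t)]
    vals = [min(max(statistic(c), lo), hi) for c in chunks]
    mean = sum(vals) / t
    # Sample Laplace(scale) via inverse CDF.
    scale = (hi - lo) / (t * epsilon)
    u = random.random() - 0.5
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return mean + noise
```

With overlapping subsamples, one record would appear in many chunks, inflating that sensitivity, which is exactly why standard aggregators don't benefit from overlap.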