Thomas Steinke
@stein.ke
Computer science, math, machine learning, (differential) privacy
Researcher at Google DeepMind
Kiwi🇳🇿 in California🇺🇸
http://stein.ke/
This is crazy, but I like it.
November 6, 2025 at 4:31 AM
Was this near San Francisco? I saw something from Menlo Park.
I'm searching bsky to see if anyone else saw what I saw.
November 4, 2025 at 2:10 AM
I totally empathize with the frustration of AI not working for a lot of the use cases where it is being touted. But I've seen it work often enough that I just don't think it's tenable to say it has no clear use.
November 3, 2025 at 9:01 PM
In each of the aforementioned use cases (except maybe the last one), I'm ultimately able to validate the output. I think such use cases are fine. But I'm not yet willing to trust AI to do things autonomously.
November 3, 2025 at 8:53 PM
To be concrete, I've personally found AI useful for proofreading my writing, writing/debugging code, editing photos, suggesting proof approaches, and language translation. It's clearly not useless. But I also see it generate nonsense half the time, so I'm aware of its limitations.
November 3, 2025 at 8:48 PM
Sorry, I don't want to pick a fight here, but the lack of nuance in the discourse around AI is really disappointing.
Yes, it's overhyped. No, it's not useless.
November 3, 2025 at 8:41 PM
This is what \nocite{} is for.
October 20, 2025 at 4:19 PM
This is the "don't bring your kids along" policy.
October 15, 2025 at 11:18 PM
IMHO, the best analog to the AI bubble is the dotcom bubble. Yes, the internet proved to be economically transformative, but there was still a bubble. Companies made a lot of money in the end, but not necessarily the ones people expected -- e.g., see Cisco:
October 9, 2025 at 1:17 AM
The final piece of the puzzle is how do you choose the subsamples? Here's where having a relative who knows combinatorics was helpful. 😁 Basically, the subsets should form a covering design. And the minimal size of a covering design parameterizes the tradeoff.
October 4, 2025 at 4:50 PM
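A covering design C(v, k, t) is a family of k-element subsets ("blocks") of a v-element ground set such that every t-element subset lies inside at least one block. A tiny checker (not from the paper; the `is_covering_design` helper and the Fano-plane example are just illustrations):

```python
from itertools import combinations

def is_covering_design(v, k, t, blocks):
    """Check that every t-subset of {0,...,v-1} is contained
    in at least one block (each block a k-subset)."""
    sets = [set(b) for b in blocks]
    return all(any(set(T) <= B for B in sets)
               for T in combinations(range(v), t))

# The Fano plane covers all C(7,2) = 21 pairs with just 7 blocks,
# far fewer than the 21 blocks a naive cover might use.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
        (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_covering_design(7, 3, 2, fano))  # True
```

The point of the tradeoff: the smaller a covering design you can find, the fewer subsamples you need to evaluate the black box on.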
OK, so can we get the best of both worlds? Or at least trade off between the cost of privacy (in terms of accuracy/data) and the number of subsamples we need to evaluate on?
That's what we address in our new paper. The answer is yes, but the tradeoff is quite steep (and we have a lower bound).
Privately Estimating Black-Box Statistics
Günter F. Steinke, Thomas Steinke
http://arxiv.org/abs/2510.00322
October 4, 2025 at 4:28 PM
In this paper we showed that we can do this kind of versatile black-box estimation with only an *additive* cost of privacy, where sample-and-aggregate suffers a *multiplicative* cost. But this uses exponentially many subsamples - i.e., all subsets of size n-t instead of a partition into t parts.
Privately Evaluating Untrusted Black-Box Functions
Ephraim Linder, Sofya Raskhodnikova, Adam Smith, Thomas Steinke
http://arxiv.org/abs/2503.19268
October 4, 2025 at 4:19 PM
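To see how steep "exponentially many" is, compare the two evaluation counts for illustrative parameters (the numbers n = 100, t = 5 are just an example, not from the paper):

```python
from math import comb

n, t = 100, 5
# Sample-and-aggregate: partition into t parts -> t evaluations.
partition_evals = t
# Evaluating on every subset of size n - t instead:
all_subset_evals = comb(n, n - t)
print(partition_evals, all_subset_evals)  # 5 vs 75287520
```

So the partition needs 5 black-box calls while the all-subsets approach needs about 75 million, which is what motivates interpolating between the two.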
The missing ingredient is a better aggregation method.
Enter the shifted inverse mechanism cse.hkust.edu.hk/~yike/Shifte...
Beyond Local Sensitivity via Down Sensitivity
In our previous post, we discussed local sensitivity and how we can get accuracy guarantees that scale with local sensitivity, which can be much better than the global sensitivity guarantees attained ...
differentialprivacy.org
October 4, 2025 at 4:05 PM
So the obvious question is: *can sample-and-aggregate be made more data-efficient?* 🤔
E.g., instead of partitioning the dataset, can we use overlapping subsamples?
Unfortunately, using standard aggregation methods, this doesn't work (because overlapping subsamples means higher sensitivity).
October 4, 2025 at 4:05 PM
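For context, a minimal sketch of plain sample-and-aggregate with disjoint chunks (my own toy version, not code from either paper; the clamped-mean aggregator and parameter names are illustrative). Because the chunks are disjoint, changing one record changes one chunk's value, so the mean has sensitivity (hi - lo)/t:

```python
import math
import random

def sample_and_aggregate(data, statistic, t, epsilon, lo, hi):
    """Toy sample-and-aggregate: partition the data into t disjoint
    chunks, evaluate the black-box statistic on each, then release a
    clamped mean of the t values with Laplace noise of scale
    (hi - lo) / (t * epsilon)."""
    data = list(data)
    random.shuffle(data)
    chunks = [data[i::t] for i in range(t)]
    vals = [min(max(statistic(c), lo), hi) for c in chunks]
    mean = sum(vals) / t
    # Sample Laplace(scale) via inverse CDF.
    scale = (hi - lo) / (t * epsilon)
    u = random.random() - 0.5
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return mean + noise
```

With overlapping subsamples, one record would appear in many chunks, inflating that sensitivity, which is exactly why standard aggregators don't benefit from overlap.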