Bruno Ferman
brunoferman.bsky.social
Professor at Sao Paulo School of Economics - FGV

Econometrics/Applied Micro/affiliate @JPAL

MIT Econ PhD

https://sites.google.com/site/brunoferman/home
Thanks, Julian!!!
May 1, 2025 at 7:29 PM
Thanks, Arin!!!
May 1, 2025 at 7:28 PM
That's a quick overview!

For more details, check out the full survey 📚👇

Link: arxiv.org/abs/2504.19841

Hope you find it helpful!
Feedback welcome. 🧠✍️
Inference with few treated units
In many causal inference applications, only one or a few units (or clusters of units) are treated. An important challenge in such settings is that standard inference methods that rely on asymptotic th...
April 29, 2025 at 2:18 PM
15/
Applied folks: we hope this serves as a warning that standard inference may fail with few treated units, and as guidance on choosing alternatives.
Econometricians: we wanted to provide a state-of-the-art overview, and a call for new methods based on alternative assumptions!
April 29, 2025 at 2:18 PM
14/
And show some equivalences:

e.g., the wild bootstrap (with the null imposed) is asymptotically equivalent to sign-changes when N₁ is fixed and N₀ → ∞

⇒ theoretical justification for the wild bootstrap in these settings
April 29, 2025 at 2:18 PM
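A minimal sketch of that equivalence in the simplest setting, assuming a comparison of means y = a + b·D + e, Rademacher weights, and |difference in means| as the test statistic (our choices for illustration, not necessarily the survey's exact setup). With the null imposed, each bootstrap draw is a random sign flip of the treated residuals minus a control term whose size shrinks like 1/√N₀, so for large N₀ the draw behaves like a sign-change of the treated residuals. Function names and the simulated data are hypothetical.

```python
import itertools
import random
import statistics


def sign_change_pvalue(gaps):
    """Exact sign-change p-value for H0: no effect, statistic |mean(gaps)|."""
    t_obs = abs(statistics.fmean(gaps))
    draws = [
        abs(statistics.fmean([s * g for s, g in zip(signs, gaps)]))
        for signs in itertools.product((1, -1), repeat=len(gaps))
    ]
    return sum(t >= t_obs - 1e-12 for t in draws) / len(draws)


def wild_bootstrap_pvalue(y_treated, y_control, reps=999, seed=0):
    """Wild bootstrap with H0 (b = 0) imposed for y = a + b*D + e, Rademacher weights."""
    rng = random.Random(seed)
    b_obs = statistics.fmean(y_treated) - statistics.fmean(y_control)
    # Restricted fit under H0 is the pooled mean; bootstrap y* = y_bar + w * (y - y_bar).
    y_bar = statistics.fmean(y_treated + y_control)
    r_t = [y - y_bar for y in y_treated]
    r_c = [y - y_bar for y in y_control]
    exceed = 0
    for _ in range(reps):
        # y_bar cancels in the difference of the bootstrap group means.
        b_t = statistics.fmean([rng.choice((1, -1)) * r for r in r_t])
        b_c = statistics.fmean([rng.choice((1, -1)) * r for r in r_c])
        if abs(b_t - b_c) >= abs(b_obs):
            exceed += 1
    return exceed / reps


# Hypothetical data: N1 = 10 treated, N0 = 2000 controls, no true effect.
data_rng = random.Random(1)
y_treated = [data_rng.gauss(0, 1) for _ in range(10)]
y_control = [data_rng.gauss(0, 1) for _ in range(2000)]
control_mean = statistics.fmean(y_control)
p_sign = sign_change_pvalue([y - control_mean for y in y_treated])
p_wild = wild_bootstrap_pvalue(y_treated, y_control)
print(f"sign-changes p = {p_sign:.3f}, wild bootstrap p = {p_wild:.3f}")
```

With N₀ this large the two p-values should be close (up to Monte Carlo error in the bootstrap); with few controls the control term b_c is no longer negligible and they can drift apart.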
13/
We also provide finite-N₀ improvements for some methods, such as Conley-Taber and sign-changes.

Free lunch: gains with finite N₀ & asymptotically equivalent when N₀ → ∞ (with N₁ fixed)
April 29, 2025 at 2:18 PM
12/
What if we have >1 treated unit (but still few)?

More info on the treated ⇒ more alternatives: sign-changes, Behrens-Fisher solutions, etc.

Relax some assumptions relative to previous methods (but need new ones!)
⚡Power may be an issue when N₁ is very small

Many relevant trade-offs!
April 29, 2025 at 2:18 PM
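A minimal sketch of a sign-changes test, assuming one effect estimate per treated unit (e.g., treated outcome minus control mean) whose errors are independent and symmetric around zero under the null; this is a simplification of the conditions discussed in the survey, and the function name is hypothetical. It also makes the power warning concrete: the observed sign vector and its negation always tie, so the p-value can never fall below 2/2^N₁ (1.0 with N₁ = 1, 0.25 with N₁ = 3, 0.0625 with N₁ = 5).

```python
import itertools
import statistics


def sign_change_pvalue(effects):
    """Exact sign-change p-value for H0: zero effect in every treated unit.

    Under H0 the unit-level errors are assumed independent and symmetric around 0,
    so flipping signs leaves their joint distribution unchanged.
    """
    t_obs = abs(statistics.fmean(effects))
    draws = [
        abs(statistics.fmean([s * e for s, e in zip(signs, effects)]))
        for signs in itertools.product((1, -1), repeat=len(effects))
    ]
    # Share of the 2**N1 sign vectors at least as extreme as the observed one
    # (small tolerance for floating-point ties).
    return sum(t >= t_obs - 1e-12 for t in draws) / len(draws)


# Smallest attainable p-value for N1 = 1..5 (all effects equal and positive).
print([sign_change_pvalue([1.0] * n1) for n1 in range(1, 6)])
# [1.0, 0.5, 0.25, 0.125, 0.0625]
```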
11/
In these extreme cases: need to impose strong restrictions on treatment effect heterogeneity!

If interested, see discussion in Section 4.1.3 on inference on sharp nulls, inference on realized treatment effects, prediction intervals, and sensitivity analysis.
April 29, 2025 at 2:18 PM
10/
📌Extrapolate from time series

Learn about the treated unit's error using its pre-treatment residuals

⚡Flip assumptions
Need time-series restrictions (stationarity) but relax assumptions on the cross-section

Challenges arise when counterfactuals are estimated via high-dimensional approaches
April 29, 2025 at 2:18 PM
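A minimal sketch for one treated unit with T₀ pre-treatment periods and one post-treatment period, assuming a simple DiD counterfactual (treated minus control mean, demeaned over the pre-period) and stationary treated errors, so the pre-treatment residuals stand in for the post-period error. It sidesteps the high-dimensional case flagged above by using a single level shift; the function name and data are hypothetical.

```python
import statistics


def pre_period_pvalue(y_treated, y_control_mean):
    """p-value for H0: no effect in the single post-treatment period (the last entry).

    Assumes the treated unit's errors are stationary, so its pre-treatment residuals
    are draws from the same distribution as its post-treatment error.
    """
    gaps = [y1 - y0 for y1, y0 in zip(y_treated, y_control_mean)]
    pre, post = gaps[:-1], gaps[-1]
    level = statistics.fmean(pre)             # DiD: pre-period mean of the treated-control gap
    alpha_hat = post - level                  # estimated effect in the post period
    pre_resid = [g - level for g in pre]      # stand-ins for the treated unit's error
    return sum(abs(u) >= abs(alpha_hat) for u in pre_resid) / len(pre_resid)


# Hypothetical series: 5 pre-periods and 1 post-period, control mean normalized to 0.
print(pre_period_pvalue([0.0, 1.0, -1.0, 2.0, -2.0, 1.5], [0.0] * 6))  # 0.4
```

This is the mirror image of the control-unit approach: nothing is assumed about how treated and control errors compare, but the p-value only moves in steps of 1/T₀, so a short pre-period leaves little information.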
9/
Ferman and Pinto (2019): allow for heteroskedasticity that can be estimated based on observables.

Example: when units have different variances due to variation in population sizes.

See this old Twitter thread: x.com/bruno_ferman...
April 29, 2025 at 2:18 PM
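A minimal sketch of the idea, assuming (for illustration only) that each unit's outcome is an average over M_j individuals, so Var(e_j) = σ²/M_j with this form known. Ferman and Pinto (2019) estimate the heteroskedasticity from observables; here we simply rescale the control residuals to the treated unit's scale before comparing them with the estimate. Function and argument names are hypothetical.

```python
import math
import statistics


def rescaled_control_pvalue(y_treated, m_treated, y_controls, m_controls):
    """p-value for H0: no effect, 1 treated unit, control residuals rescaled by population.

    Hypothetical variance model: Var(e_j) = sigma^2 / M_j, where M_j is unit j's population,
    so sqrt(M_j / M_treated) * e_j has the same variance as the treated error.
    """
    control_mean = statistics.fmean(y_controls)
    alpha_hat = y_treated - control_mean
    rescaled = [
        (y - control_mean) * math.sqrt(m / m_treated)
        for y, m in zip(y_controls, m_controls)
    ]
    # Share of rescaled control residuals at least as extreme as the estimate.
    return sum(abs(r) >= abs(alpha_hat) for r in rescaled) / len(rescaled)


# Small treated unit (M = 100) among controls with 100 or 400 people.
y0, m0 = [1.0, -1.0, 2.0, -2.0], [100, 100, 400, 400]
print(rescaled_control_pvalue(3.0, 100, y0, m0))          # 0.5
print(rescaled_control_pvalue(3.0, 100, y0, [100] * 4))   # 0.0 (no rescaling)
```

Ignoring the heteroskedasticity, the large control units look too quiet relative to the small treated unit, which makes the treated estimate seem more extreme than it is.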
8/
📌Extrapolate (learn) from control units

Learn the distribution of the treated error using controls' residuals (à la Conley and Taber)
⚡Key assumption: Errors of treated and control units must have the same distribution (homoskedasticity)
No restriction on time series!
April 29, 2025 at 2:18 PM
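A stripped-down sketch of the idea for the limit case of one treated unit and one period, assuming a plain comparison with the control mean and no covariates (Conley and Taber's actual procedure works with DiD regressions). Since α̂ − α ≈ e₁ when N₀ is large, and e₁ is assumed to share the control errors' distribution, the control residuals' empirical quantiles can be inverted into an interval for α. The function name and the 90% level are illustrative choices.

```python
import statistics


def control_residual_ci(y_treated, y_controls):
    """90% interval for the effect with 1 treated unit and 1 period (Conley-Taber-style idea).

    Assumes treated and control errors share one distribution (homoskedasticity), so the
    control residuals proxy the treated error e_1, and alpha_hat - alpha is roughly e_1.
    """
    control_mean = statistics.fmean(y_controls)
    alpha_hat = y_treated - control_mean
    resid = [y - control_mean for y in y_controls]
    # 5th and 95th percentiles of the control residuals.
    cuts = statistics.quantiles(resid, n=20, method="inclusive")
    q_lo, q_hi = cuts[0], cuts[-1]
    # P(q_lo <= alpha_hat - alpha <= q_hi) is about 0.9, so solve for alpha.
    return alpha_hat - q_hi, alpha_hat - q_lo


print(control_residual_ci(5.0, list(range(-10, 11))))  # (-4.0, 14.0)
```

Note that no time-series information is used: the width of the interval comes entirely from the cross-section of controls, which is why the homoskedasticity assumption carries all the weight.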
7/
The survey is organized by data availability.

📌Limit case:
One treated unit & one treated period.

Enough info from the treated to construct an estimator — but no info from the treated to learn its distribution!

⚡Solution:
We need to *extrapolate* ⇒ stronger assumptions!
April 29, 2025 at 2:18 PM
6/
We focus on model-based approaches, which are more common in econometrics

📚 Nice citation from Haavelmo to justify this framework + Marvel movies to help make the point 🕷️ :)

We also discuss design-based approaches at the end
April 29, 2025 at 2:18 PM
5/
Important:
📌Problems arise when the *number* of treated units is small

✅Standard methods are usually fine with 40 or 50 treated units, even when the *share* of treated is small.

Feel free to cite our survey to justify sticking to standard methods when that's your case!😉
April 29, 2025 at 2:18 PM
4/
Extreme case: you have only 1 treated unit and N₀ controls.

The true variance of the difference in means is σ₁² + σ₀²/N₀.

But with only one treated, you just don’t have enough info to estimate σ₁² using only the treated!

Robust SEs simply set σ̂₁² = 0! 😵‍💫

σ₁²: var of treated
σ₀²: var of control
April 29, 2025 at 2:18 PM
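A quick Monte Carlo sketch of this point, under an assumed DGP where every outcome is i.i.d. N(0, 1) (so the null of no effect holds and σ₁ = σ₀ = 1), with N₀ = 100 controls. It compares a nominal 5% test based on the HC0 robust variance, where σ̂₁² is exactly 0, with one based on the true variance σ₁² + σ₀²/N₀. The function name and parameters are illustrative.

```python
import math
import random
import statistics


def simulate_rejection(n_controls=100, reps=2000, seed=0):
    """Rejection rates of nominal 5% tests of H0: no effect, with a single treated unit.

    Assumed DGP: all outcomes i.i.d. N(0, 1), so H0 holds and sigma_1 = sigma_0 = 1.
    """
    rng = random.Random(seed)
    reject_robust = reject_true = 0
    for _ in range(reps):
        y1 = rng.gauss(0, 1)
        y0 = [rng.gauss(0, 1) for _ in range(n_controls)]
        m0 = statistics.fmean(y0)
        diff = y1 - m0
        # HC0 robust variance: the treated residual y1 - y1 is 0, so sigma1_hat^2 = 0
        # and only the control term sigma0_hat^2 / N0 is left.
        var_robust = sum((y - m0) ** 2 for y in y0) / n_controls ** 2
        var_true = 1.0 + 1.0 / n_controls          # sigma_1^2 + sigma_0^2 / N0
        reject_robust += abs(diff) > 1.96 * math.sqrt(var_robust)
        reject_true += abs(diff) > 1.96 * math.sqrt(var_true)
    return reject_robust / reps, reject_true / reps


robust_rate, true_rate = simulate_rejection()
print(f"robust SE rejects {robust_rate:.1%}, true variance rejects {true_rate:.1%}")
```

In this design the robust-SE test should reject far more often than 5% (above 80% by a back-of-the-envelope calculation), while the test using the true variance stays near 5%.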
3/
Example to illustrate the problem: comparison of means

Robust SEs estimate the variance of treated (controls) using only treated (controls) data

✅ Great with many treated/many controls!
↪️ Allow for ≠ distributions of treated/control errors

❗ Perform poorly with few treated units...
April 29, 2025 at 2:18 PM
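A minimal sketch of the robust variance for a comparison of means: each group's term uses only that group's own residuals. This HC0 form (no small-sample correction) is what a heteroskedasticity-robust regression of the outcome on a treatment dummy gives for the dummy's coefficient; the function name is illustrative.

```python
import statistics


def robust_var_diff_means(y_treated, y_control):
    """HC0 robust variance of mean(y_treated) - mean(y_control).

    Each group's term uses only that group's own residuals, so the treated and
    control errors may have different distributions.
    """
    def group_term(y):
        m = statistics.fmean(y)
        # sigma_hat_g^2 / N_g, with sigma_hat_g^2 = sum of squared residuals / N_g
        return sum((v - m) ** 2 for v in y) / len(y) ** 2

    return group_term(y_treated) + group_term(y_control)


print(robust_var_diff_means([1.0, 3.0], [0.0, 2.0, 4.0]))  # 0.5 + 8/9
print(robust_var_diff_means([5.0], [0.0, 2.0, 4.0]))       # treated term is exactly 0
```

With a single treated observation the treated term is exactly 0, which is the σ̂₁² = 0 issue from post 4/.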
2/
🗣️Main message

Few treated ⇒ need to rely on stronger assumptions

Many alternatives: varying in data requirements, assumptions, etc

The choice is highly context-specific. We’ll help you navigate it!

We cover cross-section and panel data (regression, matching, DiD, SC, etc.)
April 29, 2025 at 2:18 PM
1/
Link to paper: arxiv.org/abs/2504.19841

🚨Problem
Few treated ⇒ standard methods (e.g., robust/clustered SEs) can go wrong. Even if total N is large!

📌Example
In DiD with 1 treated cluster, clustered SEs underestimate the true variance by a factor of N. Expect over-rejection rates >60%!
April 29, 2025 at 2:18 PM