Bruno Ferman
@brunoferman.bsky.social
Professor at Sao Paulo School of Economics - FGV
Econometrics/Applied Micro/affiliate @JPAL
MIT Econ PhD
https://sites.google.com/site/brunoferman/home
That's a quick overview!
For more details, check out the full survey 📚👇
Link: arxiv.org/abs/2504.19841
Hope you find it helpful!
Feedback welcome. 🧠✍️
Inference with few treated units
In many causal inference applications, only one or a few units (or clusters of units) are treated. An important challenge in such settings is that standard inference methods that rely on asymptotic th...
arxiv.org
April 29, 2025 at 2:18 PM
15/
Applied folks: we hope this serves as a warning that standard inference may fail with few treated units + guidance on choosing alternatives.
Econometricians: we wanted to provide a state-of-the-art overview — and a call for new methods based on alternative assumptions!
April 29, 2025 at 2:18 PM
14/
And show some equivalences:
e.g., wild-bootstrap (with null imposed) asymptotically equivalent to sign-changes when N₁ is fixed and N₀ → ∞
⇒ theoretical justification for wild-bootstrap in these settings
April 29, 2025 at 2:18 PM
13/
We also provide finite-N₀ improvements for some methods, such as Conley-Taber and sign-changes.
Free lunch: gains with finite N₀ & asymptotically equivalent when N₀ → ∞ (with N₁ fixed)
April 29, 2025 at 2:18 PM
12/
What if we have >1 treated (but still few)?
More info on treated ⇒ alternatives: sign-changes, Behrens-Fisher solutions, etc
Relax some assumptions relative to previous methods (but need new ones!)
⚡Power may be an issue when N₁ is very small
Many relevant trade-offs!
April 29, 2025 at 2:18 PM
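A minimal sketch of a sign-change test for the "few but >1 treated" case in 12/ (not code from the survey). Assumptions made here for illustration: each treated unit's gap from the control mean is independent across treated units and symmetric around zero under the null, and the helper name `sign_change_pvalue` is hypothetical.

```python
import itertools

import numpy as np


def sign_change_pvalue(y_treated, y_controls):
    """Two-sided sign-change p-value for H0: zero average effect.

    Illustrative sketch: assumes the treated-unit gaps are independent
    and symmetric around zero under the null.
    """
    # Gap of each treated unit from the control mean.
    gaps = np.asarray(y_treated, dtype=float) - np.mean(y_controls)
    observed = abs(gaps.mean())

    # Enumerate all 2^N1 sign patterns and recompute the statistic.
    flipped = [
        abs(np.mean(np.array(signs) * gaps))
        for signs in itertools.product([1.0, -1.0], repeat=len(gaps))
    ]

    # Share of sign patterns with a statistic at least as large as observed
    # (small tolerance guards against floating-point ties).
    return float(np.mean(np.array(flipped) >= observed - 1e-12))
```

This also shows the power issue: the observed pattern and its mirror image always count, so the p-value from this sketch cannot go below 2/2^N₁. With N₁ = 3 that floor is 0.25, and it only drops under 0.05 once N₁ ≥ 6.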
11/
In these extreme cases: need to impose strong restrictions on treatment effect heterogeneity!
If interested, see discussion in Section 4.1.3 on inference on sharp nulls, inference on realized treatment effects, prediction intervals, and sensitivity analysis.
April 29, 2025 at 2:18 PM
10/
📌Extrapolate from time series
Learn about treated error using pre-treatment residuals
⚡Flip assumptions
Need time series restrictions (stationarity) but relax assumptions on cross-section
Challenges arise when counterfactuals are estimated via high-dimensional approaches
April 29, 2025 at 2:18 PM
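A rough sketch of the time-series idea in 10/ (a simplification, not a method from the survey). It assumes one treated unit, one treated period (the last one), a counterfactual given by the simple control average, and pre-treatment gaps that are stationary; the name `pre_period_pvalue` is hypothetical.

```python
import numpy as np


def pre_period_pvalue(y_treated, y_controls):
    """Compare the post-treatment gap with pre-treatment residuals.

    y_treated:  shape (T,), last period is the treated one.
    y_controls: shape (N0, T), never-treated units.
    Illustrative sketch: assumes the pre-treatment gaps are stationary.
    """
    y_treated = np.asarray(y_treated, dtype=float)
    y_controls = np.asarray(y_controls, dtype=float)

    # Treated-minus-control-average gap in every period.
    gaps = y_treated - y_controls.mean(axis=0)
    pre_gaps, post_gap = gaps[:-1], gaps[-1]

    # DiD-style effect: post gap relative to the average pre gap.
    effect = post_gap - pre_gaps.mean()

    # Pre-treatment residuals stand in for the unobserved treated error.
    residuals = pre_gaps - pre_gaps.mean()
    p_value = float(np.mean(np.abs(residuals) >= abs(effect)))
    return float(effect), p_value
```

No cross-sectional assumption on the treated error is used here, but the p-value resolution is limited by the number of pre-treatment periods.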
9/
Ferman and Pinto (2019): allow for heteroskedasticity that can be estimated based on observables.
Example: when units have different variances due to variation in population sizes.
See this old Twitter thread: x.com/bruno_ferman...
April 29, 2025 at 2:18 PM
8/
📌Extrapolate (learn) from control units
Learn the distribution of the treated error using controls' residuals (à la Conley and Taber)
⚡Key assumption: Errors of treated and control units must have the same distribution (homoskedasticity)
No restriction on time series!
April 29, 2025 at 2:18 PM
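A stripped-down cross-section sketch in the spirit of 8/ (not the actual Conley and Taber procedure). It assumes one treated unit, a comparison of means, and that treated and control errors share the same distribution; the name `control_residual_pvalue` is hypothetical.

```python
import numpy as np


def control_residual_pvalue(y_treated, y_controls):
    """Use control residuals as the reference distribution for the treated error.

    Illustrative sketch: valid only if treated and control errors have
    the same distribution (the homoskedasticity assumption).
    """
    y_controls = np.asarray(y_controls, dtype=float)

    # Estimated effect: treated outcome minus the control mean.
    effect = float(y_treated) - y_controls.mean()

    # Control residuals proxy for what the treated error could have been.
    residuals = y_controls - y_controls.mean()

    # Share of control residuals at least as extreme as the estimated effect.
    p_value = float(np.mean(np.abs(residuals) >= abs(effect)))
    return effect, p_value
```

For example, `control_residual_pvalue(1.5, [-2, -1, 0, 1, 2])` compares an effect of 1.5 with the five control residuals, two of which are at least as large in absolute value.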
7/
Survey is organized based on data availability.
📌Limit case:
One treated unit & one treated period.
Enough info from the treated to construct an estimator — but no info from the treated to learn its distribution!
⚡Solution:
We need to *extrapolate* ⇒ stronger assumptions!
April 29, 2025 at 2:18 PM
6/
We focus on model-based approaches, more common in metrics
📚 Nice citation from Haavelmo to justify this framework + Marvel movies to help make the point 🕷️ :)
We also discuss design-based approaches at the end
April 29, 2025 at 2:18 PM
5/
Important:
📌Problems arise when the *number* of treated units is small
✅Standard methods are usually fine with 40 or 50 treated units, even when the *share* of treated is small.
Feel free to cite our survey to justify sticking to standard methods when that's your case!😉
April 29, 2025 at 2:18 PM
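One way to check the number-vs-share point in 5/ yourself is a small simulation (a sketch, not from the survey). Assumptions made here: a comparison of means, standard normal errors in both groups, a true effect of zero, an HC0-type robust variance, and a 1.96 critical value. The name `rejection_rate` and the default of 1000 controls are illustrative.

```python
import numpy as np


def rejection_rate(n_treated, n_controls=1000, n_sims=4000, seed=0):
    """Share of simulations rejecting a true null at the nominal 5% level."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(size=(n_sims, n_treated))   # treated outcomes, no effect
    y0 = rng.normal(size=(n_sims, n_controls))  # control outcomes

    diff = y1.mean(axis=1) - y0.mean(axis=1)

    # HC0-type variance: each group's variance comes from its own residuals.
    # With one treated unit the treated term is exactly zero.
    var_hat = y1.var(axis=1) / n_treated + y0.var(axis=1) / n_controls

    t_stat = diff / np.sqrt(var_hat)
    return float(np.mean(np.abs(t_stat) > 1.96))


if __name__ == "__main__":
    # The treated share stays below 5% throughout; only the number changes.
    for n1 in (1, 2, 5, 50):
        print(n1, rejection_rate(n1))
```

Under these assumptions the rejection rate should be far above 5% for one or two treated units and close to nominal at 50, even though the treated share is small in every case.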
4/
Extreme case: you have only 1 treated and N₀ controls.
The true variance is σ₁² + σ₀²/N₀.
But with only one treated, you just don’t have enough info to estimate σ₁² using only the treated!
Robust SEs simply set σ̂₁² = 0! 😵💫
σ₁²: var of treated
σ₀²: var of control
April 29, 2025 at 2:18 PM
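To see the variance formula in 4/ at work, here is a minimal Monte Carlo sketch (not from the survey). It assumes normal errors and an HC0-type robust variance for the difference in means; the name `variance_check` and all default values are illustrative.

```python
import numpy as np


def variance_check(n_controls=50, sigma1=1.0, sigma0=1.0, n_sims=20000, seed=0):
    """Compare the true variance of the estimator with the robust estimate."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(0.0, sigma1, size=n_sims)                # the single treated unit
    y0 = rng.normal(0.0, sigma0, size=(n_sims, n_controls))  # N0 controls

    # Estimator: treated outcome minus control mean.
    diff = y1 - y0.mean(axis=1)

    # HC0-type variance: the lone treated residual is exactly zero,
    # so only the control term survives.
    robust_var = y0.var(axis=1) / n_controls

    return {
        "true_var": sigma1**2 + sigma0**2 / n_controls,
        "mc_var": float(diff.var()),
        "mean_robust_var": float(robust_var.mean()),
    }


if __name__ == "__main__":
    print(variance_check())
```

The Monte Carlo variance of the estimator should track σ₁² + σ₀²/N₀, while the average robust estimate only picks up the σ₀²/N₀ part.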
3/
Example to illustrate problem: comparison of means
Robust SEs estimate the variance of treated (controls) using only treated (controls) data
✅ Great with many treated/many controls!
↪️ Allow for ≠ distributions of treated/control errors
❗ Go bad with few treated units...
April 29, 2025 at 2:18 PM
2/
🗣️Main message
Few treated ⇒ need to rely on stronger assumptions
Many alternatives: varying in data requirements, assumptions, etc
Choice is highly context-specific. We’ll help you navigate that!
Cover cross-section and panel data (Regression, Matching, DiD, SC, etc)
April 29, 2025 at 2:18 PM
1/
Link to paper: arxiv.org/abs/2504.19841
🚨Problem
Few treated ⇒ standard methods (e.g., robust/clustered SEs) can go wrong. Even if total N is large!
📌Example
DiD with 1 treated cluster, clustered SEs underestimate true var by a factor of N. Expect over-rejections >60%!
April 29, 2025 at 2:18 PM