Jack Fitzgerald
@jackfitzgerald.bsky.social
Economics PhD candidate at Vrije Universiteit Amsterdam and the Tinbergen Institute. Working on applied econometrics, replication, and economics of science. https://jack-fitzgerald.github.io. Likes/reposts aren’t endorsements, views are my own.
Had a great time presenting my job market paper at the Lindau Nobel Meeting in Economic Sciences! 🔗: osf.io/d7sqr_v1/

#LINOecon #EconSky
September 2, 2025 at 7:05 PM
What academic journal should I start? Wrong answers only
March 13, 2025 at 4:45 PM
I'll be waking up early (7 AM CET) on Tuesday, March 12 to present my job market paper at 5 PM Sydney time! If you're awake too, stop by to hear me talk about equivalence testing, replication-based methods research, and the robustness of null results in economics!
March 7, 2025 at 10:53 AM
We also offer the tst() function in the eqtesting R package, the tsti command in Stata, and Jamovi code. You can visit the paper for download instructions for all three, plus guidelines for implementation. We hope you find it useful! 8/9

osf.io/preprints/ps...
December 20, 2024 at 3:58 PM
To make things easy, we offer the ShinyTST app, a point-and-click Shiny app that tells you which test/confidence interval is relevant, provides p-values, and visualizes test results given an estimate, standard error, and smallest effect size of interest (SESOI). 7/9

jack-fitzgerald.shinyapps.io/shinyTST/
December 20, 2024 at 3:57 PM
Practical significance conclusions about an estimate can be easily inferred from double-banded confidence intervals that combine the estimate’s (1 - α) CI (e.g., its 95% CI) with its (1 - 2α) CI (e.g., its 90% CI). 6/9
December 20, 2024 at 3:54 PM
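A minimal #Rstats sketch of that construction, using a made-up estimate and standard error:

```r
# Double-banded CI: outer (1 - alpha) band plus inner (1 - 2*alpha) band
b     <- 0.12   # point estimate (hypothetical)
se    <- 0.05   # standard error (hypothetical)
alpha <- 0.05
b + c(-1, 1) * qnorm(1 - alpha/2) * se   # outer band: the usual 95% CI
b + c(-1, 1) * qnorm(1 - alpha) * se     # inner band: the 90% CI
```

If the inner 90% band falls entirely inside (-Δ, Δ), the TOST equivalence test rejects at the 5% level.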
The three-sided testing (TST) framework combines two-sided minimum effects tests for inferiority/superiority with the two one-sided tests (TOST) equivalence testing procedure. TST can provide stat. sig. evidence that estimates are practically significant, or practically = 0. 4/9
December 20, 2024 at 3:53 PM
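For intuition, here's what those tests look like under normality in #Rstats: a sketch of the logic with toy numbers, not the eqtesting package's implementation.

```r
b <- 0.12; se <- 0.05; delta <- 0.20   # toy estimate, SE, and SESOI
# TOST equivalence test (H0: |effect| >= delta): larger of two one-sided p-values
p_tost <- max(pnorm((b - delta)/se), 1 - pnorm((b + delta)/se))
# One-sided minimum-effects tests: small p_sup means the effect is
# stat. sig. above +delta; small p_inf means it is stat. sig. below -delta
p_sup <- pnorm((delta - b)/se)
p_inf <- pnorm((b + delta)/se)
```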
Estimates can also be stat. sig. bounded outside of Δ (e.g., blue estimate). What should we conclude about estimates like the blue/orange ones? Standard equivalence testing frameworks don't give us clear answers. We introduce researchers to a framework that does. 3/9
December 20, 2024 at 3:52 PM
Equivalence testing lets us test whether estimates are stat. sig. bounded beneath practically negligible effect size Δ (e.g., pink estimate). But estimates can be both stat. sig. diff. from zero and stat. sig. bounded beneath Δ. 2/9
December 20, 2024 at 3:51 PM
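To see how both can hold at once, try toy numbers (my example, not one from the paper): a precisely estimated but tiny effect.

```r
b <- 0.05; se <- 0.02; delta <- 0.20   # toy estimate, SE, and SESOI
2 * pnorm(-abs(b)/se)                                   # vs. zero: p ~ 0.012
max(pnorm((b - delta)/se), 1 - pnorm((b + delta)/se))   # TOST vs. delta: p ~ 3e-14
```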
New paper for holiday reading! @isager.bsky.social and I provide an introduction to three-sided testing, a framework for testing estimates' practical significance. We offer a tutorial, Shiny app, + commands/code in #Rstats, #Jamovi, + #Stata. 1/9

osf.io/preprints/psyarxiv/8y925
#EconSky #PsychSky
December 20, 2024 at 3:50 PM
In more recent news, I thoroughly enjoyed presenting The Need for Equivalence Testing in Economics at the Netherlands Reproducibility Network Symposium and Platform for Young Meta-Scientists Symposium, with great discussion from Tsz Keung Wong!
December 9, 2024 at 8:58 PM
Two weeks ago, I had a wonderful time presenting The Need for Equivalence Testing in Economics at the Leibniz Open Science Day! (pic: @prashantgarg.bsky.social)
December 9, 2024 at 8:55 PM
Does this count
December 2, 2024 at 9:48 PM
I had an excellent time presenting this paper to the Behavioural Insights for Business and Policy Network at the University of New South Wales. A huge thanks to @impartialspectator.bsky.social for hosting!
November 19, 2024 at 5:37 PM
I’ve also spent an extensive amount of time yelling about how stat. insig. bias estimates are really bad evidence that there’s negligible/zero bias. For an in-depth discussion, see my job market paper. 17/19
🔗: jack-fitzgerald.github.io/files/The_Ne...
November 18, 2024 at 4:08 PM
Estimates of TE-irrelevant biases can badly mismeasure TE-relevant biases, even up to the point of complete sign-flips. Trying to learn about hypothetical bias in TEs from hypothetical bias experiments that only vary stakes conditions can yield very misleading conclusions. 15/19
November 18, 2024 at 4:05 PM
Here’s a simulated example where hypothetical stakes increase the outcome’s standard deviation, but decrease the TE’s standard error. Just because your outcome is more precisely measured doesn’t necessarily mean your TE will be! 13/19
November 18, 2024 at 4:04 PM
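A toy #Rstats version of that phenomenon, with my own made-up DGP (not the paper's simulation): hypothetical stakes amplify the TE but shrink within-arm noise, so the pooled outcome SD rises while the TE's standard error falls.

```r
set.seed(1)
n <- 1000
treat  <- rbinom(n, 1, 0.5)
y_real <- 1 * treat + rnorm(n, sd = 1.0)   # real stakes: TE = 1, noisy outcome
y_hyp  <- 4 * treat + rnorm(n, sd = 0.5)   # hypothetical stakes: TE = 4, quiet outcome
sd(y_real); sd(y_hyp)                                    # outcome SD rises (~1.1 vs ~2.1)
summary(lm(y_real ~ treat))$coef["treat", "Std. Error"]  # TE SE under real stakes (~0.06)
summary(lm(y_hyp ~ treat))$coef["treat", "Std. Error"]   # TE SE falls (~0.03)
```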
This means that you can’t identify IHB in an experiment where you just randomize stakes conditions between groups and take differences in mean outcomes between those groups. If you try to infer IHBs from the CHBs estimated in these experiments, you can be badly misled. 11/19
November 18, 2024 at 4:02 PM
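A minimal #Rstats illustration of that trap, with a made-up DGP: stakes look harmless in the mean-difference (CHB) comparison, yet flip the TE's sign.

```r
set.seed(2)
n <- 4000
treat <- rbinom(n, 1, 0.5)   # intervention of interest
hyp   <- rbinom(n, 1, 0.5)   # 1 = hypothetical stakes, 0 = real stakes
# TE is +1 under real stakes but -1 under hypothetical stakes
y <- 1 * treat + 1 * hyp - 2 * treat * hyp + rnorm(n)
mean(y[hyp == 1]) - mean(y[hyp == 0])   # CHB-style mean difference: ~0
coef(lm(y ~ treat * hyp))               # 'treat:hyp' recovers the IHB of ~-2
```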
Intuitively, that’s for two reasons. 1) You can’t identify an interaction effect if all you know is the avg marginal effect of one of the variables in the interaction. 2) You shouldn’t expect hypothetical stakes to impact all interventions’ TEs on an outcome in the exact same way. 10/19
November 18, 2024 at 4:02 PM
I term the hypothetical bias relevant for TEs ‘interactive hypothetical bias (IHB)’, because it reflects the interaction effect between hypothetical stakes and the intervention of interest. CHB doesn’t identify this bias. 9/19
November 18, 2024 at 4:01 PM
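In estimand terms (my own notation, which may differ from the paper's): with intervention indicator T and hypothetical-stakes indicator H,

$$\mathrm{IHB} = \big(E[Y \mid T{=}1, H{=}1] - E[Y \mid T{=}0, H{=}1]\big) - \big(E[Y \mid T{=}1, H{=}0] - E[Y \mid T{=}0, H{=}0]\big),$$

i.e., the hypothetical-stakes TE minus the real-stakes TE.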
In elicitation experiments, the only hypothetical bias we care about is the average marginal effect of hypothetical stakes on the outcome. I call this ‘classical hypothetical bias (CHB)’ because it’s the bias identified in most prior hypothetical bias studies. 7/19
November 18, 2024 at 4:00 PM
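Again in my own notation (an assumption on my part), with randomized stakes indicator H:

$$\mathrm{CHB} = E[Y \mid H{=}1] - E[Y \mid H{=}0],$$

which is what a stakes-only mean comparison estimates, and which can be zero even when the IHB is large.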
‘Intervention experiments’ do vary an intervention whose TE we care about. E.g., the same Becker-DeGroot-Marschak experiment w/ one product feature randomized between halves of the sample can give us estimates of that product feature’s impact on willingness to pay. 6/19
November 18, 2024 at 4:00 PM
‘Elicitation experiments’ vary no intervention: they just use experimental procedures to elicit (descriptive stats on) outcomes. E.g., Becker-DeGroot-Marschak experiments can elicit (average) willingness to pay for a product, but vary no intervention whose treatment effect (TE) we care about. 5/19
November 18, 2024 at 3:59 PM
There’s also recently been a wave of new studies showing that certain outcomes don’t stat. sig. differ between real-stakes and hypothetical-stakes experiments. These results are affecting thinking at the highest levels of experimental economics. 3/19
November 18, 2024 at 3:58 PM
Do real stakes/incentives matter in experiments? Recent studies say they don’t. My new paper shows that these studies’ results — and those of most hypothetical bias experiments — are uninformative when we care about treatment effects. 1/19
#EconSky #PsychSky #PoliSky
🔗: papers.tinbergen.nl/24070.pdf
November 18, 2024 at 3:56 PM