Jon Rothbaum
@jlrothbaum.bsky.social
Economist, U.S. Census Bureau,
Returned Peace Corps Volunteer, Ecuador
(All opinions are mine).
https://jrothbaum.github.io/
ideas.repec.org/c/boc/bocode...
(github.com/jrothbaum/st...) should be faster and easier than converting to dta
PQ: Stata module to read, write, and manage Parquet files in
pq provides commands for working with Apache Parquet files in Stata. Parquet is a columnar storage file format designed to efficiently store and process large datasets. This package allows Stata users
ideas.repec.org
August 31, 2025 at 12:46 AM
Disclaimers: All opinions are my own. The results shown above were approved for release under Disclosure Review Board (DRB) approval number CBDRB‑FY25‑0280.
July 14, 2025 at 12:38 PM
Thanks so much to my coauthors Adam Bee, John Creamer, Josh Mitchell, Nikolas Mittag, Elizabeth Pelletier, Carl Sanders, Lawrence Schmidt, and Matt Unrath. See how NEWS affects estimates for different groups at jrothbaum.github.io/news.html and the official release at www.census.gov/data/experim...
National Experimental Well-Being Statistics Project (NEWS)
jrothbaum.github.io
July 14, 2025 at 12:38 PM
As noted above, the bias can vary a lot across groups and over time. Underreporting of UI benefits can cause bias in child poverty (their parents are likely to work and therefore collect UI in a downturn) but won’t impact elderly poverty much.
July 14, 2025 at 12:38 PM
Likewise, we see more missing income when income shifts from well-reported sources (like wage and salary earnings) to ones with greater underreporting (like unemployment insurance, or UI) in 2020 and 2021.
July 14, 2025 at 12:38 PM
We said pandemic nonresponse bias started affecting the data in 2020, but how do we know this? We can look at the results after each step. Our weighting adjustment only starts to change our estimates for surveys conducted in 2020, which means income estimates from 2019 forward.
July 14, 2025 at 12:38 PM
We do several things, including 1) weighting to adjust for nonresponse bias (income may be correlated with survey response), 2) imputation (not everyone answers income questions on surveys), and 3) combining survey and administrative records (adrec) data (what’s the right number when they disagree?)
July 14, 2025 at 12:38 PM
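To make the weighting step concrete, here is a minimal sketch of a textbook weighting-class nonresponse adjustment. It is not the NEWS method, which the thread does not spell out; the cells, weights, and incomes are hypothetical, and it assumes each sampled unit's cell is known even when it does not respond (for example, from linked records).

```python
from collections import defaultdict


def nonresponse_weights(sample):
    """Weighting-class adjustment (illustrative, not the NEWS procedure).

    Within each cell, respondents' base weights are scaled up so that
    respondents also represent the cell's nonrespondents.
    """
    total = defaultdict(float)      # sum of base weights, all sampled units
    responded = defaultdict(float)  # sum of base weights, respondents only
    for unit in sample:
        total[unit["cell"]] += unit["base_weight"]
        if unit["responded"]:
            responded[unit["cell"]] += unit["base_weight"]

    adjusted = []
    for unit in sample:
        if unit["responded"]:
            factor = total[unit["cell"]] / responded[unit["cell"]]
            adjusted.append({**unit, "weight": unit["base_weight"] * factor})
    return adjusted


def weighted_mean(units, weight_key):
    """Weighted mean of income using the given weight field."""
    num = sum(u[weight_key] * u["income"] for u in units)
    den = sum(u[weight_key] for u in units)
    return num / den


# Hypothetical sample: the low-income cell responds at 50%, the high-income cell at 100%.
sample = (
    [{"cell": "low", "base_weight": 1.0, "responded": i < 2, "income": 20_000} for i in range(4)]
    + [{"cell": "high", "base_weight": 1.0, "responded": True, "income": 100_000} for _ in range(4)]
)

respondents = [u for u in sample if u["responded"]]
unadjusted = weighted_mean(respondents, "base_weight")           # biased upward
adjusted = weighted_mean(nonresponse_weights(sample), "weight")  # matches the full sample

print(f"unadjusted mean: {unadjusted:,.0f}")
print(f"adjusted mean:   {adjusted:,.0f}")
```

In this toy sample, higher-income units respond more often, so the respondent-only mean overstates income; reweighting within cells recovers the full-sample mean.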
Our post-tax income+in-kind transfer measure mirrors the resource measure used to calculate the Supplemental Poverty Measure (SPM).
The NEWS SPM rate is 1.7 to 3.5pp lower than the survey, depending on the year, with as many as 11.5 million fewer people in poverty.
July 14, 2025 at 12:38 PM
Split the effect by age, and we see the biggest change is among seniors, who tend to underreport other sources of retirement income (from 2018).
July 14, 2025 at 12:38 PM
We estimate three income measures: money income, post-tax income, and post-tax income+in-kind transfers (excluding health insurance).
Relative to the survey, our estimates of all three measures increase across the income distribution (shown from 2018)
July 14, 2025 at 12:38 PM
In the prior release, we expanded the resource measures we estimate to include taxes, credits, and in-kind benefits. We use linked adrecs to address survey underreporting of multiple safety net programs and linked tax returns to improve estimates of taxes and filing behavior.
July 14, 2025 at 12:38 PM
Beyond the tl;dr! We use CPS ASEC (source of official income and poverty), 1040s, W-2s, info tax returns, LEHD, ACS, census, OASDI and SSI payments, federal and state safety net data (housing assistance, SNAP, TANF, and WIC), firm data, and commercial data on home values.
July 14, 2025 at 12:38 PM
This is the third release of the National Experimental Well-Being Statistics (NEWS) Project at Census. In this release, we use the same methods as the prior one, but cover additional years.
Latest release here: www.census.gov/data/experim...
Prior release here: bsky.app/profile/jlro...
July 14, 2025 at 12:38 PM
Lots of estimates by group and year, some examples:
• Pre-tax income: jrothbaum.github.io/news/income/...
• + Taxes and credits: jrothbaum.github.io/news/income/...
• Official poverty: jrothbaum.github.io/news/poverty...
• Supplemental poverty: jrothbaum.github.io/news/poverty...
NEWS - Money Income - By Year
jrothbaum.github.io
July 14, 2025 at 12:38 PM
This varies by group. Parents have mostly wage and salary earnings, which are well reported: not much normal bias, but lots of underreporting of UI in 2020. Those 65+ have lots of retirement income: lots of normal bias, but not much change in 2020 or 2021.
July 14, 2025 at 12:38 PM
But the bias varies by year: 1) in each year there’s “normal” bias from underreporting of income, like pensions; 2) from 2020 on, high-income households respond at higher rates; 3) some income is reported better than others, and UI is not well reported, so UI ↑ in 2020 => bias ↑
July 14, 2025 at 12:38 PM
The difference between NEWS and official survey estimates can be large. In 2020, the NEWS estimate of official poverty is 2.4pp lower than the survey, with 8 million fewer people in poverty.
July 14, 2025 at 12:38 PM
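As a quick sanity check, the two figures in the post above (2.4pp and 8 million people) can be compared directly. Both are rounded in the post, so the implied population is only approximate; the variable names are illustrative.

```python
# Figures quoted in the post (both rounded).
rate_gap_pp = 2.4          # NEWS official poverty rate minus survey rate, in percentage points
fewer_people = 8_000_000   # fewer people in poverty under NEWS

# Population implied by the two figures: people = rate gap * population.
implied_population = fewer_people / (rate_gap_pp / 100)

print(f"implied population: {implied_population / 1e6:.0f} million")
```

The result is on the order of the U.S. population, so the two numbers are consistent with each other.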
In my testing it's 2-5x faster than my best attempt to do this using Stata's python integration (and my python-based solution is way faster than the standard code for large files with batched file handling and multiprocessing).
May 25, 2025 at 2:04 PM
I've done benchmarks (see github.com/jrothbaum/st...). Stata is efficient at loading dta files and I can't match that, but parquet files are faster to load if the file is large and you only need a subset of columns - that's where parquet shines
GitHub - jrothbaum/stata_parquet_io: Read and write parquet files to stata using polars rust
github.com
May 25, 2025 at 1:53 PM
Parquet is a standard file format with some big advantages: it's readable from R and python (useful for multilanguage projects) and super compressed relative to dta files (5-10% of the size on disk). www.databricks.com/glossary/wha...
What is Apache Parquet?
Learn more about the open source file format Apache Parquet, its applications in data science, and its advantages over CSV and TSV formats.
www.databricks.com
May 25, 2025 at 1:53 PM
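A toy sketch of why a columnar layout helps when you only need a few columns. This is not Parquet's actual on-disk format (real Parquet uses row groups, column chunks, and type-aware encodings) and it is not how the pq package works; it only assumes the general idea that each column is stored and compressed separately. The dataset and function names are hypothetical.

```python
import json
import zlib


def write_columnar(rows, columns):
    """Store each column as its own compressed blob (toy layout, not real Parquet)."""
    return {
        col: zlib.compress(json.dumps([row[col] for row in rows]).encode("utf-8"))
        for col in columns
    }


def read_columns(store, wanted):
    """Decompress only the requested columns; other columns are never touched."""
    return {col: json.loads(zlib.decompress(store[col])) for col in wanted}


# Hypothetical dataset: 10,000 rows, three columns.
rows = [
    {"id": i, "state": "MD" if i % 2 else "VA", "income": 50_000 + i % 100}
    for i in range(10_000)
]
store = write_columnar(rows, ["id", "state", "income"])

# Reading a subset only pays for the columns asked for.
subset = read_columns(store, ["income"])

raw_bytes = len(json.dumps(rows).encode("utf-8"))
stored_bytes = sum(len(blob) for blob in store.values())
print(f"row-wise JSON: {raw_bytes:,} bytes; columnar + compressed: {stored_bytes:,} bytes")
```

Values within a column tend to look alike, which is part of why column-wise storage compresses well, and a reader that wants one column can skip the rest entirely.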
Reposting the image in a different format...
January 29, 2025 at 3:28 PM
2) estimating income and poverty when many or all of the adrecs are not yet available, both for timely estimates (some adrecs arrive with months or years of lag) and going back in time, and 3) estimating income and poverty at lower geographic levels (state, county, tract). (n/n)
January 29, 2025 at 3:22 PM