margin of error = 2 x (reported margin of error)
and how much of this error is "bias" vs "variance" ?
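A back-of-the-envelope framing of that question, with illustrative numbers of my own (not the thread's):

    \mathrm{RMSE}^2 = \mathrm{bias}^2 + \mathrm{sampling\ variance},
    \qquad \text{reported MOE} \approx 1.96\sqrt{\hat p(1-\hat p)/n}

If the reported MOE is 3 points but the realized MOE is 6, the total SD is about 3.0 points against a sampling SD of about 1.5, so roughly sqrt(3.0^2 - 1.5^2) ≈ 2.6 points of SD are left unexplained by sampling variance alone.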
how can we get a poll's margin of error ?
let's start with MRP and some simplifying assumptions.
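A minimal sketch of one convention for reporting it (my assumption, not necessarily where the thread goes): with posterior draws of the poststratified estimate in hand, take half the width of the central 95% interval.

    # margin of error from posterior draws of an MRP-style estimate
    # (the draws below are a synthetic stand-in; in practice they come from the fitted model)
    import numpy as np

    rng = np.random.default_rng(0)
    draws = rng.normal(0.52, 0.015, size=4000)    # stand-in for posterior draws of E[Y]

    lo, hi = np.percentile(draws, [2.5, 97.5])
    moe = (hi - lo) / 2
    print(f"estimate {draws.mean():.3f} +/- {moe:.3f}")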
y_1 = governor vote choice
y_2 = abortion proposition vote choice
x = demographics
You want E(y_2 | county).
You have y_1, y_2, x in a survey, x in the population, and E(y_1 | county).
@wpmarble.bsky.social and Josh Clinton have ideas !
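One generic way to set this up (a sketch under my own assumptions, not necessarily Marble and Clinton's method): model the joint outcome given demographics, poststratify within county, and calibrate the governor component to the known county result so the correlated abortion-vote estimate shifts with it.

    E(y_2 \mid c) \;=\; \sum_x P(x \mid c) \sum_{y_1} P(y_1 \mid x, c)\, E(y_2 \mid y_1, x, c)

    % with P(y_1 | x, c) adjusted, e.g. by a county-level logit shift \delta_c, so that
    %   \sum_x P(x \mid c)\, P(y_1 = 1 \mid x, c) = E(y_1 \mid c)   (the known governor result)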
You know the population distribution for X (e.g. vote choice in 2024).
But you only have a reported X* in your survey.
Should you adjust for it ?
Later today: exploring toy examples to see.
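A toy example in that spirit, with made-up numbers: the outcome depends on the true vote X, nonresponse depends on X, and respondents misreport X* 10% of the time; weighting the survey so X* matches the known population margin removes some, but not all, of the bias.

    # toy example: should you adjust to the population X when you only observe a noisy X*?
    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000
    X = rng.binomial(1, 0.50, N)                              # true 2024 vote, known to be 50/50
    Y = rng.binomial(1, 0.3 + 0.4 * X)                        # outcome of interest
    respond = rng.random(N) < np.where(X == 1, 0.02, 0.04)    # response rate depends on X
    Xs, Ys = X[respond], Y[respond]
    Xstar = np.where(rng.random(Xs.size) < 0.10, 1 - Xs, Xs)  # 10% misreport their vote

    # weight respondents so the X* distribution matches the known 50/50 population split
    w = np.where(Xstar == 1, 0.50 / Xstar.mean(), 0.50 / (1 - Xstar.mean()))

    print(f"truth {Y.mean():.3f}   raw {Ys.mean():.3f}   adjusted on X* {np.average(Ys, weights=w):.3f}")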
You have multiple outcomes, but only some have aggregate truth to shift to.
How can we calibrate our estimates of p(y_1, y_2 | X) to aggregate data about E[y_1] ?
@wpmarble.bsky.social and Josh Clinton have ideas !
1. (human) design probabilities, e.g. P[R = 1 | stratum] in stratified sampling
2. divine probabilities, e.g. P[R = 1 | anything about a person] where responders follow laws of nature
3. device probabilities, e.g. P[R = 1 | X] modeled
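A minimal sketch of case 3, on simulated data (the model and numbers are mine): estimate P[R = 1 | X] with a logistic regression on a frame where X is known for everyone, then weight respondents by the inverse of the fitted propensity.

    # "device" probabilities: modeled response propensities and inverse-propensity weights
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    N = 50_000
    X = rng.normal(size=(N, 2))                          # covariates known for the whole frame
    p_true = 1 / (1 + np.exp(-(-2.0 + 1.0 * X[:, 0])))   # true response propensity (unknown)
    R = rng.random(N) < p_true                           # who responds
    Y = X[:, 0] + rng.normal(size=N)                     # outcome, related to the same covariate

    fit = LogisticRegression().fit(X, R)                 # model P[R = 1 | X]
    w = 1 / fit.predict_proba(X[R])[:, 1]                # weights for respondents

    print(f"truth {Y.mean():.2f}   respondents {Y[R].mean():.2f}   weighted {np.average(Y[R], weights=w):.2f}")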
probability sample = known, nonzero selection probability for every unit
epsem = equal selection probabilities across individuals
SRS = equal probabilities across all possible samples
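The same three lines in symbols, plus a stock example (standard definitions; the example is my own choice):

    \text{probability sample: } \pi_i = P(i \in s) \text{ known and } > 0 \text{ for every unit } i
    \text{epsem: } \pi_i = n/N \text{ for every } i
    \text{SRS: } P(s) = 1\big/\tbinom{N}{n} \text{ for every possible sample } s \text{ of size } n

e.g. stratified sampling with proportional allocation is epsem but not SRS: every person has probability n/N, yet any sample that breaks the stratum quotas has probability 0.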
Preorder available here: www.cambridge.org/highereducat...
@sabinecarey.bsky.social
compare 2 surveys:
1. 100% coverage, but response probability P[R = 1 | Y] differs a lot by Y
2. Only 5% coverage, but P[R = 1 | Y] is roughly constant across Y
which would you use ? both ?
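A toy comparison with invented numbers (not the thread's answer): survey 1 covers everyone but response depends strongly on Y; survey 2 covers a random 5% frame and response is unrelated to Y. The small survey lands near the truth; the large one does not.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 2_000_000
    Y = rng.binomial(1, 0.5, N)                          # truth: E[Y] = 0.5

    # survey 1: 100% coverage, but P[R=1|Y=1] = 0.004 vs P[R=1|Y=0] = 0.002
    r1 = rng.random(N) < np.where(Y == 1, 0.004, 0.002)

    # survey 2: 5% frame chosen independently of Y, response rate a flat 0.004
    r2 = (rng.random(N) < 0.05) & (rng.random(N) < 0.004)

    print(f"survey 1: n = {r1.sum():>5}, mean = {Y[r1].mean():.3f}")
    print(f"survey 2: n = {r2.sum():>5}, mean = {Y[r2].mean():.3f}")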
we’ve focused on estimating means E[Y].
but say Y is open-ended text ("describe how you feel about the candidate") and you want to read through a few draws from the population, not just from survey responders.
what should you do ?
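One natural option (my sketch, not necessarily the thread's answer): resample a handful of respondents with probability proportional to their survey weights, so the texts you read approximate draws from the population rather than from the respondent pool.

    # weighted resampling of open-ended responses
    import numpy as np

    rng = np.random.default_rng(0)
    texts = ["response 1 ...", "response 2 ...", "response 3 ..."]   # hypothetical open-ends
    w = np.array([0.4, 2.5, 1.1])                                    # survey weights

    for i in rng.choice(len(texts), size=2, replace=False, p=w / w.sum()):
        print(texts[i])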
so far we've talked about weights and MRP for E[Y], vote choice in the population overall.
but what if you want E[Y | V = 1], vote choice in the population of voters ?
what are the weights and how do you modify MRP ?
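One standard-looking modification (my sketch): reweight the poststratification cells by estimated turnout, so cells full of likely voters count for more.

    E[Y \mid V = 1] = \sum_x P(x \mid V = 1)\, E[Y \mid x, V = 1],
    \qquad P(x \mid V = 1) \propto N_x\, \hat P(V = 1 \mid x)

    % i.e. replace the census cell counts N_x by N_x * \hat P(V = 1 | x) in the poststratification
    % step, fit the outcome model for Y among voters, and the corresponding unit weights pick up
    % an extra factor of the turnout propensity.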
statmodeling.stat.columbia.edu/2025/11/04/s...
We are looking for a teammate with expertise in both LLM tools and statistical modeling.
Someone who clearly communicates assumptions, results, and uncertainty. With care and kindness.
typical machine learning loss looks at one individual at a time
but for MRP, we care about aggregates
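A small illustration of the distinction (data and population shares invented): per-respondent log loss versus squared error on the poststratified aggregate.

    import numpy as np

    def individual_log_loss(y, p):
        """mean negative log likelihood, one term per respondent"""
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def aggregate_loss(y, p, cell, pop_share):
        """squared error between the poststratified prediction and the poststratified outcome"""
        est_pred = sum(s * p[cell == c].mean() for c, s in pop_share.items())
        est_true = sum(s * y[cell == c].mean() for c, s in pop_share.items())
        return (est_pred - est_true) ** 2

    y    = np.array([1, 0, 1, 1, 0, 0])
    p    = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.3])      # model's predicted probabilities
    cell = np.array(["a", "a", "a", "b", "b", "b"])      # poststratification cells
    print(individual_log_loss(y, p), aggregate_loss(y, p, cell, {"a": 0.8, "b": 0.2}))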
you've got a survey collected by someone else, and they gave you weights.
how can you use those weights in MRP (multilevel regression and poststratification) ?
you've done MRP.
someone asks you for survey weights.
how to get them ?
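A rough sketch of the classical answer (plain poststratification weights; MRP's partial pooling means its implied weights are smoothed versions of these, so treat this as an approximation):

    # classical poststratification weights: proportional to N_cell / n_cell, normalized to sum to n
    import numpy as np

    cell   = np.array(["a", "a", "b", "c", "c", "c"])    # respondents' poststrat cells
    N_cell = {"a": 5_000, "b": 3_000, "c": 2_000}        # population cell counts
    n_cell = {c: (cell == c).sum() for c in N_cell}      # respondent cell counts

    w = np.array([N_cell[c] / n_cell[c] for c in cell])
    w = w * len(cell) / w.sum()
    print(list(zip(cell, np.round(w, 2))))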
in midterms, voters tend to support the out party for balance
do polls still help predict midterms ? yes
Basu's Bears is a lesson in:
1) using auxiliary information (pre-salmon-feasting weights)
2) how bad an unbiased estimator can be
statmodeling.stat.columbia.edu/2025/09/23/s...
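A sketch in the spirit of Basu's circus example, with invented bear numbers (not the post's): sampling one bear with very unequal probabilities, the Horvitz-Thompson estimator of the total weight is exactly unbiased yet routinely absurd, while a ratio estimator built on the pre-feasting weights is biased but sensible.

    import numpy as np

    rng = np.random.default_rng(0)

    x = np.linspace(80, 300, 50)                  # pre-salmon-feasting weights, known for all 50 bears
    y = 1.3 * x + rng.normal(0, 10, 50)           # current weights; the target is their total
    truth = y.sum()

    pi = np.full(50, 0.001)                       # sample one bear: the "typical" bear 25
    pi[25] = 0.951                                # with probability 0.951, the rest 0.001 each

    ht, ratio = [], []
    for _ in range(100_000):
        i = rng.choice(50, p=pi)
        ht.append(y[i] / pi[i])                   # Horvitz-Thompson: unbiased
        ratio.append(x.sum() * y[i] / x[i])       # ratio estimator using the auxiliary info

    print(f"truth {truth:.0f}   HT mean {np.mean(ht):.0f}   HT range {min(ht):.0f} to {max(ht):.0f}")
    print(f"ratio mean {np.mean(ratio):.0f}   ratio range {min(ratio):.0f} to {max(ratio):.0f}")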
we turned to response instrument Z because random sampling is "dead"
but does this method still rely on starting with random sampling ?
we want E[Y|X] but X can be missing
@lucystats.bsky.social @sarahlotspeich.bsky.social @glenmartin.bsky.social @maartenvsmeden.bsky.social et al. say:
random imputation should use Y
deterministic imputation shouldn't
statmodeling.stat.columbia.edu/2025/09/09/s...
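A toy check of that recommendation (my simulation, not the authors'): X and Y with true slope 1, half of X missing completely at random; compare the slope of Y on the imputed X under four imputation rules.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    X = rng.normal(size=n)
    Y = X + rng.normal(size=n)                    # true slope of Y on X is 1
    miss = rng.random(n) < 0.5                    # X missing completely at random

    # in this setup X | Y ~ Normal(Y/2, 1/2), so the "uses Y" rules are the correct conditionals
    rules = {
        "deterministic, ignores Y": np.where(miss, X[~miss].mean(), X),
        "deterministic, uses Y":    np.where(miss, Y / 2, X),
        "random, ignores Y":        np.where(miss, rng.normal(size=n), X),
        "random, uses Y":           np.where(miss, Y / 2 + rng.normal(scale=np.sqrt(0.5), size=n), X),
    }
    for name, Xi in rules.items():
        print(f"{name:26s} slope = {np.polyfit(Xi, Y, 1)[0]:.2f}")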
doi.org/10.31235/osf...
split-plot designs are analogous to cluster sampling.
blocking is analogous to stratification.
featuring an experiment by Arjun Potter and colleagues at NM-AIST !