https://philswatton.github.io/
- If tests on individual datasets fail to reject the null (can't tell which is better on any given dataset)
- & a single test comparing accuracies across datasets rejects the null (an "A is better than B" result across datasets)
What should I infer?
However, on most _individual_ points (say, 14/15), the 95% CIs on the accuracy on each dataset overlap. (2/3)

1) interpret the results as being inconclusive on which classifier is better
2) interpret classifier A as being better than B
Obviously there is also 3) find some extra datasets w/ larger test sets, but I'm curious how people would approach the initial problem
(3/3)
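One way to see how both results can hold at once: if A beats B by a small but consistent margin, each per-dataset test is underpowered, while a paired test across datasets picks up the consistency of sign. A minimal simulation sketch (all numbers hypothetical; the thread doesn't name the tests used, so a two-proportion z-test stands in per dataset and a Wilcoxon signed-rank test for the single across-dataset comparison):

```python
# Minimal simulation sketch of the situation in the thread
# (all numbers hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_datasets = 15
test_size = 300                                # smallish test sets -> wide CIs
acc_a = rng.uniform(0.70, 0.85, n_datasets)    # classifier A's true accuracies
acc_b = acc_a - 0.03                           # B consistently ~3 points worse

# Observed correct-prediction counts on each test set
correct_a = rng.binomial(test_size, acc_a)
correct_b = rng.binomial(test_size, acc_b)

def two_prop_z(k1, k2, n):
    """Two-sided two-proportion z-test p-value for k1/n vs k2/n."""
    p1, p2 = k1 / n, k2 / n
    pooled = (k1 + k2) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    return 2 * stats.norm.sf(abs(p1 - p2) / se)

per_dataset_p = np.array(
    [two_prop_z(a, b, test_size) for a, b in zip(correct_a, correct_b)]
)
print(f"per-dataset tests significant at 5%: {(per_dataset_p < 0.05).sum()}/{n_datasets}")

# One paired test across datasets, on the per-dataset accuracy differences
diffs = (correct_a - correct_b) / test_size
print(f"Wilcoxon across datasets: p = {stats.wilcoxon(diffs).pvalue:.4f}")
```

On a typical draw, few if any per-dataset tests clear 5% while the signed-rank test usually does: fifteen small same-signed differences are jointly informative even when each is individually noisy.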
@mariosrichards.bsky.social
@ralphscott.bsky.social
@jack-bailey.co.uk
@heinzbrandenburg.bsky.social
0-100 are warmth, but you can recover a liberal-conservative dimension from them (I think I have a better plot somewhere):
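A hedged illustration of the recovery idea, not the original analysis or plot: with simulated respondents and made-up target loadings, PCA on the centered 0-100 warmth ratings pulls out a first component that tracks a latent liberal-conservative score.

```python
# Hedged sketch (simulated data, made-up loadings): if ideology drives
# warmth toward left- and right-coded targets in opposite directions,
# the first principal component recovers it up to sign.
import numpy as np

rng = np.random.default_rng(1)

n = 1000
ideology = rng.normal(0, 1, n)                 # latent left-right score

# Hypothetical loadings for five thermometer targets:
# positive = right-coded, negative = left-coded, ~0 = apolitical
loadings = np.array([0.8, 0.6, -0.7, -0.5, 0.1])

warmth = 50 + 25 * ideology[:, None] * loadings + rng.normal(0, 10, (n, 5))
warmth = np.clip(warmth, 0, 100)               # keep ratings on the 0-100 scale

# First principal component of the centered ratings via SVD
centered = warmth - warmth.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ vt[0]

# PC1 should correlate strongly with the latent ideology (sign is arbitrary)
print(f"|corr(PC1, ideology)| = {abs(np.corrcoef(pc1, ideology)[0, 1]):.3f}")
```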