mrparryparry.bsky.social (@mrparryparry.bsky.social)
Artist, Autist and Scientist · Terminally Ranking · Research @irglasgow.bsky.social
https://parry-parry.github.io/
This work was devised at the ECIR collab-a-thon last year, and we hope to continue discussions at this year's collab-a-thon in Lucca! Read more here: arxiv.org/abs/2502.20937 #ECIR2025 #SIGIR2025
Variations in Relevance Judgments and the Shelf Life of Test Collections (arxiv.org)
The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections. ...
March 3, 2025 at 10:18 AM
We take a human assessor to represent an upper bound on performance for a subjective task such as determining relevance, since each topic defines only a single intent. We find that, used as oracle rankers, systems are either indistinguishable from human assessors or exceed them.
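A minimal sketch of that comparison, assuming toy single-topic judgments and the ir_measures library (the thread specifies neither the data nor the tooling): rank documents by one assessor's grades and score that ranking against another assessor's qrels.

```python
# Sketch: a human assessor used as an "oracle ranker".
# Hypothetical 4-grade (0-3) judgments for one topic from two assessors.
import ir_measures
from ir_measures import nDCG

measure = nDCG @ 10

assessor_a = {'q1': {'d1': 3, 'd2': 1, 'd3': 0, 'd4': 2}}  # grades used as a run
assessor_b = {'q1': {'d1': 2, 'd2': 3, 'd3': 0, 'd4': 1}}  # grades used as qrels

# Treat A's grades as retrieval scores, i.e. the best ranking A could produce.
oracle_run = {qid: {did: float(grade) for did, grade in grades.items()}
              for qid, grades in assessor_a.items()}

# Score the human "oracle" against B's judgments; a system run evaluated the
# same way can then be compared against this human bound.
print(ir_measures.calc_aggregate([measure], assessor_b, oracle_run)[measure])
```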
We then look downstream: what effect does re-annotation have on modern systems? We find that comparisons between modern systems on DL'19 are increasingly unstable; the pairwise ordering of two systems can change under re-annotation even when their measured nDCG values are far apart.
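To make the instability concrete, here is a sketch with toy runs and qrels (again assuming ir_measures; the paper's experiments use the actual DL'19 data) in which the pairwise ordering flips between the original and re-annotated judgments:

```python
# Sketch: does a pairwise system ordering survive re-annotation?
import ir_measures
from ir_measures import nDCG

measure = nDCG @ 10

run_a = {'q1': {'d1': 3.0, 'd2': 2.0, 'd3': 1.0}}  # hypothetical system A
run_b = {'q1': {'d3': 3.0, 'd1': 2.0, 'd2': 1.0}}  # hypothetical system B

qrels_orig = {'q1': {'d1': 3, 'd2': 1, 'd3': 0}}  # original assessors
qrels_new = {'q1': {'d3': 3, 'd2': 1, 'd1': 0}}   # hypothetical re-annotation

for name, qrels in [('original', qrels_orig), ('re-annotated', qrels_new)]:
    score_a = ir_measures.calc_aggregate([measure], qrels, run_a)[measure]
    score_b = ir_measures.calc_aggregate([measure], qrels, run_b)[measure]
    order = 'A > B' if score_a > score_b else 'B > A'
    print(f'{name}: nDCG@10 A={score_a:.3f} B={score_b:.3f} -> {order}')
# Even a large nDCG gap under one set of judgments can flip under the other.
```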
We look into the causes of disagreement, finding that subtle differences in query intent can lead to greater disagreement under 4-grade relevance, even when relevance is well defined. However, we find that it is challenging for assessors to agree on what is relevant even under a fixed narrative.
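One standard way to quantify this kind of disagreement is Cohen's kappa over the graded labels; a sketch with hypothetical assessor labels, using scikit-learn (the thread doesn't say which agreement statistic the paper uses):

```python
# Sketch: assessor agreement on 4-grade relevance via Cohen's kappa.
# Hypothetical grades (0-3) from two assessors for the same documents.
from sklearn.metrics import cohen_kappa_score

assessor_a = [3, 2, 0, 1, 2, 3, 0, 1]
assessor_b = [2, 2, 0, 2, 1, 3, 1, 0]

# Unweighted kappa treats any grade mismatch as full disagreement;
# quadratic weighting penalises near-misses (2 vs 3) less than far ones (0 vs 3).
print('unweighted:', cohen_kappa_score(assessor_a, assessor_b))
print('quadratic :', cohen_kappa_score(assessor_a, assessor_b, weights='quadratic'))
```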
Re-annotation is commonly performed to test how variations in relevance judgements affect our ability to discriminate between retrieval systems. We validate these stability hypotheses, but in a modern setting: topics have no narratives, relevance is judged on 4 grades, and the pool is drawn from neural systems.
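The classic stability check behind these hypotheses correlates the system ranking under the original judgments with the ranking under re-annotation; a sketch with hypothetical system names and per-system scores, using Kendall's tau from SciPy:

```python
# Sketch: rank all systems by mean nDCG under the original and re-annotated
# qrels, then correlate the two rankings. All values here are hypothetical.
from scipy.stats import kendalltau

systems = ['bm25', 'monoT5', 'colbert', 'tasb', 'duoT5']
scores_orig = [0.45, 0.70, 0.68, 0.66, 0.72]  # under original judgments
scores_new = [0.44, 0.66, 0.69, 0.71, 0.67]   # under re-annotation

tau, p = kendalltau(scores_orig, scores_new)
print(f'Kendall tau = {tau:.3f} (p = {p:.3f})')
# Cranfield's stability claim is that tau stays high even when assessors
# disagree on individual documents; the paper re-tests this in a modern setting.
```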