More details coming soon. See you all in Tokyo next year!
All run submissions for the Tip-of-the-Tongue (ToT) Track are due next Wednesday (Aug 27).
More info: trec-tot.github.io/guidelines
#TREC2025 #TRECToT #TREC2025ToT
We provide code for baseline systems, and submissions are due by August 27th!
More information: trec-tot.github.io/guidelines
#TREC2025 #TRECToT #TREC2025ToT
Spread the word!
The paper is available online: dl.acm.org/doi/10.1145/...
We have released the test queries for the TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track. Please see the guidelines for more information: trec-tot.github.io/guidelines. The run submission deadline is tentatively in August. #TREC2025 #TRECToT #TREC2025ToT
Please spread the word!
🤩 See how fair ranking boosts downstream utility while promoting fairer attribution of cited sources.
Catch our oral presentation at #ICTIR2025!
#SIGIR2025 @841io.bsky.social
Paper: arxiv.org/abs/2409.11598
The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API.
More details are available at: trec-tot.github.io/guidelines
🚩 Tired of “cultural” evals that don't consult people?
We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨ in scientific writing, and show that ❗LLMs flatten them❗
📜 arxiv.org/abs/2506.00784
[1/11]
Excited to announce the release of the TREC 2025 Tip-of-the-Tongue (TREC-ToT) Track guidelines: trec-tot.github.io/guidelines. We will release test queries in July, and the run submission deadline will be in August. #TREC2025 #TRECToT #TREC2025ToT
Please register to participate:
In our #NAACL2025 paper (w/ @841io.bsky.social), we show why global evaluations are not enough and why context matters more than you think.
📄 aclanthology.org/2025.finding...
#NLP #Evaluation
(🧵1/9)
arxiv.org/abs/2409.11598
dl.acm.org/doi/10.1145/...
F Diaz, M Ekstrand (@md.ekstrandom.net), B Mitra (@bmitra.bsky.social)
For IR, NLP, and ML researchers working on ranking systems evaluated for recall and robustness. 🧵 1/5 dl.acm.org/doi/10.1145/...
We address data limitations and offer a fresh evaluation method for these complex queries.
Curious how TREC ToT track test queries are created? Check out this thread 🧵 and our paper 📄: arxiv.org/abs/2502.17776
🤞 means good luck in the US but is deeply offensive in Vietnam 🚨
📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!
📜: arxiv.org/abs/2502.17710
Paper: arxiv.org/abs/2409.11598
The full deck is here. There are a lot of gems if you're interested in this space!
retrieval-enhanced-ml.github.io/sigir-ap2024...
In collaboration w/ the amazing @841io.bsky.social @teknology.bsky.social Alireza Salemi and Hamed Zamani.
I can’t seem to find everyone, though; help filling this out is definitely appreciated (DM or comment)!
It's time to revisit common assumptions in IR! Embeddings have improved drastically, but mainstream IR evals have stagnated since MSMARCO + BEIR.
We ask: on private or tricky IR tasks, are rerankers better? Surely, reranking many docs is best?
go.bsky.app/JgneRQk