Harry Scells
hscells.bsky.social
Assistant Professor @unituebingen.bsky.social @health-nlp.com, information retrieval researcher

https://scells.me #IR #CS
Reposted by Harry Scells
We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use.

huggingface.co/datasets/coral-nlp/german-commons
October 27, 2025 at 12:45 PM
Reposted by Harry Scells
We are proud to announce that we are now indexed by @dblp! Click below for Volume 1, Number 1, 2025

https://dblp.org/db/journals/irrj/irrj1.html
September 9, 2025 at 12:48 PM
Reposted by Harry Scells
The organization of #ECIR2026 has started! We just had our first call with all track chairs. With the calls now finalized, online and distributed across mailing lists, we’re moving on to the rest of the conference preparation!

@ecir2026.eu
📍 Delft, 30 Mar – 2 Apr 2026
👉 ecir2026.eu
September 1, 2025 at 12:21 PM
Reposted by Harry Scells
Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"!
Our new axioms are integrated with ir_axioms: github.com/webis-de/ir_...
Nice to see axiomatic IR gaining momentum.
July 18, 2025 at 2:18 PM
Reposted by Harry Scells
Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation.

 📄 webis.de/publications...
July 16, 2025 at 9:04 PM
Reposted by Harry Scells
Want to know how to make bi-encoders more than 3x faster with a new backbone encoder model? Check out our talk on the Token-Independent Text Encoder (TITE) at #SIGIR2025 in the efficiency track. It pools vectors within the model to improve efficiency: dl.acm.org/doi/10.1145/...
July 16, 2025 at 7:28 AM
Reposted by Harry Scells
Now @fschlatt.bsky.social presents "TITE: Token-Independent Text Encoder for Information Retrieval" at #SIGIR2025

Paper: webis.de/publications...
July 16, 2025 at 9:08 AM
Reposted by Harry Scells
Lukas Gienapp presents "The Viability of Crowdsourcing for RAG Evaluation" at #SIGIR2025

The paper is available at: webis.de/publications...
July 15, 2025 at 1:53 PM
Reposted by Harry Scells
Lucky to witness #IRRJ editor-in-chief @djoerd.idf.social.ap.brid.gy signing a copy of the first edition of @irrj.sigmoid.social.ap.brid.gy for Ian Soboroff, author of the paper “Don’t Use LLMs to Make Relevance Judgments” in the volume.

#SIGIR2025

irrj.org/article/view...
July 15, 2025 at 7:06 AM
Reposted by Harry Scells
@mrparryparry.bsky.social presenting our work on reproducing TREC DL 2019 judgements and the implications for evaluating modern ranking models on modern collections. Paper: arxiv.org/abs/2502.20937
Variations in Relevance Judgments and the Shelf Life of Test Collections
The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections. ...
arxiv.org
July 14, 2025 at 2:49 PM
Reposted by Harry Scells
#sigir2025 excellent reviewers
July 14, 2025 at 7:24 AM
Reposted by Harry Scells
Thank you Carlos for the shout-out of Lightning IR in the LSR tutorial at #SIGIR2025

If you want to fine-tune your own LSR models, check out our framework at github.com/webis-de/lig...
July 13, 2025 at 2:42 PM
Reposted by Harry Scells
From July 13-17, 2025, @scadsai.bsky.social will join the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval in Padua, Italy. Our researchers have made the following contributions.

Learn more about #SIGIR2025:
👉 https://sigir2025.dei.unipd.it/
July 10, 2025 at 9:46 AM
Reposted by Harry Scells
Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
June 22, 2025 at 12:33 PM
Reposted by Harry Scells
@unituebingen celebrates success in the Clusters of Excellence competition 🎉 Joy and cheers at the press conference on the DFG's decision on 22 May 2025, with a good 200 guests. More information and photos are available online 👉 uni-tuebingen.de/universitaet... @cmfi.bsky.social @ml4science.bsky.social
May 26, 2025 at 9:55 AM
Reposted by Harry Scells
The Clusters of Excellence (#Exzellenzcluster) have been decided: today the Excellence Commission selected 70 projects for funding. 45 clusters will be continued and 25 newly established. Funding starts on 1 Jan. 2026 and runs for 7 years; total funding amounts to €539 million per year. The list: www.dfg.de/resource/blo... 1/3
May 22, 2025 at 3:05 PM
Reposted by Harry Scells
Join us for the QPP Workshop today starting at 9 AM in the Sagrestia, IMT Campus!
📢 The final schedule for the ECIR 2025 workshop on Query Performance Prediction in the era of LLMs is now live!
📅 Join us on 10th April 2025: qppworkshop.github.io
🎤 Keynote by @gdebasis.bsky.social: "The Role of Query Performance Prediction in Developing Adaptive Search and RAG Systems"
QPP++ 2025: Query Performance Prediction and its Applications in the Era of Large Language Models
qppworkshop.github.io
April 10, 2025 at 6:33 AM
Reposted by Harry Scells
ESSIR 2025, the European Summer School on Information Retrieval in Wolverhampton, UK, July 7-11! Dive into cutting-edge Information Retrieval & AI, network with experts. Plus, don’t miss the interactive FDIA Symposium! 🎓

👉 2025.essir.eu #IR #AI #ECIR2025
April 9, 2025 at 10:26 AM
Reposted by Harry Scells
Short Paper: Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-ranking webis.de/publications...

Full Paper: Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders webis.de/publications...
April 9, 2025 at 12:37 PM
Reposted by Harry Scells
Now we have @fschlatt.bsky.social on the #ECIR2025 stage presenting the research on the Set-Encoder.

The paper is online at: webis.de/publications...
April 9, 2025 at 8:00 AM
Reposted by Harry Scells
Honored to receive the best short paper award and best paper honourable mention award at #ECIR2025. Thank you to all co-authors @maik-froebe.bsky.social, @hscells.bsky.social, Shengyao Zhuang, @bevankoopman.bsky.social, Guido Zuccon, Benno Stein, @martin-potthast.com, @matthias-hagen.bsky.social 🥳
April 9, 2025 at 12:37 PM
Reposted by Harry Scells
I was very happy to talk about corpus subsampling at #ECIR2025 today.

Please find the paper at webis.de/publications...

And last but not least, here are some of my favorite impressions of the first day of ECIR :)
April 7, 2025 at 10:30 PM
Reposted by Harry Scells
🧵 2/4 Key findings:
1️⃣ Humans write best? No! LLM responses are rated better than human-written ones.
2️⃣ Essay answers? No! Bullet lists are often preferred.
3️⃣ Evaluate with BLEU? No! Reference-based metrics don't align with human preferences.
4️⃣ LLMs as judges? No! Prompted models produce inconsistent labels.
April 7, 2025 at 3:34 PM