Martin Potthast
martin-potthast.com
@martin-potthast.com
Professor at the University of Kassel, https://hessian.AI, and https://ScaDS.AI. Member of @webis.de
Research in information retrieval #IR, natural language processing #NLP, and artificial intelligence.
Reposted by Martin Potthast
We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use.

huggingface.co/datasets/coral-nlp/german-commons
October 27, 2025 at 12:45 PM
Reposted by Martin Potthast
🌟Really excited to share the fourth Strategic Workshop on Information Retrieval (SWIRL) report published in SIGIR Forum!

Paper 👉🏻 www.johannetrippas.com/papers/tripp...

More info 👉🏻 sites.google.com/view/swirl20...

#SWIRL2025 #SIGIR2026 #IR #GenAI #Research #CHIIR2026
September 2, 2025 at 12:38 PM
Reposted by Martin Potthast
Thrilled to announce that Matti Wiegmann has successfully defended his PhD! 🎉🧑‍🎓 Huge congratulations on this incredible achievement! #PhDDefense #AcademicMilestone
July 18, 2025 at 11:44 AM
Reposted by Martin Potthast
Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"!
Our new axioms are integrated with ir_axioms: github.com/webis-de/ir_...
Nice to see axiomatic IR gaining momentum.
July 18, 2025 at 2:18 PM
Reposted by Martin Potthast
We presented two papers at ICTIR 2025 today:
- Axioms for Retrieval-Augmented Generation webis.de/publications...
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins webis.de/publications...
July 18, 2025 at 2:18 PM
Reposted by Martin Potthast
Want to know how to make bi-encoders more than 3x faster with a new backbone encoder model? Check out our talk on the Token-Independent Text Encoder (TITE) at #SIGIR2025 in the efficiency track. It pools vectors within the model to improve efficiency: dl.acm.org/doi/10.1145/...
July 16, 2025 at 7:28 AM
Reposted by Martin Potthast
Now @fschlatt.bsky.social presents "TITE: Token-Independent Text Encoder for Information Retrieval" at #SIGIR2025

Paper: webis.de/publications...
July 16, 2025 at 9:08 AM
Reposted by Martin Potthast
Here are some impressions from our ReNeuIR workshop on "Reaching Efficiency in Neural IR" that we had yesterday at #SIGIR2025.
July 18, 2025 at 8:41 AM
Reposted by Martin Potthast
Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation.

 📄 webis.de/publications...
July 16, 2025 at 9:04 PM
Reposted by Martin Potthast
Lukas Gienapp presents "The Viability of Crowdsourcing for RAG Evaluation" at #SIGIR2025

The paper is available at: webis.de/publications...
July 15, 2025 at 1:53 PM
Reposted by Martin Potthast
@mrparryparry.bsky.social presenting our work on reproducing TREC DL 2019 judgements and the implications for evaluating modern ranking models on modern collections. Paper: arxiv.org/abs/2502.20937
Variations in Relevance Judgments and the Shelf Life of Test Collections
The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections. ...
July 14, 2025 at 2:49 PM
Reposted by Martin Potthast
Thank you Carlos for the shout-out of Lightning IR in the LSR tutorial at #SIGIR2025

If you want to fine-tune your own LSR models, check out our framework at github.com/webis-de/lig...
July 13, 2025 at 2:42 PM
Reposted by Martin Potthast
From July 13-17, 2025, @scadsai.bsky.social will join the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval in Padua, Italy. Our researchers have made the following contributions.

Learn more about #SIGIR2025:
👉 https://sigir2025.dei.unipd.it/
July 10, 2025 at 9:46 AM
Reposted by Martin Potthast
Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :)

The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API.

More details are available at: trec-tot.github.io/guidelines
June 27, 2025 at 2:46 PM
Reposted by Martin Potthast
Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
June 22, 2025 at 12:33 PM
Reposted by Martin Potthast
Most reporting on AI examines worst-case systems deployed under the guise of efficiency. But what would a good faith effort at Ethical AI look like? For two years, we’ve been looking over the shoulder of a city trying to do things differently.
June 11, 2025 at 1:39 PM
Reposted by Martin Potthast
All @acm.org publications will be 100% Open Access as of January 2026. When we announced this at POPL and CHI this year, conference participants spontaneously erupted in applause. The CS community is excited about ACM's move to OA!
May 19, 2025 at 5:50 PM
Reposted by Martin Potthast
The deadline for submissions to the ReNeuIR workshop at #SIGIR2025 is extended to June 10 😸

Details: reneuir.org

#ReNeuIr2025 #SIGIR25
ReNeuIR’25
Workshop on Reaching Efficiency in Neural Information Retrieval
May 21, 2025 at 5:31 PM
Reposted by Martin Potthast
PAN 2025 Call for Participation: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality

We'd like to invite you to participate in the following shared tasks at PAN 2025 held in conjunction with the CLEF conference in Madrid, Spain.

Find out more at pan.webis.de/clef25/pan25...
March 5, 2025 at 1:14 PM
Reposted by Martin Potthast
We share your concern that LLMs could be prompted to generate responses that are biased in favor of certain products. That is why we are currently organizing a shared task on detecting advertisements in the responses of RAG-based search engines: bsky.app/profile/webi...
April 30, 2025 at 12:52 PM
The fourth edition of ReNeuIR @ #SIGIR2025 is back!! Check reneuir.org to see what we have in mind this year! Paper submission deadline: May 20, 2025.
April 30, 2025 at 12:16 PM
Reposted by Martin Potthast
Can LLM-generated ads be blocked? With OpenAI adding shopping options to ChatGPT, this question gains further importance.
If you are interested in contributing to the research on LLM-based advertising, please check out our shared task: touche.webis.de/clef25/touch...

More details below.
April 30, 2025 at 11:17 AM
Reposted by Martin Potthast
New AI ethics scandal brewing... turns out a team at University of Zurich had dozens of undisclosed AI bot accounts debating with people on /r/ChangeMyView from November 2024 to March 2025 simonwillison.net/2025/Apr/26/...
META: Unauthorized Experiment on CMV Involving AI-generated Comments
[r/changemyview](https://www.reddit.com/r/changemyview/) is a popular (top 1%) well moderated subreddit with an extremely well developed [set of rules](https://www.reddit.com/r/changemyview/wiki/rules...
April 26, 2025 at 10:42 PM
Reposted by Martin Potthast
📢 The Internet Archive needs your help.

At a time when information is being rewritten or erased online, a $700 million lawsuit from major record labels threatens to destroy the Wayback Machine.

Tell the labels to drop the 78s lawsuit.

👉 Sign our open letter: www.change.org/p/defend-the...

🧵⬇️
April 17, 2025 at 4:51 PM
Reposted by Martin Potthast
The Workshop on Open Web Search at #ECIR2025 is just starting with a keynote by @claclarke.bsky.social on Annotative Indexing. #WOWS25 #WOWS2025 #ECIR25
April 10, 2025 at 7:16 AM