Webis Group
@webis.de
Information is nothing without retrieval
The Webis Group contributes to information retrieval, natural language processing, machine learning, and symbolic AI.
We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use.
huggingface.co/datasets/coral-nlp/german-commons
October 27, 2025 at 12:45 PM
We presented two papers at ICTIR 2025 today:
- Axioms for Retrieval-Augmented Generation webis.de/publications...
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins webis.de/publications...
July 18, 2025 at 2:18 PM
Thrilled to announce that Matti Wiegmann has successfully defended his PhD! 🎉🧑🎓 Huge congratulations on this incredible achievement! #PhDDefense #AcademicMilestone
July 18, 2025 at 11:44 AM
Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation.
📄 webis.de/publications...
July 16, 2025 at 9:04 PM
Reposted by Webis Group
Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :)
The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API.
More details are available at: trec-tot.github.io/guidelines
June 27, 2025 at 2:46 PM
Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
June 22, 2025 at 12:33 PM
Our paper titled “The Two Paradigms of LLM Detection: Authorship Attribution vs. Authorship Verification” has been accepted to #ACL2025 (Findings). downloads.webis.de/publications...
We discuss why LLM detection is a one-class problem and how that affects the prospective… 1/3 #ACL #NLP #ARR #LLM
June 2, 2025 at 7:38 AM
Reposted by Webis Group
PAN 2025 Call for Participation: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality
We'd like to invite you to participate in the following shared tasks at PAN 2025 held in conjunction with the CLEF conference in Madrid, Spain.
Find out more at pan.webis.de/clef25/pan25...
March 5, 2025 at 1:14 PM
Can LLM-generated ads be blocked? With OpenAI adding shopping options to ChatGPT, this question gains further importance.
If you are interested in contributing to the research on LLM-based advertising, please check out our shared task: touche.webis.de/clef25/touch...
More details below.
April 30, 2025 at 11:17 AM
📢 Our paper "The Viability of Crowdsourcing for RAG Evaluation" has been accepted to #SIGIR2025!
We compared how good humans and LLMs are at writing and judging RAG responses, assembling 1800+ responses across 3 styles, and 47K+ pairwise judgments in 7 quality dimensions. 🧵➡️
April 7, 2025 at 3:34 PM
Interested in joining our research group or do you know someone who might be interested?
We have a new vacancy: Research position at the Webis group on Watermarking for Large Language Models.
More information:
webis.de/for-students...
February 17, 2025 at 8:55 AM
2nd International Workshop on Open Web Search: CfP
We invite you to the #ECIR2025 Workshop on Open Web Search #wows2025. Please consider submitting to the scientific track or to the WOWS-Eval shared task, which enriches the Open Web Index with relevance judgments.
Details: opensearchfoundation.org/wows2025
January 8, 2025 at 5:29 PM
Reposted by Webis Group
Time for a starter pack on information retrieval: go.bsky.app/MXPJoTn
November 14, 2024 at 8:57 PM
Today we will present our poster on Query Variation Robustness of Transformer Models at #EMNLP2024. You can find us at the Information Retrieval and Text Mining 3 poster session.
November 13, 2024 at 5:19 PM
Below you can see our past tweets, just imported from “the darkened X”.
Above, we see nothing but Bluesky.
November 8, 2024 at 7:47 PM
Goodbye Washington! We had a fantastic week with interesting talks, discussions, and new ideas at #SIGIR24 #SIGIR2024. We hope to see you all again next year in Italy :) https://x.com/webis_de/status/1815115279510208625/photo/1
November 8, 2024 at 7:25 PM
The paper can be found on our homepage (https://webis.de/publications.html#schmidt_2024) and the dataset is on Zenodo: https://zenodo.org/records/10802427
November 8, 2024 at 7:25 PM
In our experiments, LLMs struggle with the task in a zero-shot setting, especially due to low precision. Sentence transformers, however, can be fine-tuned to detect the inserted ads, achieving precision and recall above 0.9 for unseen meta topics. https://t.co/VuuaW...
November 8, 2024 at 7:25 PM
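To make the precision and recall figures in the post above concrete, here is a minimal sketch in Python of how the two metrics are computed for a binary ad detector. The function name `precision_recall` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or the dataset. The example shows a detector that flags too many responses as ads, which keeps recall high but pulls precision down.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 means 'contains an ad'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (assumption, not real data): 1 = response contains an ad.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
# An over-eager detector that flags most responses as ads.
y_pred = [1, 1, 1, 1, 0, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=1.00
```

In this toy case every real ad is found (recall 1.0), but half of the flagged responses are false alarms (precision 0.5), which is the kind of failure the post describes as low precision.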
The Webis Generated Native Ads 2024 is the first public dataset to evaluate models on the task of detecting ads in responses of conversational search engines.
It was created by simulating an advertising service for queries from popular meta topics (product/service categories). https://t.co/pjHr...
November 8, 2024 at 7:25 PM
What if conversational search is financed by inserting ads directly into generated responses? We present our work on detecting these generated native ads at #TheWebConf24.
Come visit us at the short paper poster session on Thursday in the Central Ballroom. https://t.co/NRKbal57WO
November 8, 2024 at 7:25 PM
Right now, we will start the second half of the SCAI'24 workshop at #CHIIR2024 in hybrid mode. We will move from the big ideas and human-centered metrics to the challenges of human-in-the-loop evaluations. https://x.com/webis_de/status/1768277930768015536/photo/1
November 8, 2024 at 7:25 PM
Here's a study we did together with social scientists Arno Simons and Marion Schmidt on who Wikipedia editors consider notable enough to be mentioned in the history section of the CRISPR article. An awesome collaboration! https://twitter.com/WikiResearch/status/1766894251588325491
November 8, 2024 at 7:25 PM
How will conversational search AI pay for itself?
It may be native ads or product placement in generated answers. At #CHIIR2024 next week, we'll present a user study showing that many people don't recognize ads inserted by LLMs in generated search results: https://t.co/hrZE9moeKy https://t.co/qg...
November 8, 2024 at 7:25 PM
Working in Argumentation? Time to participate in Touché 2024!
Three shared tasks:
- Human Value Detection
- Ideology and Power Identification in Parliamentary Debates
- Image Retrieval/Generation for Arguments
Submission deadline is May 6th!
More info: https://t.co/rtgSDxpDTx https://t.co/S6Kl...
November 8, 2024 at 7:24 PM