Workshop on Multilingual Data Quality Signals
@wmdqs.bsky.social
The first iteration of our workshop will be co-located with @colmweb.org 2025 in Montreal.
https://wmdqs.org/
WMDQS is underway! Come join us in Room 520A at @colmweb.org! #COLM2025
October 10, 2025 at 4:18 PM
Reposted by Workshop on Multilingual Data Quality Signals
Looking forward to tomorrow's #COLM2025 workshop on multilingual data quality! 🤩
October 9, 2025 at 11:16 PM
In collaboration with @commoncrawl.bsky.social, MLCommons, and @eleutherai.bsky.social, the first edition of WMDQS at @colmweb.org starts tomorrow in Room 520A! We have an updated schedule on our website, including a list of all accepted papers.
October 9, 2025 at 8:17 PM
Reposted by Workshop on Multilingual Data Quality Signals
If you want to help us improve language and cultural coverage and build an open-source LangID system, please register for our shared task on Language Identification! 💬

Registering is easy! All the details are on the shared task webpage: wmdqs.org/shared-task/

Deadline: July 23, 2025 (AoE) ⏰
July 21, 2025 at 10:40 PM
Reposted by Workshop on Multilingual Data Quality Signals
The Common Crawl Foundation, MLCommons, EleutherAI, and Johns Hopkins' Center for Language and Speech Processing have the pleasure of inviting you to register for the 1st shared task on Language Identification for web data.

commoncrawl.org/blog/wmdqs-s...
July 21, 2025 at 10:34 PM
We've added lots more documents/languages and extended the deadline for the first round of annotations until July 23rd. Check out the details below 👇
July 21, 2025 at 6:07 PM
Reposted by Workshop on Multilingual Data Quality Signals
One of the biggest obstacles to improving language technologies for low-resource languages is the lack of data. To address this, we need better language identification tools. So, we're organizing a shared task on Language Identification for Web Data! #NLP #NLProc
June 9, 2025 at 3:44 PM