Last but not least: FadeIT 🔍💬
Can your system spot fallacies in social media posts?
FadeIT focuses on Italian texts about migration, climate change & public health.
Detect flawed reasoning — where it spreads fastest.
#NLProc
Tenth up: PFB – Prometeia Financial Benchmark 💶📊
Can LLMs handle finance?
PFB evaluates open & closed models on domain-specific MCQs, with a twist: each question has a complexity score.
2 tasks: Italian and Multilingual QA
From GPT to finance pro.
#NLProc
Ninth up: Cruciverb-IT 🧩🇮🇹
Ready to crack some Italian crosswords?
Cruciverb-IT offers a challenging playground for NLP systems:
1️⃣ Answer clues from real crosswords
2️⃣ Autonomously solve full crossword grids
Wordplay meets AI.
#NLProc
Eighth up: SVELA 🧠🧽
Can LLMs forget on purpose?
SVELA tackles Machine Unlearning: design and evaluate metrics to verify if a model forgets specific knowledge while keeping the rest intact.
Selective forgetting, measurable impact.
#NLProc
Seventh up: MultiPRIDE 🏳️‍🌈🧠
Can a system tell when a slur is reclaimed?
💬 In this multilingual task (IT/ES/EN), classify whether LGBTQ+ terms in context are used with reclamatory intent.
It’s not just about words, it’s about meaning.
#NLProc
Sixth up: DeSegMa-It 🤖📝
Can you spot the line between human and machine?
DeSegMa-It challenges systems to:
1️⃣ Detect machine-generated texts
2️⃣ Segment where the human ends & the machine begins
Human or AI? Let’s find out.
#NLProc
Fifth up: Enhanced-VWSD 🖼️📚
Can you pick the right image for a word in context?
Given a sentence and 10 images, choose the one that best captures the meaning of a target word.
A vision-meets-language challenge!
#NLProc
Fourth up: IMPOLS 🗳️
Can systems detect what’s not said in political speech?
💬 IMPOLS targets implicit, questionable content that sounds true but isn’t explicit.
🔍 Tasks:
1️⃣ Detect implicit contents
2️⃣ Classify them
3️⃣ Classify implicatures
#NLProc
Third up: ATE-IT 🏷️
Time to extract key concepts automatically, with the first large-scale eval of Automatic Term Extraction for Italian on institutional texts.
Subtasks:
🔹 Term Extraction
🔹 Term Variants Clustering
Let’s make terminology smarter.
#NLProc
Second up: GSI:detect
Can machines detect gender stereotypes in Italian texts?
🧠 Score sentences for stereotypical content
🏷️ Classify them into stereotype categories
From classification to social awareness.
#NLProc
First up: EXPLAINITA 🔍
Can you explain what a latent neuron means?
🧠 Describe Sparse Autoencoder latents
📝 Decide if a text activates a latent based on its explanation
From prediction to interpretability 💬
#NLProc
@amsterdamnlp.bsky.social