UKP Lab
ukplab.bsky.social
The Ubiquitous Knowledge Processing Lab researches Natural Language Processing (#NLProc) with a strong emphasis on Large Language Models, Conversational AI & Question Answering | @cs-tudarmstadt.bsky.social · @TUDa.bsky.social

https://www.ukp.tu-darmstadt
It touches on fast-moving model releases, the growing relevance of AI agents and tool-use, the shift towards reasoning-oriented training, and the limits of what benchmark results really tell us.

🔗 Read the article (F+ / paywalled):
www.faz.net/aktuell/wirt...

(2/2)
Language Models: How Far Artificial Intelligence Really Is (original title: "Sprachmodelle: Wie weit die Künstliche Intelligenz wirklich ist")
ChatGPT triggered the biggest AI boom to date. But how far have language models come since then? And what comes next? An overview.
www.faz.net
November 7, 2025 at 9:05 AM
And consider following the authors Sheng Lu, @ikuznetsov.bsky.social, @igurevych.bsky.social (all @ukplab.bsky.social/@tuda.bsky.social).

See you at the #EMNLP conference in Suzhou 🏯

(5/5)

#EMNLP2025 #UKPLab #PeerReview #LLM #AIResearch #NLProc
November 4, 2025 at 10:49 AM
This work lays the groundwork for aspect-aware review analysis, improving review comparison, meta-reviewing, and even AI-generated review detection.

📄 𝗣𝗮𝗽𝗲𝗿: arxiv.org/abs/2504.06910
💾 𝗖𝗼𝗱𝗲 & 𝗗𝗮𝘁𝗮: github.com/UKPLab/emnlp2025-aspects-in-reviews

(4/🧵)
Identifying Aspects in Peer Reviews
Peer review is central to academic publishing, but the growing volume of submissions is straining the process. This motivates the development of computational approaches to support peer review. While ...
arxiv.org
November 4, 2025 at 10:49 AM
📊 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀
🗂 Released a dataset of reviews annotated with aspect information
👥 Reviewer focus varies depending on the submission track
🤖 LLM-generated reviews tend to be more homogeneous in aspect coverage

(3/🧵)
November 4, 2025 at 10:49 AM
🔍 𝗪𝗵𝗮𝘁 𝘄𝗲 𝗱𝗶𝗱
🧩 Used GPT-4o to extract aspects from 1,094 reviews across 350 papers
🧠 Clustered and refined the outputs to form a three-level taxonomy of review aspects
🧭 Used the taxonomy to explore downstream applications

(2/🧵)
November 4, 2025 at 10:49 AM
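The three-level taxonomy mentioned in the post above can be pictured as a nested mapping from coarse to fine aspect labels. The following is only a minimal sketch under stated assumptions: every label, the `TAXONOMY` constant, and the helper names `build_index` and `coverage` are hypothetical illustrations, not the taxonomy or code released with the paper.

```python
from collections import Counter

# Hypothetical three-level taxonomy (coarse -> mid -> fine).
# These labels are invented for illustration and are NOT from the paper.
TAXONOMY = {
    "Methodology": {
        "Experiments": ["baselines", "ablation study"],
        "Evaluation": ["metrics", "statistical significance"],
    },
    "Presentation": {
        "Writing": ["clarity", "typos"],
    },
}


def build_index(taxonomy):
    """Map each fine-grained aspect to its (coarse, mid, fine) path."""
    index = {}
    for coarse, mids in taxonomy.items():
        for mid, fines in mids.items():
            for fine in fines:
                index[fine] = (coarse, mid, fine)
    return index


def coverage(aspects, index, level=0):
    """Count how often each taxonomy node at `level` is touched.

    level 0 = coarse, 1 = mid, 2 = fine. Aspects missing from the
    index are ignored.
    """
    return Counter(index[a][level] for a in aspects if a in index)


index = build_index(TAXONOMY)
review_aspects = ["baselines", "metrics", "clarity", "unknown aspect"]
print(coverage(review_aspects, index))           # coarse-level counts
print(coverage(review_aspects, index, level=1))  # mid-level counts
```

Comparing such coverage counts across reviews is one simple way to look at how varied or homogeneous the aspect coverage of a set of reviews is.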
Be sure to follow the authors: Dominic Petrak, Thy Thy Tran, and @igurevych.bsky.social from @ukplab.bsky.social/@tuda.bsky.social.

See you at #EMNLP in Suzhou!

(4/4)

#NLProc #ConversationalAI #Agents #EMNLP2025
November 3, 2025 at 7:31 AM
⚙️ By combining 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁 𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 with a 𝗻𝗼𝘃𝗲𝗹 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗰𝗼𝗻𝘁𝗿𝗮𝘀𝘁𝗶𝘃𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, it improves representation learning and uncovers 𝗰𝗼𝗵𝗲𝗿𝗲𝗻𝘁 𝗲𝗿𝗿𝗼𝗿 𝗰𝗮𝘁𝗲𝗴𝗼𝗿𝗶𝗲𝘀.

📊 𝗦𝗘𝗘𝗘𝗗 outperforms #GPT-4o and #Phi-4 by up to +𝟴 𝗽𝗽 across multiple datasets.

(2/🧵)
November 3, 2025 at 7:31 AM
And consider following the authors Haishuo Fang (@ukplab.bsky.social), Xiaodan Zhu (@queensuglobal.bsky.social), and @igurevych.bsky.social (@ukplab.bsky.social/@athenecenter.bsky.social) if you are interested in more information or an exchange of ideas.

(4/4)

#NLProc #AI #EMNLP2025 #LLMAgent
October 31, 2025 at 9:47 AM
𝗞𝗲𝘆 𝗙𝗶𝗻𝗱𝗶𝗻𝗴𝘀:
1️⃣ Up to +𝟮𝟬% 𝗠𝗮𝗰𝗿𝗼-𝗙𝟭 improvement over baseline detectors in identifying misaligned actions early.

2️⃣ 𝗜𝗻𝗳𝗲𝗿𝗔𝗰𝘁 + 𝗵𝘂𝗺𝗮𝗻 𝗰𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗶𝗼𝗻 yields near-human reliability, only 𝟯.𝟱% 𝗯𝗲𝗵𝗶𝗻𝗱 𝗺𝗮𝗻𝘂𝗮𝗹 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻, while cutting 𝗮𝗻𝗻𝗼𝘁𝗮𝘁𝗶𝗼𝗻 𝗰𝗼𝘀𝘁𝘀 𝗯𝘆 𝟱𝟬%.

(2/🧵)
October 31, 2025 at 9:47 AM
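For readers unfamiliar with the Macro-F1 metric cited in the post above: it is the unweighted mean of the per-class F1 scores, so a rare class (such as misaligned actions) counts as much as a frequent one. The sketch below uses made-up labels and predictions purely for illustration; the function name `macro_f1` and the toy data are assumptions and do not reproduce the paper's evaluation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): three aligned actions, one misaligned.
y_true = ["aligned", "aligned", "aligned", "misaligned"]
y_pred = ["aligned", "aligned", "misaligned", "misaligned"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the "aligned" class scores F1 = 0.8 and the "misaligned" class scores F1 = 2/3, giving a Macro-F1 of about 0.733.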
While LLMs cannot think for us, they can definitely help us to think better.

🔗 Learn more about the event: www.daad.de/de/der-daad/...

(5/5)

#AI #Reasoning #CognitiveScience #ZuseSchools #ELIZA #UKPLab #TUDarmstadt
Herbstevent 2025
Zuse Schools Herbstevent 2025
www.daad.de
October 30, 2025 at 1:53 PM
In her statements, Iryna highlighted the limits of current LLMs in achieving human-like reasoning and emphasized the importance of systematic generalization, introspective consistency, and interdisciplinary collaboration for future progress.

(4/🧵)
October 30, 2025 at 1:53 PM
Panellists:
Prof. Dr. @igurevych.bsky.social, @tuda.bsky.social
Dr. Letitia Parcalabescu, Aleph Alpha Research
Prof. Dr. @marctoussaint.bsky.social, @tuberlin.bsky.social
Prof. Dr. Volker Tresp, @lmumuenchen.bsky.social

(3/🧵)
October 30, 2025 at 1:53 PM
The discussion brought together experts from AI and Cognitive Science to explore a key question:
👉 𝘛𝘰 𝘸𝘩𝘢𝘵 𝘦𝘹𝘵𝘦𝘯𝘵 𝘤𝘢𝘯 𝘮𝘢𝘤𝘩𝘪𝘯𝘦-𝘣𝘢𝘴𝘦𝘥 𝘱𝘳𝘰𝘤𝘦𝘴𝘴𝘦𝘴 𝘣𝘦 𝘤𝘰𝘯𝘴𝘪𝘥𝘦𝘳𝘦𝘥 𝘳𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨, 𝘨𝘪𝘷𝘦𝘯 𝘵𝘩𝘦𝘪𝘳 𝘧𝘶𝘯𝘥𝘢𝘮𝘦𝘯𝘵𝘢𝘭 𝘥𝘪𝘧𝘧𝘦𝘳𝘦𝘯𝘤𝘦𝘴 𝘧𝘳𝘰𝘮 𝘩𝘶𝘮𝘢𝘯 𝘤𝘰𝘨𝘯𝘪𝘵𝘪𝘰𝘯?

(2/🧵)
October 30, 2025 at 1:53 PM
And consider following the authors @rachneet.bsky.social, Rima Hazra, and @igurevych.bsky.social (@ukplab.bsky.social/@tuda.bsky.social) if you are interested in more information or an exchange of ideas.

(6/6)

#NLProc #LLMSafety #AIsecurity #Jailbreak #LLM
October 30, 2025 at 8:43 AM
📜 𝗣𝗮𝗽𝗲𝗿 → arxiv.org/pdf/2501.01872
🌐 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 → ukplab.github.io/emnlp2025-po...
💾 𝗖𝗼𝗱𝗲 + 𝗱𝗮𝘁𝗮 → github.com/UKPLab/emnlp...

(5/🧵)
October 30, 2025 at 8:43 AM
🚨 𝗗𝗲𝗳𝗲𝗻𝘀𝗲 𝗴𝗮𝗽 𝗲𝘅𝗽𝗼𝘀𝗲𝗱: Current safety measures can't detect subtle, logic-driven jailbreaks
✅ 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗲𝘅𝗶𝘀𝘁𝘀: Our Chain-of-Thought defenses reduce attack success by 95%

(4/🧵)
October 30, 2025 at 8:43 AM
𝗞𝗲𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀:
⚡ 𝟱𝟳% 𝗮𝘁𝘁𝗮𝗰𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 𝗿𝗮𝘁𝗲: Outperforms SOTA attacks across GPT-4o, Llama, Gemma, and Phi models
🧠 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 ≠ 𝗦𝗮𝗳𝗲𝗿: Larger, more capable models are MORE vulnerable to contrastive reasoning attacks

(3/🧵)
October 30, 2025 at 8:43 AM
🧩 Instead of submitting harmful prompts, POATE generates their harmless opposites—then subtly guides the model’s reasoning back to unsafe intent. The result: logic itself becomes the attack vector.

(2/🧵)
October 30, 2025 at 8:43 AM