Lightnews — Scholar-powered news

UKP Lab

@ukplab.bsky.social

It touches on fast-moving model releases, the growing relevance of AI agents and tool-use, the shift towards reasoning-oriented training, and the limits of what benchmark results really tell us.

🔗 Read the article (F+ / paywalled):
www.faz.net/aktuell/wirt...

(2/2)

Language Models: How Far Artificial Intelligence Has Really Come

ChatGPT triggered the biggest AI boom to date. But how far have language models come since then? And what comes next? An overview.

www.faz.net

November 7, 2025 at 9:05 AM

UKP Lab

@ukplab.bsky.social

And consider following the authors Sheng Lu, @ikuznetsov.bsky.social, and @igurevych.bsky.social (all @ukplab.bsky.social/@tuda.bsky.social).

See you at the #EMNLP conference in Suzhou 🏯

(5/5)

#EMNLP2025 #UKPLab #PeerReview #LLM #AIResearch #NLProc

November 4, 2025 at 10:49 AM

UKP Lab

@ukplab.bsky.social

This work lays the groundwork for aspect-aware review analysis, improving review comparison, meta-reviewing, and even AI-generated review detection.

📄 𝗣𝗮𝗽𝗲𝗿: arxiv.org/abs/2504.06910
💾 𝗖𝗼𝗱𝗲 & 𝗗𝗮𝘁𝗮: github.com/UKPLab/emnlp2025-aspects-in-reviews

(4/🧵)

Identifying Aspects in Peer Reviews

Peer review is central to academic publishing, but the growing volume of submissions is straining the process. This motivates the development of computational approaches to support peer review. While ...

arxiv.org

November 4, 2025 at 10:49 AM

UKP Lab

@ukplab.bsky.social

📊 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀
🗂 Released a dataset of reviews annotated with aspect information
👥 Reviewer focus varies depending on the submission track
🤖 LLM-generated reviews tend to be more homogeneous in aspect coverage

(3/🧵)

November 4, 2025 at 10:49 AM

UKP Lab

@ukplab.bsky.social

🔍 𝗪𝗵𝗮𝘁 𝘄𝗲 𝗱𝗶𝗱
🧩 Used GPT-4o to extract aspects from 1,094 reviews across 350 papers
🧠 Clustered and refined the outputs to form a three-level taxonomy of review aspects
🧭 Used the taxonomy to explore downstream applications

(2/🧵)

November 4, 2025 at 10:49 AM

UKP Lab

@ukplab.bsky.social

Be sure to follow the authors: Dominic Petrak, Thy Thy Tran, and @igurevych.bsky.social from @ukplab.bsky.social/@tuda.bsky.social.

See you at #EMNLP in Suzhou!

(4/4)

#NLProc #ConversationalAI #Agents #EMNLP2025

November 3, 2025 at 7:31 AM

UKP Lab

@ukplab.bsky.social

📄 𝗣𝗮𝗽𝗲𝗿: www.arxiv.org/abs/2509.10833

💻 𝗖𝗼𝗱𝗲: github.com/UKPLab/emnlp...

🔗 𝗣𝗿𝗼𝗷𝗲𝗰𝘁: ukplab.github.io/emnlp2025-au...

(3/🧵)

Towards Automated Error Discovery: A Study in Conversational AI

Although LLM-based conversational agents demonstrate strong fluency and coherence, they still produce undesirable behaviors (errors) that are challenging to prevent from reaching users during deployme...

www.arxiv.org

November 3, 2025 at 7:31 AM

UKP Lab

@ukplab.bsky.social

⚙️ By combining 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁 𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 with a 𝗻𝗼𝘃𝗲𝗹 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗰𝗼𝗻𝘁𝗿𝗮𝘀𝘁𝗶𝘃𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, it improves representation learning and uncovers 𝗰𝗼𝗵𝗲𝗿𝗲𝗻𝘁 𝗲𝗿𝗿𝗼𝗿 𝗰𝗮𝘁𝗲𝗴𝗼𝗿𝗶𝗲𝘀.

📊 𝗦𝗘𝗘𝗘𝗗 outperforms #GPT-4o and #Phi-4 by up to +𝟴 𝗽𝗽 across multiple datasets.

(2/🧵)

November 3, 2025 at 7:31 AM

UKP Lab

@ukplab.bsky.social

And consider following the authors Haishuo Fang (@ukplab.bsky.social), Xiaodan Zhu (@queensuglobal.bsky.social), and @igurevych.bsky.social (@ukplab.bsky.social/@athenecenter.bsky.social) if you are interested in more information or an exchange of ideas.

(4/4)

#NLProc #AI #EMNLP2025 #LLMAgent

October 31, 2025 at 9:47 AM

UKP Lab

@ukplab.bsky.social

📜 Paper: arxiv.org/abs/2407.11843
🌐 Project: ukplab.github.io/emnlp2025-in...
💾 Code + data: github.com/UKPLab/emnlp...

(3/🧵)

Preemptive Detection and Correction of Misaligned Actions in LLM Agents

Deploying LLM-based agents in real-life applications often faces a critical challenge: the misalignment between agents' behavior and user intent. Such misalignment may lead agents to unintentionally e...

arxiv.org

October 31, 2025 at 9:47 AM

UKP Lab

@ukplab.bsky.social

𝗞𝗲𝘆 𝗙𝗶𝗻𝗱𝗶𝗻𝗴𝘀:
1️⃣ Up to +𝟮𝟬% 𝗠𝗮𝗰𝗿𝗼-𝗙𝟭 improvement over baseline detectors in identifying misaligned actions early.

2️⃣ 𝗜𝗻𝗳𝗲𝗿𝗔𝗰𝘁 + 𝗵𝘂𝗺𝗮𝗻 𝗰𝗼𝗹𝗹𝗮𝗯𝗼𝗿𝗮𝘁𝗶𝗼𝗻 yields near-human reliability, only 𝟯.𝟱% 𝗯𝗲𝗵𝗶𝗻𝗱 𝗺𝗮𝗻𝘂𝗮𝗹 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻, while cutting 𝗮𝗻𝗻𝗼𝘁𝗮𝘁𝗶𝗼𝗻 𝗰𝗼𝘀𝘁𝘀 𝗯𝘆 𝟱𝟬%.

(2/🧵)

October 31, 2025 at 9:47 AM

UKP Lab

@ukplab.bsky.social

While LLMs cannot think for us, they can definitely help us to think better.

🔗 Learn more about the event: www.daad.de/de/der-daad/...

(5/5)

#AI #Reasoning #CognitiveScience #ZuseSchools #ELIZA #UKPLab #TUDarmstadt

Autumn Event 2025

Zuse Schools Autumn Event 2025

www.daad.de

October 30, 2025 at 1:53 PM

UKP Lab

@ukplab.bsky.social

In her statements, Iryna highlighted the limits of current LLMs in achieving human-like reasoning and emphasized the importance of systematic generalization, introspective consistency, and interdisciplinary collaboration for future progress.

(4/🧵)

October 30, 2025 at 1:53 PM

UKP Lab

@ukplab.bsky.social

Panellists:
Prof. Dr. @igurevych.bsky.social, @tuda.bsky.social
Dr. Letitia Parcalabescu, Aleph Alpha Research
Prof. Dr. @marctoussaint.bsky.social, @tuberlin.bsky.social
Prof. Dr. Volker Tresp, @lmumuenchen.bsky.social

(3/🧵)

October 30, 2025 at 1:53 PM

UKP Lab

@ukplab.bsky.social

The discussion brought together experts from AI and Cognitive Science to explore a key question:
👉 𝘛𝘰 𝘸𝘩𝘢𝘵 𝘦𝘹𝘵𝘦𝘯𝘵 𝘤𝘢𝘯 𝘮𝘢𝘤𝘩𝘪𝘯𝘦-𝘣𝘢𝘴𝘦𝘥 𝘱𝘳𝘰𝘤𝘦𝘴𝘴𝘦𝘴 𝘣𝘦 𝘤𝘰𝘯𝘴𝘪𝘥𝘦𝘳𝘦𝘥 𝘳𝘦𝘢𝘴𝘰𝘯𝘪𝘯𝘨, 𝘨𝘪𝘷𝘦𝘯 𝘵𝘩𝘦𝘪𝘳 𝘧𝘶𝘯𝘥𝘢𝘮𝘦𝘯𝘵𝘢𝘭 𝘥𝘪𝘧𝘧𝘦𝘳𝘦𝘯𝘤𝘦𝘴 𝘧𝘳𝘰𝘮 𝘩𝘶𝘮𝘢𝘯 𝘤𝘰𝘨𝘯𝘪𝘵𝘪𝘰𝘯?

(2/🧵)

October 30, 2025 at 1:53 PM

UKP Lab

@ukplab.bsky.social

And consider following the authors @rachneet.bsky.social, Rima Hazra, and @igurevych.bsky.social (@ukplab.bsky.social/@tuda.bsky.social) if you are interested in more information or an exchange of ideas.

(6/6)

#NLProc #LLMSafety #AIsecurity #Jailbreak #LLM

October 30, 2025 at 8:43 AM

UKP Lab

@ukplab.bsky.social

📜 𝗣𝗮𝗽𝗲𝗿 → arxiv.org/pdf/2501.01872
🌐 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 → ukplab.github.io/emnlp2025-po...
💾 𝗖𝗼𝗱𝗲 + 𝗱𝗮𝘁𝗮 → github.com/UKPLab/emnlp...

(5/🧵)

arxiv.org

October 30, 2025 at 8:43 AM

UKP Lab

@ukplab.bsky.social

🚨 𝗗𝗲𝗳𝗲𝗻𝘀𝗲 𝗴𝗮𝗽 𝗲𝘅𝗽𝗼𝘀𝗲𝗱: Current safety measures can't detect subtle, logic-driven jailbreaks
✅ 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗲𝘅𝗶𝘀𝘁𝘀: Our Chain-of-Thought defenses reduce attack success by 95%

(4/🧵)

October 30, 2025 at 8:43 AM

UKP Lab

@ukplab.bsky.social

𝗞𝗲𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀:
⚡ 𝟱𝟳% 𝗮𝘁𝘁𝗮𝗰𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 𝗿𝗮𝘁𝗲: Outperforms SOTA attacks across GPT-4o, Llama, Gemma, and Phi models
🧠 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 ≠ 𝗦𝗮𝗳𝗲𝗿: Larger, more capable models are MORE vulnerable to contrastive reasoning attacks

(3/🧵)

October 30, 2025 at 8:43 AM

UKP Lab

@ukplab.bsky.social

🧩 Instead of submitting harmful prompts, POATE generates their harmless opposites—then subtly guides the model’s reasoning back to unsafe intent. The result: logic itself becomes the attack vector.

(2/🧵)

October 30, 2025 at 8:43 AM
