Nils Feldhus
@nfel.bsky.social
Post-doctoral Researcher at BIFOLD / TU Berlin interested in interpretability and analysis of language models. Guest researcher at DFKI Berlin. https://nfelnlp.github.io/
I'm at #EMNLP2025 in Suzhou🇨🇳 to present these papers in the coming days:

Nov 7, Session 14, 12:30-13:30 @ Hall C – Multilingual Datasets for Custom Input Extraction and Explanation Requests Parsing in Conversational XAI Systems (Wang et al.) @qiaw99.bsky.social

🗞️ aclanthology.org/2025.finding...
November 6, 2025 at 7:00 AM
Our synthesis reveals a growing demand for more rigorous, causal evaluation. By outlining the state of the art and identifying key challenges, this survey provides a roadmap for future research toward making models more transparent.

This survey has been accepted at @blackboxnlp.bsky.social at EMNLP
October 2, 2025 at 9:13 AM
We consider concept descriptions in open-vocabulary settings, the evolving landscape of automated and human metrics for evaluating them, and the datasets that underpin this research.

This is a companion paper to our PRISM paper that was accepted at NeurIPS last week: bsky.app/profile/lkop...
October 2, 2025 at 9:13 AM
🔍 Are you curious about uncovering the underlying mechanisms and identifying the roles of model components (neurons, …) and abstractions (SAEs, …)?

We provide the first survey of concept description generation and evaluation methods.

Joint effort w/ @lkopf.bsky.social

📄 arxiv.org/abs/2510.01048
October 2, 2025 at 9:13 AM
📍 Venue: Vietnam Institute for Advanced Study in Mathematics, Hanoi, Vietnam — during INLG 2025.

💡 If you'll be in the region for INLG or EMNLP 2025, this is a great opportunity to connect and share your work!

📅 Submission deadline: August 26, 2025
August 12, 2025 at 7:05 AM
Excited to announce the first-ever Workshop for Young Researchers in Natural Language Generation (YNLG), supported by @siggen.bsky.social, taking place on October 29, 2025 in Hanoi, Vietnam, co-located with INLG 2025.
Call for Submissions is out now!

ynlg-workshop.github.io
August 12, 2025 at 7:05 AM
We introduce the TableEval benchmark and investigate the effectiveness and robustness of text-based and multimodal LLMs on table understanding through a cross-domain & cross-modality evaluation.

Joint work by DFKI SLT incl. Fabio Barth, Raia Abu Ahmad, @malteos.bsky.social @pjox.bsky.social
July 26, 2025 at 9:37 AM
Our contribution to the FEVER shared task: our EFC framework stays competitive with this year's baseline while significantly reducing the average runtime per claim through semantic filtering strategies for veracity prediction.

Joint work by the XplaiNLP group incl. @jingyng.bsky.social
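A rough sketch of the filtering idea (not the actual EFC code; the embedding model and the top_k cutoff are assumptions for illustration): only the evidence passages most similar to the claim are passed on to the veracity predictor, which is where the runtime saving comes from.

```python
from sentence_transformers import SentenceTransformer, util

def filter_evidence(claim: str, evidence: list[str], top_k: int = 5) -> list[str]:
    """Keep only the evidence passages most similar to the claim, so the
    veracity model runs on a much shorter input."""
    model = SentenceTransformer("all-MiniLM-L6-v2")     # assumed embedding model
    claim_emb = model.encode(claim, convert_to_tensor=True)
    ev_embs = model.encode(evidence, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, ev_embs)[0]        # similarity to each passage
    top = scores.argsort(descending=True)[:top_k]
    return [evidence[int(i)] for i in top]
```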
July 26, 2025 at 9:37 AM
We investigate how rationale generation is affected by readability level control and find that explanations can be adapted, but the observed distinctions between readability levels do not fully match the desired complexity.

Joint work with @hakimov.bsky.social.
July 26, 2025 at 9:37 AM
Using saliency scores, label flip verification and few-shot prompting, our FitCF method outperforms three state-of-the-art baselines on counterfactual example generation.

Joint work with @simost.bsky.social, Luis Felipe Villa-Arenas, Sebastian Möller, Vera Schmitt.

Code: github.com/qiaw99/FitCF
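For a sense of how these pieces fit together, here is a minimal, hypothetical sketch (the helper names and prompts are illustrative, not the FitCF API): saliency scores pick the tokens to edit, a few-shot prompt asks an LLM to edit exactly those tokens, and a classifier verifies that the label actually flips.

```python
from typing import Callable, List

def top_salient_tokens(tokens: List[str], saliency: List[float], k: int = 3) -> List[int]:
    """Indices of the k most important tokens according to the saliency scores."""
    return sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)[:k]

def build_fewshot_prompt(tokens: List[str], targets: List[int], demos: List[str]) -> str:
    """Few-shot prompt asking an LLM to minimally edit the bracketed tokens."""
    marked = " ".join(f"[{t}]" if i in targets else t for i, t in enumerate(tokens))
    return "\n".join(demos + [f"Edit only the bracketed words so the label flips:\n{marked}"])

def generate_counterfactual(
    tokens: List[str],
    saliency: List[float],
    demos: List[str],
    llm: Callable[[str], str],          # any text-in/text-out generator
    classifier: Callable[[str], str],   # predicts a label, used for verification
    original_label: str,
) -> str | None:
    """Return the edited text only if the classifier's prediction actually flips."""
    targets = top_salient_tokens(tokens, saliency)
    candidate = llm(build_fewshot_prompt(tokens, targets, demos))
    return candidate if classifier(candidate) != original_label else None
```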
July 26, 2025 at 9:37 AM
Glad to announce that our #FAccT2025 paper on gender bias in feature attribution methods, led by Mahdi Dhaini, will be presented tomorrow in 🇬🇷 Athens as part of the "Evaluating Explainable AI" session from 10:45 AM to 12:15 PM in Amphitheatre Ioannis Despotopoulos: programs.sigchi.org/facct/2025/p...
June 23, 2025 at 12:05 PM
Successfully defended my PhD yesterday! 🎓 🎉
Special thanks to my mentor Sebastian Möller and professors Sina Zarrieß @clausebielefeld.bsky.social, Christin Seifert, and @matthiasboehm7.bsky.social for being part of my committee.
Will continue working on XAI & NLP as a post-doc at TU Berlin & BIFOLD
April 12, 2025 at 4:29 PM
We evaluate Cross-Refine on three datasets, including a bilingual one (English/German), and find that it outperforms the SOTA approach even when using smaller LLMs, and that it generates explanations in the target language more consistently.

GitHub: github.com/qiaw99/Cross...
January 3, 2025 at 3:01 PM
Our new COLING 2025-accepted work Cross-Refine, led by @qiaw99.bsky.social, approaches the problem of free-text rationalization with a generator-critic setup: the generator outputs initial explanations and the critic provides feedback on them. #COLING2025

arXiv: arxiv.org/abs/2409.07123
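A bare-bones illustration of that loop (the prompts and model wrappers here are placeholders, not the actual implementation; see the repo for the real thing): one LLM drafts an explanation, a second LLM critiques it, and the draft is revised with that feedback.

```python
from typing import Callable

def cross_refine_sketch(
    question: str,
    answer: str,
    generator: Callable[[str], str],  # LLM that drafts and revises explanations
    critic: Callable[[str], str],     # second LLM that writes feedback
    rounds: int = 1,
) -> str:
    """Draft a free-text rationale, then refine it using the critic's feedback."""
    explanation = generator(f"Explain why '{answer}' answers the question: {question}")
    for _ in range(rounds):
        feedback = critic(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Explanation: {explanation}\n"
            "Give concrete suggestions to improve this explanation."
        )
        explanation = generator(
            "Revise the explanation below using the feedback.\n"
            f"Explanation: {explanation}\nFeedback: {feedback}"
        )
    return explanation
```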
January 3, 2025 at 3:01 PM
We evaluate several LLMs on CoXQL using different parsing strategies. Our parsing approach with template validation (MP+) outperforms previous approaches. We also find that intents with multiple slots remain highly challenging for LLMs.
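In spirit, the template validation step looks something like this (the operations and slot names below are made up for illustration, not the CoXQL grammar): a parse produced by the LLM is only accepted if the operation is known and all of its required slots are filled.

```python
import re

# Hypothetical operation grammar: each XAI intent with the slots it requires.
TEMPLATES = {
    "feature_importance": ["id", "method"],
    "counterfactual":     ["id"],
    "similar_examples":   ["id", "number"],
}

def validate(parse: str) -> bool:
    """Accept an LLM parse such as 'feature_importance id 12 method lime'
    only if the operation is known and every required slot has a value."""
    tokens = parse.split()
    if not tokens or tokens[0] not in TEMPLATES:
        return False
    return all(re.search(rf"\b{slot}\s+\S+", parse) for slot in TEMPLATES[tokens[0]])
```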
November 12, 2024 at 2:44 PM
We contribute to conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) by filling the training-data gap and covering a broad range of XAI methods onto which user requests can be mapped.
November 12, 2024 at 2:44 PM
Accepted at #EMNLP2024: CoXQL, an NLP dataset for user intent recognition in Conversational XAI. Qianli Wang will present his work in Miami later today at 4:00PM local time in Poster Session C (Riverfront Hall).
ACL Anthology: aclanthology.org/2024.finding...
GitHub: github.com/DFKI-NLP/CoXQL
November 12, 2024 at 2:44 PM