Niyati Bafna
@niyatibafna.bsky.social
PhD student @jhuclsp. Previously @AIatMeta, @InriaParisNLP, @EM_LCT | #NLProc
Accepted at ACL main! Come chat about dialectal MT at our poster today at 4 pm.
Also, check out this largely bug-free package for generating your own synthetic dialectal data:
pypi.org/project/dial...
Dialects lie on continua of (structured) linguistic variation, right? And we can’t collect data for every point on the continuum...🤔
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
July 29, 2025 at 12:14 PM
Reposted by Niyati Bafna
You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅

We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️
(random is still a devilishly good baseline)
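To make the budget problem concrete: with 10,000 inputs and a budget of 100, uniform random sampling is the baseline, and one generic alternative is to stratify picks across an automatic metric score so the human-evaluated subset covers easy and hard inputs alike. A minimal sketch (this stratification idea and all function names are illustrative assumptions, not the paper's actual selection method):

```python
import random

def select_random(items, budget, seed=0):
    """Baseline: pick `budget` datapoints uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(items, budget)

def select_stratified(items, scores, budget):
    """Illustrative alternative (not the paper's method): rank items by
    an automatic metric score and take evenly spaced picks, so the
    subset spans the full easy-to-hard range."""
    ranked = sorted(range(len(items)), key=lambda i: scores[i])
    step = len(ranked) / budget
    return [items[ranked[int(k * step)]] for k in range(budget)]

# Example: 10,000 inputs, budget of 100.
dataset = list(range(10_000))
subset = select_random(dataset, 100)
```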
July 15, 2025 at 1:03 PM
🔈When LLMs solve tasks with a mid-to-low resource input or target language, their output quality is poor. We know that. But can we put our finger on what breaks inside the LLM? We introduce the 💥 translation barrier hypothesis 💥 for failed multilingual generation with LLMs. arxiv.org/abs/2506.22724
July 4, 2025 at 5:05 PM
We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features the model relies on, and proposes a fix.
June 7, 2025 at 5:27 PM
Presented DialUp (MT, dialect continua, robustness, etc.; arxiv.org/abs/2501.16581) to some new people this week! Thanks Hale and @schmidtsciences.bsky.social for inviting me up to New York 🥯

Saw some magnolias too :)
April 11, 2025 at 12:50 AM
Dialects lie on continua of (structured) linguistic variation, right? And we can’t collect data for every point on the continuum...🤔
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
February 27, 2025 at 2:44 AM