Lightnews — Scholar-powered news

Reposted by Elisabeth Fittschen

Krithika Ramesh

@stolenpyjak.bsky.social

SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social

Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!

GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)

github.com

November 7, 2025 at 12:53 AM

Reposted by Elisabeth Fittschen

Krithika Ramesh

@stolenpyjak.bsky.social

🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration!

GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...

#EMNLP2025 #EMNLP #SyntheticData

GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)

github.com

November 7, 2025 at 12:53 AM

Reposted by Elisabeth Fittschen

Niyati Bafna

@niyatibafna.bsky.social

We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features that the model relies on, and proposes a fix.

June 7, 2025 at 5:27 PM

Reposted by Elisabeth Fittschen

Ted Underwood

@tedunderwood.com

New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and myself tests different ways of simulating the past with LLMs. We don't fully answer the title question here—just show that simple strategies based on prompting and fine-tuning are insufficient. +

Can Language Models Represent the Past without Anachronism?

Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...

arxiv.org

May 2, 2025 at 12:47 PM

Reposted by Elisabeth Fittschen

Leshem (Legend) Choshen @EMNLP

@lchoshen.bsky.social

How should the humanities leverage LLMs?
▶️Domain-specific pretraining!

Pretraining models can be a research tool: it's cheaper than LoRA and allows studying
💠grammatical change
💠emergent word senses
💠who knows what more…

Train on your data with our pipeline or use ours!
#AI #LLM 🤖📈

April 15, 2025 at 12:45 PM
