Elisabeth Fittschen
efi3.bsky.social
Reposted by Elisabeth Fittschen
SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social

Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
github.com
November 7, 2025 at 12:53 AM
Reposted by Elisabeth Fittschen
🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration!

GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...

#EMNLP2025 #EMNLP #SyntheticData
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
github.com
November 7, 2025 at 12:53 AM
Reposted by Elisabeth Fittschen
We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features the model relies on, and proposes a fix.
June 7, 2025 at 5:27 PM
Reposted by Elisabeth Fittschen
New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and myself tests different ways of simulating the past with LLMs. We don't fully answer the title question here—just show that simple strategies based on prompting and fine-tuning are insufficient. +
Can Language Models Represent the Past without Anachronism?
Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...
arxiv.org
May 2, 2025 at 12:47 PM
Reposted by Elisabeth Fittschen
How should the humanities leverage LLMs?
▶️Domain-specific pretraining!

Pretraining models can be a research tool: it's cheaper than LoRA and allows studying
💠grammatical change
💠emergent word senses
💠who knows what more…

Train on your data with our pipeline or use ours!
#AI #LLM 🤖📈
April 15, 2025 at 12:45 PM