Krithika Ramesh
@stolenpyjak.bsky.social
(she/her)
¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯
PhD student @jhuclsp | Prev @IndiaMSR
Catch @zihaozhao.bsky.social at today’s poster session (10:30–12) where he'll be presenting SynthTextEval! Stop by if you're interested in synthetic text for high-stakes domains. Zihao also has another EMNLP paper on private text generation, for people interested in this space!
@jhuclsp.bsky.social
November 7, 2025 at 12:55 AM
🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration!

GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...

#EMNLP2025 #EMNLP #SyntheticData
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
November 7, 2025 at 12:53 AM
Reposted by Krithika Ramesh
Take a look at this EMNLP 2025 paper by @zihaozhao.bsky.social, which proposes novel methods for generating high-utility, privacy-preserving synthetic text!
🚀 Text anonymization is hard; DP often hurts utility.
We use entity-aware control codes + either ICL (with bad-token blocking) or prefix-tuning w/ masking to get strong privacy–utility tradeoffs on legal & clinical data, outperforming DP-SGD in practice (EMNLP 2025).
www.arxiv.org/abs/2509.25729
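The bad-token blocking mentioned above can be sketched generically as logit masking at decode time: before choosing each token, force the scores of disallowed tokens (e.g. ones that could reproduce private entities) to negative infinity so they can never be emitted. A minimal, library-free illustration — the vocabulary, scores, and blocked set here are invented for the example, and the paper's actual implementation may differ:

```python
import math

def blocked_greedy_decode(logits_steps, blocked_ids):
    """Greedy decoding with bad-token blocking: blocked token ids
    can never be emitted because their scores are forced to -inf."""
    output = []
    for logits in logits_steps:
        masked = [(-math.inf if i in blocked_ids else score)
                  for i, score in enumerate(logits)]
        # pick the highest-scoring *allowed* token at this step
        output.append(max(range(len(masked)), key=masked.__getitem__))
    return output

# Toy example: 3 decode steps over a 4-token vocabulary.
# Token 2 (say, a private name) is blocked even when it scores highest.
steps = [
    [0.1, 0.5, 2.0, 0.3],   # token 2 would win unmasked
    [1.2, 0.4, 0.9, 0.1],
    [0.2, 0.1, 3.0, 1.5],   # token 2 would win unmasked
]
print(blocked_greedy_decode(steps, blocked_ids={2}))  # [1, 0, 3]
```

The same masking idea applies unchanged under sampling instead of greedy decoding, since a -inf score gets zero probability after softmax.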
October 16, 2025 at 2:39 AM
‼️‼️
🔈When LLMs solve tasks with a mid-to-low resource input or target language, their output quality is poor. We know that. But can we put our finger on what breaks inside the LLM? We introduce the 💥 translation barrier hypothesis 💥 for failed multilingual generation with LLMs. arxiv.org/abs/2506.22724
July 8, 2025 at 4:04 PM
Reposted by Krithika Ramesh
This hypothesis says that 1) Multilingual generation uses a model-internal task-solving→translation cascade. 2) Failure of the translation stage *despite task-solving success* is a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
July 4, 2025 at 5:05 PM
⁉️
What happens when an LLM is asked to use information that contradicts its knowledge? We explore knowledge conflict in a new preprint📑
TLDR: Performance drops, and this could affect the overall performance of LLMs in model-based evaluation.📑🧵⬇️ 1/8
#NLProc #LLM #AIResearch
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models
Large language models frequently rely on both contextual input and parametric knowledge to perform tasks. However, these sources can come into conflict, especially when retrieved documents contradict…
June 18, 2025 at 2:09 AM
Reposted by Krithika Ramesh
We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of the features the model relies on, and proposes a fix.
June 7, 2025 at 5:27 PM
Reposted by Krithika Ramesh
Go find new linguistic changes, compare corpora, and invent!
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Hplm (Historical Perspectival LM)
Org profile for Historical Perspectival LM on Hugging Face, the AI community building the future.
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Historical analysis is a good example, as historical periods can get lost in blended information from different eras. Finetuning large models isn't enough: they "leak" future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
arxiv.org/abs/2504.05523

Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
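One generic way to make model behaviour attributable to a time era is to pretrain a separate model per era, each seeing only text from its own period. A minimal sketch of the corpus-partitioning step — the field names, cutoff years, and example documents are hypothetical, and the paper's actual pipeline may differ:

```python
from collections import defaultdict

def partition_by_era(docs, era_bounds):
    """Bucket dated documents into disjoint eras so each model is
    pretrained only on text from its own period.
    era_bounds: list of (start_year, end_year) tuples, inclusive."""
    eras = defaultdict(list)
    for doc in docs:
        for start, end in era_bounds:
            if start <= doc["year"] <= end:
                eras[(start, end)].append(doc["text"])
                break  # each document belongs to exactly one era
    return dict(eras)

# Hypothetical corpus with publication years.
docs = [
    {"year": 1810, "text": "a carriage ride"},
    {"year": 1885, "text": "the telegraph office"},
    {"year": 1905, "text": "a motor car"},
]
eras = partition_by_era(docs, [(1800, 1849), (1850, 1899), (1900, 1949)])
print(sorted(eras))  # [(1800, 1849), (1850, 1899), (1900, 1949)]
```

Because the partitions are disjoint, a model trained on the 1800–1849 bucket cannot "leak" later concepts like motor cars into its generations.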
Pretraining Language Models for Diachronic Linguistic Change Discovery
Large language models (LLMs) have shown potential as tools for scientific discovery. This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and lit...
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
How should the humanities leverage LLMs?
▶️Domain-specific pretraining!

Pretraining a model can itself be a research tool: it's cheaper than LoRA and allows studying
💠grammatical change
💠emergent word senses
💠who knows what more…

Train on your data with our pipeline or use ours!
#AI #LLM 🤖📈
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Dialects lie on continua of (structured) linguistic variation, right? And we can’t collect data for every point on the continuum...🤔
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
February 27, 2025 at 2:44 AM
Reposted by Krithika Ramesh
📢 Want to host MASC 2025?

The 12th Mid-Atlantic Student Colloquium is a one-day event bringing together students, faculty, and researchers from universities and industry in the Mid-Atlantic.

Please submit this very short form if you are interested in hosting! Deadline January 6th. #MASC2025
December 16, 2024 at 9:19 PM
Reposted by Krithika Ramesh
📢 It's PhD admissions season! 🎓

The PhD admissions process is stressful! 😅

Want a behind-the-scenes look at the process? 👀✨ You have questions, we have answers. 📝🤝

Watch my Admissions AMA for @jhuclsp.

https://youtu.be/YlwpIPFNXjo?si=O7n5QwGT5sQdpg7u
December 1, 2024 at 11:02 PM
Reposted by Krithika Ramesh
I'm super excited about this program and happy to connect if you're interested in working with me through it!
Postdoc opportunities! The Johns Hopkins Data Science and AI Institute has a new postdoc program!

We’re looking for candidates across data science and AI, including science, health, medicine, the humanities, engineering, policy, and ethics.

Spread the word and apply!

ai.jhu.edu/postdoctoral...
Postdoctoral Fellowship Program - Johns Hopkins Data Science and AI Institute
Data Science and AI Institute Postdoctoral Fellowship Program The Johns Hopkins Data Science and AI Institute welcomes applications for its postdoctoral fellowship program, seeking scholars to advance...
November 20, 2024 at 7:28 PM
Reposted by Krithika Ramesh
Putting together a JHU Center for Language and Speech Processing starter pack!

Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.

go.bsky.app/JtWKca2
CLSP
November 19, 2024 at 3:37 PM