Krithika Ramesh
@stolenpyjak.bsky.social
(she/her)
¯\_(ツ)_/¯
PhD student @jhuclsp | Prev @IndiaMSR
SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social

Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
github.com
November 7, 2025 at 12:53 AM
SynthTextEval is a comprehensive toolkit for evaluating synthetic text data with a wide range of metrics. It enables standardized, comparable assessments of generation approaches and builds greater confidence in the quality of synthetic data, especially in high-stakes domains.
November 7, 2025 at 12:53 AM
Synthetic data shouldn’t be a black box. We make it easier to examine and identify issues in synthetic data outputs with:
- Interactive text exploration & review with our GUI tool
- Visual and descriptive analyses of text diversity, structure, and themes (a toy diversity sketch follows this list)
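To give a flavor of the kind of descriptive diversity statistic such analyses rely on, here is a minimal distinct-n sketch in plain Python. This is generic illustrative code, not SynthTextEval's own API, and the example corpus is a placeholder.

```python
# Minimal distinct-n diversity sketch (generic illustration, not the toolkit's API).
# distinct-n = (# unique n-grams) / (# total n-grams); values near 1.0 mean
# varied text, values near 0.0 mean highly repetitive text.
from collections import Counter

def distinct_n(texts: list[str], n: int = 2) -> float:
    ngrams = Counter()
    for text in texts:
        tokens = text.split()
        ngrams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

synthetic = ["the patient was discharged", "the patient was admitted"]  # placeholder
print(f"distinct-2: {distinct_n(synthetic):.2f}")  # 4 unique / 6 total ≈ 0.67
```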
November 7, 2025 at 12:53 AM
SynthTextEval also supports fine-tuning models for controllable text generation across diverse domains, allowing users to:
- Produce text tailored to user-defined styles, content types, or domain labels
- Generate synthetic data with differential privacy guarantees (a hedged sketch of the control-code idea follows)
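A common recipe for this kind of controllability, sketched below under stated assumptions (the `<label>` prefix format, the `gpt2` stand-in model, and the prompt are all hypothetical, not SynthTextEval's actual interface), is to prepend a control code such as a domain label to each training example and reuse it as a prompt prefix at generation time. A differentially private variant would additionally train with a DP-SGD optimizer, such as Opacus provides.

```python
# Control-code conditioning sketch (assumed format, not SynthTextEval's API).
# Each training record is prefixed with its domain label; after fine-tuning,
# supplying the same label as a prompt steers generation toward that domain.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def to_training_text(label: str, text: str) -> str:
    # Hypothetical control-code format for illustration.
    return f"<{label}> {text}"

# ... fine-tune on records like to_training_text("discharge_summary", doc) ...

# At inference, the control code alone steers the (fine-tuned) model:
inputs = tok("<discharge_summary>", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```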
November 7, 2025 at 12:53 AM
🔧 Utility: Downstream task-based evaluations (classification, coreference resolution)
📊 Fairness: Distributional balance & representational biases
🔐 Privacy: Leakage, memorization, and re-identification risk
📜 Quality: Distributional differences between synthetic and real text
A toy "train on synthetic, test on real" utility check follows.
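As a rough illustration of the utility dimension, the sketch below runs a generic "train on synthetic, test on real" (TSTR) check with scikit-learn: a classifier trained on synthetic records should score well on held-out real ones if the synthetic data preserves task-relevant signal. The corpora and labels are placeholders, and this is not the toolkit's own evaluator.

```python
# Toy TSTR utility check (illustrative only; SynthTextEval's evaluators may differ).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Placeholder corpora; substitute your labeled synthetic and real datasets.
synth_texts = ["patient reports mild fever", "no acute distress noted",
               "persistent cough for two weeks", "routine follow-up visit"]
synth_labels = [1, 0, 1, 0]
real_texts = ["patient presents with fever and chills", "annual wellness check"]
real_labels = [1, 0]

vec = TfidfVectorizer().fit(synth_texts + real_texts)  # shared vocabulary
clf = LogisticRegression().fit(vec.transform(synth_texts), synth_labels)
preds = clf.predict(vec.transform(real_texts))
print("TSTR F1:", f1_score(real_labels, preds))
```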
November 7, 2025 at 12:53 AM
Conventional metrics like BLEU, ROUGE, or perplexity only scratch the surface of synthetic text quality!

Our framework introduces a multi-dimensional evaluation suite covering utility, privacy, fairness, and distributional similarity to the real data.
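For contrast, here is what one of those surface metrics looks like in practice, using the sacrebleu package on placeholder strings: a single n-gram overlap score that says nothing about leakage, bias, or downstream utility.

```python
# Corpus-level BLEU with sacrebleu: a surface n-gram overlap score.
# High BLEU means the synthetic text resembles references lexically,
# not that it is private, fair, or useful downstream.
import sacrebleu

hypotheses = ["the patient was discharged in stable condition"]  # placeholder
references = [["the patient was discharged home in stable condition"]]

score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {score.score:.1f}")
```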
November 7, 2025 at 12:53 AM
Reposted by Krithika Ramesh
This hypothesis says that (1) multilingual generation uses a model-internal task-solving→translation cascade, and (2) failure of the translation stage *despite task-solving success* accounts for a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
July 4, 2025 at 5:05 PM
Reposted by Krithika Ramesh
Go find new linguistic changes, compare corpora, and invent!
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Hplm (Historical Perspectival LM)
Org profile for Historical Perspectival LM on Hugging Face, the AI community building the future.
huggingface.co
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Historical analysis is a good example, as historical periods can get lost in information blended from different eras. Fine-tuning large models isn't enough: they “leak” future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
arxiv.org/abs/2504.05523

Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data, or, in our case, to particular time eras.
Pretraining Language Models for Diachronic Linguistic Change Discovery
Large language models (LLMs) have shown potential as tools for scientific discovery. This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and lit...
arxiv.org
April 15, 2025 at 12:45 PM