Yekyung Kim
yekyung.bsky.social
PhD student @ UMass NLP
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

Our analysis across 26 languages 🧵👇
March 5, 2025 at 5:06 PM
Reposted by Yekyung Kim
✨I am on the faculty job market in the 2024-2025 cycle!✨

My research centers on advancing Responsible AI, specifically enhancing factuality, robustness, and transparency in AI systems.

If you have relevant positions, let me know! lasharavichander.github.io Please share/RT!
Abhilasha Ravichander - Home
lasharavichander.github.io
November 11, 2024 at 2:23 PM
Reposted by Yekyung Kim
Long-form text generation with multiple stylistic and semantic constraints remains largely unexplored.

We present Suri 🦙: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.

📎 arxiv.org/abs/2406.19371
November 11, 2024 at 12:41 PM
Reposted by Yekyung Kim
I really wanted to run NEW #nocha benchmark claims on #o1 but it won't behave 😠
- 6k reasoning tokens is often not enough to get an ans and more means being able to process only short books
- OpenAI adds sth to the prompt: ~8k extra tokens -> less room for book+reason+generation!
November 11, 2024 at 5:11 PM
Reposted by Yekyung Kim
🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!
November 10, 2024 at 7:35 PM