Lightnews — Scholar-powered news

Yekyung Kim

@yekyung.bsky.social

39 followers 110 following 8 posts

PhD student @ UMass NLP

Yekyung Kim

@yekyung.bsky.social

Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?

We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!

Our analysis across 26 languages 🧵👇

March 5, 2025 at 5:06 PM

Reposted by Yekyung Kim

Abhilasha Ravichander

@lasha.bsky.social

✨I am on the faculty job market in the 2024-2025 cycle!✨

My research centers on advancing Responsible AI, specifically enhancing factuality, robustness, and transparency in AI systems.

If you have relevant positions, let me know! lasharavichander.github.io Please share/RT!

November 11, 2024 at 2:23 PM

Reposted by Yekyung Kim

Chau Minh Pham

@chautmpham.bsky.social

Long-form text generation with multiple stylistic and semantic constraints remains largely unexplored.

We present Suri 🦙: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.

📎 arxiv.org/abs/2406.19371

November 11, 2024 at 12:41 PM

Reposted by Yekyung Kim

Marzena Karpinska

@markar.bsky.social

I really wanted to run NEW #nocha benchmark claims on #o1 but it won't behave 😠
- 6k reasoning tokens is often not enough to get an ans and more means being able to process only short books
- OpenAI adds sth to the prompt: ~8k extra tokens-> less room for book+reason+generation!

Image showing the prompt token count as measured by the tokenizer (tiktoken), which is 117,609 tokens, and as reported by the OpenAI API, which is 125,385 tokens. There are about 7,800 extra tokens coming from who knows where.
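The size of the gap follows directly from the two counts in the image description. The sketch below only does that arithmetic; the function name `hidden_overhead` is made up for illustration, and the code does not call tiktoken or the OpenAI API.

```python
# Token counts quoted in the image description of the post.
TIKTOKEN_COUNT = 117_609      # prompt tokens counted locally with tiktoken
API_REPORTED_COUNT = 125_385  # prompt tokens reported by the OpenAI API


def hidden_overhead(local_count: int, reported_count: int) -> int:
    """Tokens the API reports beyond what the local tokenizer counted.

    Hypothetical helper for illustration; not part of any library.
    """
    return reported_count - local_count


overhead = hidden_overhead(TIKTOKEN_COUNT, API_REPORTED_COUNT)
overhead_pct = 100 * overhead / TIKTOKEN_COUNT

print(overhead)                # 7776
print(round(overhead_pct, 1))  # 6.6
```

So the unexplained addition is 7,776 tokens, roughly 6.6% on top of the locally counted prompt, which matches the "~8k extra tokens" estimate in the post.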

November 11, 2024 at 5:11 PM

Reposted by Yekyung Kim

Yapei Chang

@yapeichang.bsky.social

🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!

November 10, 2024 at 7:35 PM
