We create ONERULER 💍, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!
Our analysis across 26 languages 🧵👇
My research centers on advancing Responsible AI, specifically enhancing factuality, robustness, and transparency in AI systems.
If you have relevant positions, let me know! lasharavichander.github.io Please share/RT!
We present Suri 🦙: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.
📎 arxiv.org/abs/2406.19371
- 6k reasoning tokens are often not enough to get an answer, and allowing more means only short books can be processed
- OpenAI adds something to the prompt: ~8k extra tokens -> less room for book + reasoning + generation!
Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on
Also, I'm on the lookout for a summer 2025 internship!