Lightnews — Scholar-powered news

Reposted by Joshua Ong

Aryo Pradipta Gema

@aryopg.bsky.social

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

May 2, 2025 at 1:00 PM

Reposted by Joshua Ong

Pasquale Minervini

@neuralnoise.com

Thanks @nolovedeeplearning.bsky.social for the picture!!! 🥰

December 6, 2024 at 9:54 PM

Reposted by Joshua Ong

Alessio Devoto

@alessiodevoto.bsky.social

Very cool work! 👏🚀 Unfortunately, errors in the original dataset will propagate to all new languages 😕

We investigated the issue of existing errors in the original MMLU in
arxiv.org/abs/2406.04127

@aryopg.bsky.social @neuralnoise.com

Sara Hooker @sarahooker.bsky.social · Dec 5

Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽Find MMLU is heavily overfit to western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...

December 6, 2024 at 1:57 PM

Reposted by Joshua Ong

Pasquale Minervini

@neuralnoise.com

For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊

Daniel Vila @dvilasuero.hf.co · Dec 6

Announcing Global-MMLU - an improved MMLU Open dataset with evaluation coverage across 42 languages.

The result of months of work with the goal of advancing Multilingual LLM evaluation.

Built together with the community and amazing collaborators at Cohere4AI, MILA, MIT, and many more.

December 6, 2024 at 9:26 AM

Reposted by Joshua Ong

Aryo Pradipta Gema

@aryopg.bsky.social

Super Cool work from Cohere for AI! 🎉 However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?

Sara Hooker @sarahooker.bsky.social · Dec 5

Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽Find MMLU is heavily overfit to western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...

December 6, 2024 at 9:38 AM

Reposted by Joshua Ong

Pasquale Minervini

@neuralnoise.com

Sohee (@soheeyang.bsky.social) in the house! 🚀🚀🚀

December 5, 2024 at 2:38 PM

Reposted by Joshua Ong

Ai2

@ai2.bsky.social

Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.

November 26, 2024 at 8:51 PM

Reposted by Joshua Ong

Joe Stacey

@joestacey.bsky.social

This paper's findings about testing LLMs on NLI align with many of my personal thoughts:

1) NLI remains a difficult task for LLMs
2) Having more few-shot examples is helpful (in my view, helping LLMs better understand class boundaries)
3) Incorrect predictions are often a result of ambiguous labels

November 24, 2024 at 4:38 PM

Reposted by Joshua Ong

Shaily

@shaily99.bsky.social

Since friends are doing NAACL / ICLR rebuttals, sharing my rebuttal template.
It works for me because it allows me to visually break down comments across reviewers into common themes, things that I can easily address v those that I can't, and also filter across these.

You all've got this!!!

rebuttal template

docs.google.com

November 23, 2024 at 4:07 PM

Joshua Ong

@jong21.bsky.social

Check out our CoMAT: Chain of Mathematically Annotated Thought, which improves mathematical reasoning by converting mathematical questions into structured symbolic representations and performing step-by-step reasoning 🎉 It works on various languages and challenging benchmarks

arxiv.org/pdf/2410.103...

November 20, 2024 at 3:29 PM

Reposted by Joshua Ong

Erik Arakelyan

@kirekara.bsky.social

The main question about the current LLM “reasoning” research is what to do next. Most go into synthetic generation and training on it, maybe with self-refinement, in hopes the model becomes better. I think we are missing controlled task formalization, step-by-step reasoning, and strict step verification.

November 19, 2024 at 5:34 AM

Reposted by Joshua Ong

Akari Asai

@akariasai.bsky.social

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai

November 19, 2024 at 4:30 PM

Reposted by Joshua Ong

Aryo Pradipta Gema

@aryopg.bsky.social

I’ll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent works (DeCoRe, MMLU-Redux, etc.). DM me if you’re around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

November 18, 2024 at 1:48 PM

Reposted by Joshua Ong

Emile van Krieken

@emilevankrieken.com

I made a starter pack with the people doing something related to Neurosymbolic AI that I could find.

Let me know if I missed you!
go.bsky.app/RMJ8q3i

November 11, 2024 at 3:27 PM
