Joshua Ong
@jong21.bsky.social
BSc @ University of Edinburgh
Natural Language Processing, LLM Reasoning
Actively seeking a PhD position for Spring/Fall 2025 ✨
Reposted by Joshua Ong
MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋
May 2, 2025 at 1:00 PM
Reposted by Joshua Ong
Thanks @nolovedeeplearning.bsky.social for the picture!!! 🥰
December 6, 2024 at 9:54 PM
Reposted by Joshua Ong
Very cool work! 👏🚀 Unfortunately, errors in the original dataset will propagate to all new languages 😕

We investigated the issue of existing errors in the original MMLU in
arxiv.org/abs/2406.04127

@aryopg.bsky.social @neuralnoise.com
Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽 Find that MMLU is heavily overfit to Western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release an improved Global-MMLU covering 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...
December 6, 2024 at 1:57 PM
Reposted by Joshua Ong
For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊
Announcing Global-MMLU - an improved, open MMLU dataset with evaluation coverage across 42 languages.

The result of months of work with the goal of advancing Multilingual LLM evaluation.

Built together with the community and amazing collaborators at Cohere4AI, MILA, MIT, and many more.
December 6, 2024 at 9:26 AM
Reposted by Joshua Ong
Super cool work from Cohere for AI! 🎉 However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?
Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽 Find that MMLU is heavily overfit to Western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release an improved Global-MMLU covering 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...
December 6, 2024 at 9:38 AM
Reposted by Joshua Ong
Sohee (@soheeyang.bsky.social) in the house! 🚀🚀🚀
December 5, 2024 at 2:38 PM
Reposted by Joshua Ong
Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained on up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B. As always, we released our data, code, recipes, and more 🎁
November 26, 2024 at 8:51 PM
Reposted by Joshua Ong
This paper's findings about testing LLMs on NLI align with many of my personal thoughts:

1) NLI remains a difficult task for LLMs
2) Having more few-shot examples is helpful (in my view, helping LLMs better understand class boundaries; see the sketch after this list)
3) Incorrect predictions are often a result of ambiguous labels
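
A minimal sketch of what point 2 looks like in practice (my own hypothetical prompt format, not taken from the paper):

```python
# Hypothetical few-shot NLI prompt builder, illustrating why labelled
# examples can help a model see where the class boundaries lie.
EXAMPLES = [
    ("A man plays a guitar.", "A person is making music.", "entailment"),
    ("A man plays a guitar.", "The man is asleep.", "contradiction"),
    ("A man plays a guitar.", "The man is a professional musician.", "neutral"),
]

def build_prompt(premise: str, hypothesis: str) -> str:
    """Prepend labelled examples before the query pair."""
    parts = ["Label each pair as entailment, contradiction, or neutral.\n"]
    for p, h, label in EXAMPLES:
        parts.append(f"Premise: {p}\nHypothesis: {h}\nLabel: {label}\n")
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nLabel:")
    return "\n".join(parts)

print(build_prompt("Snow covers the field.", "The field is white."))
```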
November 24, 2024 at 4:38 PM
Reposted by Joshua Ong
Since friends are doing NAACL / ICLR rebuttals, sharing my rebuttal template.
It works for me because it allows me to visually break down comments across reviewers into common themes, things I can easily address vs. those I can't, and also filter across these.

You all've got this!!!
rebuttal template
docs.google.com
November 23, 2024 at 4:07 PM
Check out our CoMAT: Chain of Mathematically Annotated Thought, which improves mathematical reasoning by converting mathematical questions into structured symbolic representations and performing step-by-step reasoning 🎉 It works across various languages and challenging benchmarks.

arxiv.org/pdf/2410.103...
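
A rough toy illustration of the symbolic-conversion idea (my own sketch with sympy, not the actual CoMAT prompt format):

```python
# Hypothetical toy example of the idea, not the CoMAT pipeline itself:
# rewrite a word problem as symbolic annotations, then solve it symbolically.
import sympy as sp

# "Alice has 3 times as many apples as Bob. Together they have 24 apples."
bob = sp.Symbol("bob", positive=True)
alice = 3 * bob                    # annotation of "3 times as many"
together = sp.Eq(alice + bob, 24)  # annotation of "together they have 24"

answer = sp.solve(together, bob)   # symbolic solving replaces free-form arithmetic
print(answer)                      # [6] -> Bob has 6 apples, Alice has 18
```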
November 20, 2024 at 3:29 PM
Reposted by Joshua Ong
The main question about the current LLM “reasoning” research is what to do next. Most go into synthetic data generation and training, maybe with self-refinement, in hopes that the model becomes better. I think we are missing controlled task formalization, step-by-step reasoning, and strict step verification.
November 19, 2024 at 5:34 AM
Reposted by Joshua Ong
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
November 19, 2024 at 4:30 PM
Reposted by Joshua Ong
I’ll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent works (DeCoRe, MMLU-Redux, etc.). DM me if you’re around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127
November 18, 2024 at 1:48 PM
Reposted by Joshua Ong
I made a starter pack with the people doing something related to Neurosymbolic AI that I could find.

Let me know if I missed you!
go.bsky.app/RMJ8q3i
November 11, 2024 at 3:27 PM