Lightnews — Scholar-powered news

Adam Bataineh MD

@dradamb.bsky.social

11 followers 2 following 3 posts

founder of numenor.health (previously cofounder of Span - acquired by Eight Sleep)


Adam Bataineh MD

@dradamb.bsky.social

A team at Apple recently published a really interesting paper in which they tested LLM performance on GSM8K (a standard benchmark for mathematical reasoning ability).

They modified the questions by adding unnecessary information to distract the LLMs.

It led to much lower accuracy, even for o1.

December 23, 2024 at 3:07 PM

Adam Bataineh MD

@dradamb.bsky.social

I wonder how much of the improvement in performance is because of Goodhart's law:

“When a measure becomes a target, it ceases to be a good measure”

I.e., does better performance on benchmark tests translate to real-world performance?

December 23, 2024 at 3:07 PM
