Lightnews — Scholar-powered news

Ivan Leo

@ivanleomk.bsky.social

80 followers 53 following 11 posts

Applied AI stuff from time to time, I write at ivanleo.com


Ivan Leo

@ivanleomk.bsky.social

Hmm, for benchmarks it depends on what I’m benchmarking. If it’s a metric, I honestly just use Braintrust at that point to log everything.

Asserts are super easy to get started with, so I use them for simple eyeballing and checks (including some pytest tests).

November 27, 2024 at 5:12 AM

Ivan Leo

@ivanleomk.bsky.social

Interesting, will give it a look. Thanks so much!

November 26, 2024 at 3:48 PM

Ivan Leo

@ivanleomk.bsky.social

This was a really good read! I’m curious though: when it comes to building benchmarks, what are your favourite approaches to

- Deduplication
- Quality Control

I’ve tried to build spatial reasoning benchmarks before but never released them

November 26, 2024 at 4:20 AM

Ivan Leo

@ivanleomk.bsky.social

Hello sir!

November 26, 2024 at 4:18 AM

Reposted by Ivan Leo

Eugene Yan

@eugeneyan.com

Evals are "too damn expensive" until you:

• can't migrate underlying models safely
• can't add new features with confidence
• can't ship without HITL evals, which takes >100x longer
• product development and iteration grinds to a halt
• lose customer trust due to poor user experience

November 23, 2024 at 4:57 AM

Ivan Leo

@ivanleomk.bsky.social

Yeah haha, this was my problem too.

I often find that some upfront investment in building these tools pays off significantly

November 22, 2024 at 1:41 AM

Ivan Leo

@ivanleomk.bsky.social

Trying to find more AI folks too for a better feed, hope it improves lol

November 22, 2024 at 1:39 AM

Ivan Leo

@ivanleomk.bsky.social

Think you mean not sshing in haha

November 20, 2024 at 3:15 PM
