Lightnews — Scholar-powered news

K N Anantha nandanan

@ananthan2k.bsky.social

19 followers 37 following 7 posts

Engineer, Django/Python Developer, AI & LLM Enthusiast

https://ananthanandanan.vercel.app/

My team has been using gpt-4o as an LLM judge for a specific use case, and it just performed badly on the eval set, even after iterating on the prompts. Then we switched to Claude 3.5 (latest). Oh boy. Judging by both its chain-of-thought (CoT) and its results, it just gets it. Performance is consistently at 95-98% on the eval set.

November 26, 2024 at 8:27 PM