K N Anantha nandanan
ananthan2k.bsky.social
Engineer, Django/Python Developer, AI & LLM Enthusiast

https://ananthanandanan.vercel.app/
My team has been using GPT-4o as an LLM judge for a specific use case; it just did badly on our evaluation set, even after iterating on the prompts. Then we switched to Claude 3.5 (latest). Oh boy. Judging by its CoT and the performance, it just gets it. Performance is consistent at 95-98% on the eval set.
November 26, 2024 at 8:27 PM