Lightnews — Scholar-powered news

Ayush Thakur

@ayushthakur.bsky.social

38 followers 3 following 4 posts

MLE @ Weights and Biases


Ayush Thakur

@ayushthakur.bsky.social

Launched a course on evaluating LLM-based applications: wandb.ai/site/courses...

Enjoy. 😄

LLM Apps: Evaluation

Develop techniques for building, optimizing, and scaling AI evaluators with minimal human input. Learn to build reliable evaluation pipelines for LLM applications by combining programmatic checks with...

wandb.ai

January 13, 2025 at 6:17 PM

Ayush Thakur

@ayushthakur.bsky.social

Back in the day, the WMT14 en-de dataset, with roughly 4.5M training sentence pairs, was used a lot for NMT tasks. The reason: German is morphologically richer than the other language pairs in that benchmark.

November 25, 2024 at 10:55 AM

Ayush Thakur

@ayushthakur.bsky.social

Have been working on a robustness metric ("scorer") for LLM systems.

Turns out classic effect-size statistics like Cohen's d and Cohen's h are really good at quantifying robustness.

Cohen's h is especially good when the system's output is binary, since it compares two proportions (e.g. pass rates).

November 25, 2024 at 10:51 AM
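A minimal Python sketch of the two effect sizes from the robustness post, assuming robustness is measured by comparing a system's scores on original inputs against its scores on perturbed versions of those inputs. This framing, the function names (cohens_d, cohens_h, pass_rate), and the sample data are illustrative assumptions, not the scorer from the post. Cohen's d is the difference in mean scores divided by the pooled standard deviation; Cohen's h compares two proportions as h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)). Values near 0 suggest the perturbation barely moved the output.

```python
import math
from statistics import mean, variance


def cohens_d(original, perturbed):
    """Standardized mean difference between two score samples (pooled SD).

    Positive d means scores dropped on the perturbed inputs.
    """
    n1, n2 = len(original), len(perturbed)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two scores per group")
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n1 - 1) * variance(original)
                  + (n2 - 1) * variance(perturbed)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    if pooled_sd == 0:
        raise ValueError("scores have zero variance; d is undefined")
    return (mean(original) - mean(perturbed)) / pooled_sd


def cohens_h(p1, p2):
    """Effect size for two proportions via the arcsine transform."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def pass_rate(outcomes):
    """Hypothetical helper: fraction of binary (0/1) outcomes that passed."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Hypothetical continuous scores on clean vs. perturbed prompts.
    clean = [0.82, 0.91, 0.77, 0.88, 0.85]
    noisy = [0.80, 0.86, 0.70, 0.84, 0.83]
    print(f"Cohen's d: {cohens_d(clean, noisy):.3f}")

    # Hypothetical binary pass/fail outcomes on the same two input sets.
    clean_pass = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
    noisy_pass = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
    h = cohens_h(pass_rate(clean_pass), pass_rate(noisy_pass))
    print(f"Cohen's h: {h:.3f}")
```

Cohen's conventional rules of thumb (about 0.2 small, 0.5 medium, 0.8 large, applied to the absolute value) are a common starting point for deciding when a system is "not robust", though a threshold tuned to the application is likely more meaningful.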

Ayush Thakur

@ayushthakur.bsky.social

November 20, 2024 at 6:40 AM
