Lightnews — Scholar-powered news

Nikolay Bogoychev

@xapajiamnu.bsky.social

47 followers 51 following 13 posts

Research Scientist at Meta.
LLMs, neural networks, logographic writing systems.

https://nbogoychev.com

Nikolay Bogoychev

@xapajiamnu.bsky.social

[2/2]
This is part of Kaggle's benchmarking initiative, which is spearheading the effort for trustworthy and rigorous LLM evaluation!
www.kaggle.com/blog/announc...

Introducing Kaggle Benchmarks | Kaggle

Democratizing trustworthy, rigorous evaluation for the GenAI era

www.kaggle.com

July 23, 2025 at 1:04 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

[5/5] Paper: arxiv.org/abs/2504.10356
Dataset, code, leaderboard: github.com/facebookrese...

April 15, 2025 at 1:03 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

[4/5] We are releasing a dev partition that you can download and use. We also have a secret test partition, which will not be released for now to avoid accidental contamination. If you upload your models to Hugging Face, we can evaluate them for you and put you on our leaderboard!

April 15, 2025 at 1:02 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

[3/5] Our dataset shows that, in general, LLMs answer questions better when they are asked in the language of the culture where the knowledge originated, which suggests there is a lot of room for growth in cross-lingual knowledge transfer.

April 15, 2025 at 1:01 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

[2/5] Bulgarians are much more likely to ask an LLM about Tsar Simeon than about Queen Elizabeth.

We also provide translations of our dataset to and from English so that we can measure cross-lingual knowledge transfer.

April 15, 2025 at 1:00 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

Absolutely. I am just looking for a way to make scaling laws map to actual task metrics. At the moment we have the situation of "look at how nicely the NLL goes down on task X when we scale the model", but no idea what the actual downstream task metric will be or when it'd saturate. Maybe that's impossible.

February 27, 2025 at 12:23 AM

Nikolay Bogoychev

@xapajiamnu.bsky.social

We should get rid of those auxiliary tasks; they shoehorn everybody into NLL for scaling laws...

February 26, 2025 at 6:06 PM

Nikolay Bogoychev

@xapajiamnu.bsky.social

Pretty cool work! Did you also see neurips.cc/virtual/2023... ?

It would be nice if we could also figure out more linear metrics for evaluating popular LLM tasks...

NeurIPS Poster: Are Emergent Abilities of Large Language Models a Mirage? (NeurIPS 2023)

neurips.cc

February 26, 2025 at 7:39 AM
