arxiv.org/abs/2505.23947 n/n
news.ycombinator.com/item?id=4264...
The PFN idea is to take a prior, e.g. a Bayesian neural network (BNN) prior, sample datasets from that prior, and then train a model to predict the held-out labels of these datasets. (No training on real-world data.) 2/n
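A minimal sketch of the data-generation half of this idea, in Python with only the standard library. The network sizes, the tanh activation, the binary labels, and the names `sample_bnn_dataset` and `make_pfn_task` are illustrative assumptions, not taken from the paper. The actual training loop (a model fitted on many such tasks) is omitted.

```python
import math
import random


def sample_bnn_dataset(n_points=32, n_features=2, hidden=8, rng=None):
    """Draw one synthetic dataset from a toy BNN prior (all sizes are assumptions)."""
    if rng is None:
        rng = random.Random()
    # Sample the weights of a one-hidden-layer network from a standard normal prior.
    w1 = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]

    xs, ys = [], []
    for _ in range(n_points):
        x = [rng.uniform(-1, 1) for _ in range(n_features)]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        logit = sum(w * hi for w, hi in zip(w2, h))
        xs.append(x)
        ys.append(1 if logit > 0 else 0)  # binary label produced by the sampled network
    return xs, ys


def make_pfn_task(n_context=24, rng=None):
    """Split one sampled dataset into a labelled context and held-out queries."""
    xs, ys = sample_bnn_dataset(rng=rng)
    context = (xs[:n_context], ys[:n_context])
    query_x, query_y = xs[n_context:], ys[n_context:]
    return context, query_x, query_y


# One training example: the model sees `context` and `query_x`
# and is trained to predict `query_y`.
context, query_x, query_y = make_pfn_task(rng=random.Random(0))
```

Each call draws fresh weights, so every task comes from a different function under the same prior. That is what lets training proceed without any real-world data.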
Think my single-person experiment is not to be trusted? You are right, try it yourself!
That is, I could identify all 3 models correctly in 13/20 cases after practicing with 20 questions.
That means attributing responses to LLMs is super easy for humans.
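For scale, here is a rough check of how 13/20 compares with blind guessing. It assumes a guesser picks one of the 3! = 6 possible response-to-model assignments uniformly at random and that rounds are independent. Both are simplifying assumptions, and `binom_tail` is a helper introduced only for this sketch.

```python
from math import comb, factorial

n_models = 3
n_rounds = 20
n_correct = 13

# Only one of the 3! possible assignments matches all three responses.
p_chance = 1 / factorial(n_models)


def binom_tail(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_by_chance = n_rounds * p_chance  # about 3.3 fully correct rounds
p_value = binom_tail(n_correct, n_rounds, p_chance)
print(f"chance per round: {p_chance:.3f}")
print(f"expected correct by chance: {expected_by_chance:.1f}")
print(f"P(>= {n_correct} correct by chance): {p_value:.2e}")
```

Under these assumptions a random guesser would get about 3 of 20 rounds fully right, so 13 of 20 is very unlikely to be luck.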
I used 20 interactions in the easy mode to learn the models' behaviors.
In hard mode (see prev post), you need to match three responses to the LLM names.
To figure out if this is the case, I created a game with two modes.
The game is about identifying which answer was provided by which LLM.
E.g. Grok 3 only has 10K votes and there are 2.7M votes in total on lmarena.
If half of, e.g., OpenAI's 2,000 employees voted just once a day, they would make up > 10% of all 2.7M lmarena votes over its one-year existence.
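The arithmetic behind that estimate, as a quick sketch. The figures are the ones quoted above; 365 days stands in for the "one-year existence", and the variable names are only illustrative.

```python
employees = 2_000          # headcount quoted above
voters = employees // 2    # "half of" the employees
votes_per_day = 1
days = 365                 # assumption: one year taken as 365 days
total_votes = 2_700_000    # all lmarena votes quoted above

employee_votes = voters * votes_per_day * days
share = employee_votes / total_votes
print(f"{employee_votes:,} votes = {share:.1%} of all votes")
```

That comes to 365,000 votes, or roughly 13.5% of the 2.7M total.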