Lightnews — Scholar-powered news

Daniel Paleka

@dpaleka.bsky.social

Benchmarks can reward strategic gambling over calibrated forecasting when optimizing for ranking performance.

"Bet everything" on one scenario beats careful probability estimation for maximizing the chance of ranking #1 on the leaderboard. (6/7)

June 5, 2025 at 5:08 PM

Daniel Paleka

@dpaleka.bsky.social

Model knowledge cutoffs are guidelines about reliability, not guarantees of no information thereafter. GPT-4o, when nudged, can reveal knowledge beyond its stated Oct 2023 cutoff. (5/7)

June 5, 2025 at 5:08 PM

Daniel Paleka

@dpaleka.bsky.social

Date-restricted search leaks future knowledge. Searching pre-2019 articles about “Wuhan” returns results abnormally biased towards the Wuhan Institute of Virology — an association that only emerged later. (4/7)

June 5, 2025 at 5:08 PM

Daniel Paleka

@dpaleka.bsky.social

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧵 (1/7)

June 5, 2025 at 5:08 PM

Daniel Paleka

@dpaleka.bsky.social

why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?

May 26, 2025 at 3:07 PM

Daniel Paleka

@dpaleka.bsky.social

Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.

April 30, 2025 at 3:16 PM

Daniel Paleka

@dpaleka.bsky.social

what are you doing Claude i thought we were friends

January 17, 2025 at 7:12 AM

Daniel Paleka

@dpaleka.bsky.social

Test-time compute based on arbitrage can make forecasts more consistent; this improves specific logical rules such as Negation, but doesn't generalize to the consistency rules we do not optimize over. (9/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

Some consistency checks are better signals than others; for instance, the violation of P(A)P(B|A) = P(A&B) explains a high fraction of the variation in forecasting performance over a range of forecasters. (7/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

Starting from a base question, we generate multiple logically related questions and ask them independently. (5/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

We create consistency checks from base forecasting questions, which we take from various sources (prediction market questions, synthetically generated from news, purely LLM-generated); ask the forecasters for probabilities, and check how consistent the predictions are (4/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

We test 10 different logical rules that a consistent forecaster should satisfy. (3/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

Recent LLM forecasters are getting better at predicting the future. But there's a challenge: How can we evaluate and compare AI forecasters without waiting years to see which predictions were right? (1/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

LLMs rapidly improving at software engineering and math, given that the rate of improvement in ideation is slower, means you should be intentional about what value is gained from doing a highly technical project now as opposed to later

January 8, 2025 at 12:54 AM

Daniel Paleka

@dpaleka.bsky.social

my New Year's resolution: don't work on a bigger project if there is not a clear reason for doing it *now*.

disregarding the AGI timelines, the R&D acceleration is a clear reason against technical work where the discount rates on the final product are low

December 31, 2024 at 10:52 PM

Daniel Paleka

@dpaleka.bsky.social

TIL that the atmosphere blocks basically all electromagnetic radiation, except three small windows: one for visible light, one for cooling the Earth, and one for radio waves. Earth is the USA of planets.

November 28, 2024 at 7:03 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news