Daniel Paleka
dpaleka.bsky.social
ai safety researcher | phd ETH Zurich | https://danielpaleka.com
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧵 (1/7)
June 5, 2025 at 5:08 PM
why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?
May 26, 2025 at 3:07 PM
OpenAI and DeepMind should have entries at Eurovision too
May 17, 2025 at 2:16 PM
3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ's brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream
April 30, 2025 at 10:10 PM
Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.
April 30, 2025 at 3:16 PM
i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding
April 30, 2025 at 2:45 PM
we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective
April 29, 2025 at 6:45 AM
are slot machines and the like so profitable because simplistic gambling is inherently very addictive, or because there has been a legible financial incentive for an entire industry to spend decades optimizing them to be as addictive as possible?
March 31, 2025 at 11:50 AM
TIL the concept of *epistemic hell*. standard Joseph Henrich example: in the ancestral environment, hygienic and food prep rituals determine survival, but no hunter-gatherer can possibly explain why. hence genetic selection for acceptance of religious rituals and against reasoning
March 23, 2025 at 2:23 PM
Why do meeting transcription apps (Fireflies, Granola) require Google Workspace accounts?
March 13, 2025 at 9:43 PM
what are you doing Claude i thought we were friends
January 17, 2025 at 7:12 AM
the rate of people's familiarity with Scaling Scaling Laws with Board Games over time is starting to look like the plot from Scaling Scaling Laws with Board Games
January 16, 2025 at 9:40 PM
go do something that can fail
January 12, 2025 at 8:34 PM
Recent LLM forecasters are getting better at predicting the future. But there's a challenge: How can we evaluate and compare AI forecasters without waiting years to see which predictions were right? (1/11)
January 11, 2025 at 1:53 AM
i saw the bridge from Golden Gate Claude yesterday
January 9, 2025 at 4:17 AM
LLMs rapidly improving at software engineering and math, given that the rate of improvement in ideation is slower, means you should be intentional about what value is gained from doing a highly technical project now as opposed to later
January 8, 2025 at 12:54 AM
by interacting with LLMs you learn to offload thinking to them in ways useful to you, which is the second most important skill for the takeoff

every time you talk to an LLM you lose decorrelation with LLM cognition, which is *the* most important skill for the takeoff
January 4, 2025 at 6:01 PM
my New Year's resolution: don't work on a bigger project if there is not a clear reason for doing it *now*.

disregarding the AGI timelines, the R&D acceleration is a clear reason against technical work where the discount rates on the final product are low
December 31, 2024 at 10:52 PM
environments are a psyop

a model can verify a proof or unroll a chess game. it can even eyeball if the code works

the superintelligence loop will just be asking an AI agent to give feedback on its output by any means it can

if the task needs a simulator the AI will write one
December 31, 2024 at 6:45 PM
To those who believe the Anthropic HHH incorrigibility paper implies something for tamper resistance: I am willing to bet against that. Just specify what exactly can't be done with the first open-weight model over some capability and jailbreak resistance threshold, given some compute budget.
December 20, 2024 at 2:01 PM
NeurIPS test of time award talk on GANs mentions the paper was done in 12 days, from idea to submission. Two days more than JavaScript, but slightly faster than the first versions of Git or Unix.
December 13, 2024 at 10:07 PM
I'm at NeurIPS, do reach out if you want to grab a coffee!
December 11, 2024 at 3:30 AM
they are doing gain of function research on Whova attendees order hacks now
December 10, 2024 at 7:48 PM
TIL that the atmosphere blocks basically all electromagnetic radiation, except three small windows: one for visible light, one for cooling the Earth, and one for radio waves. Earth is the USA of planets.
November 28, 2024 at 7:03 PM
guys literally only want one thing and it's the patient work of sitting down every day and reading papers until their eyes bleed, and hoping that something good comes out of it someday
November 27, 2024 at 8:34 AM