Lightnews — Scholar-powered news

Daniel Paleka

@dpaleka.bsky.social

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.

We identify key issues with forecasting evaluations 🧵 (1/7)

June 5, 2025 at 5:08 PM

Daniel Paleka

@dpaleka.bsky.social

why is it that whenever i see survivorship bias on my timeline it already has the red-dotted plane in the replies?

May 26, 2025 at 3:07 PM

Daniel Paleka

@dpaleka.bsky.social

OpenAI and DeepMind should have entries at Eurovision too

May 17, 2025 at 2:16 PM

Daniel Paleka

@dpaleka.bsky.social

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear

4o: yes you are Jesus Christ's brother. now go. Nanjing awaits

o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

April 30, 2025 at 10:10 PM

Daniel Paleka

@dpaleka.bsky.social

Quick sycophancy eval: comparing the two recent OpenAI ChatGPT system prompts, it is clear last week's prompt moves other models towards sycophancy too, while the current prompt makes them more disagreeable.

April 30, 2025 at 3:16 PM

Daniel Paleka

@dpaleka.bsky.social

i was today years old when i realized the grammatical plural of anecdote is anecdotes, not anecdata. i dislike this finding

April 30, 2025 at 2:45 PM

Daniel Paleka

@dpaleka.bsky.social

we are so lucky that pathogens, as opposed to political and religious memes, do not organize coalitions of hosts against non-hosts as an instrumental objective

April 29, 2025 at 6:45 AM

Daniel Paleka

@dpaleka.bsky.social

are slot machines and the like so profitable because simplistic gambling is inherently very addictive, or because there has been a legible financial incentive for an entire industry to spend decades optimizing them to be as addictive as possible?

March 31, 2025 at 11:50 AM

Daniel Paleka

@dpaleka.bsky.social

TIL the concept of *epistemic hell*. standard Joseph Henrich example: in the ancestral environment, hygienic and food prep rituals determine survival, but no hunter-gatherer can possibly explain why. hence genetic selection for acceptance of religious rituals and against reasoning

March 23, 2025 at 2:23 PM

Daniel Paleka

@dpaleka.bsky.social

Why do meeting transcription apps (Fireflies, Granola) require Google Workspace accounts?

March 13, 2025 at 9:43 PM

Daniel Paleka

@dpaleka.bsky.social

what are you doing Claude i thought we were friends

January 17, 2025 at 7:12 AM

Daniel Paleka

@dpaleka.bsky.social

the rate of people's familiarity with Scaling Scaling Laws with Board Games over time is starting to look like the plot from Scaling Scaling Laws with Board Games

January 16, 2025 at 9:40 PM

Daniel Paleka

@dpaleka.bsky.social

go do something that can fail

January 12, 2025 at 8:34 PM

Daniel Paleka

@dpaleka.bsky.social

Recent LLM forecasters are getting better at predicting the future. But there's a challenge: How can we evaluate and compare AI forecasters without waiting years to see which predictions were right? (1/11)

January 11, 2025 at 1:53 AM

Daniel Paleka

@dpaleka.bsky.social

i saw the bridge from Golden Gate Claude yesterday

January 9, 2025 at 4:17 AM

Daniel Paleka

@dpaleka.bsky.social

LLMs rapidly improving at software engineering and math, given that the rate of improvement in ideation is slower, means you should be intentional about what value is gained from doing a highly technical project now as opposed to later

January 8, 2025 at 12:54 AM

Daniel Paleka

@dpaleka.bsky.social

by interacting with LLMs you learn to offload thinking to them in ways useful to you, which is the second most important skill for the takeoff

every time you talk to an LLM you lose decorrelation with LLM cognition, which is *the* most important skill for the takeoff

January 4, 2025 at 6:01 PM

Daniel Paleka

@dpaleka.bsky.social

my New Year's resolution: don't work on a bigger project if there is not a clear reason for doing it *now*.

disregarding the AGI timelines, the R&D acceleration is a clear reason against technical work where the discount rates on the final product are low

December 31, 2024 at 10:52 PM

Daniel Paleka

@dpaleka.bsky.social

environments are a psyop

a model can verify a proof or unroll a chess game. it can even eyeball if the code works

the superintelligence loop will just be asking an AI agent to give feedback on its output by any means it can

if the task needs a simulator the AI will write one

December 31, 2024 at 6:45 PM

Daniel Paleka

@dpaleka.bsky.social

To those who believe Anthropic HHH incorrigibility paper implies sth for tamper resistance: I am willing to bet against. Just specify what exactly can't be done with the first open-weight model over some capability and jailbreak resistance threshold, given some compute budget.

December 20, 2024 at 2:01 PM

Daniel Paleka

@dpaleka.bsky.social

NeurIPS test of time award talk on GANs mentions the paper was done in 12 days, from idea to submission. Two days more than JavaScript, but slightly faster than the first versions of Git or Unix.

December 13, 2024 at 10:07 PM

Daniel Paleka

@dpaleka.bsky.social

I'm at NeurIPS, do reach out if you want to grab a coffee!

December 11, 2024 at 3:30 AM

Daniel Paleka

@dpaleka.bsky.social

they are doing gain of function research on Whova attendees order hacks now

December 10, 2024 at 7:48 PM

Daniel Paleka

@dpaleka.bsky.social

TIL that the atmosphere blocks basically all electromagnetic radiation, except three small windows: one for visible light, one for cooling the Earth, and one for radio waves. Earth is the USA of planets.

November 28, 2024 at 7:03 PM

Daniel Paleka

@dpaleka.bsky.social

guys literally only want one thing and it's the patient work of sitting down every day and reading papers until their eyes bleed, and hoping that something good comes out of it someday

November 27, 2024 at 8:34 AM
