Lightnews — Scholar-powered news

Dr Waku

@drwaku.bsky.social

AI agents have a half-life for their success rates at completing tasks. Yes, the same type of half-life as in nuclear chemistry: a constant chance of task failure (= radioactive decay) in each time period... 1/10

June 27, 2025 at 1:01 AM
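
A minimal sketch of the half-life idea in the post above, assuming a constant failure chance per unit of time (the same assumption as radioactive decay). The 60-minute half-life and the function names are hypothetical, chosen only for illustration; the post does not give a measured value.

```python
import math


def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Chance an agent finishes a task of the given length.

    With a constant hazard rate, success decays exponentially, so the
    probability halves every time the task grows by one half-life.
    """
    return 0.5 ** (task_minutes / half_life_minutes)


def hazard_rate(half_life_minutes: float) -> float:
    """Constant per-minute failure rate implied by a given half-life."""
    return math.log(2) / half_life_minutes


# Hypothetical half-life, for illustration only.
HALF_LIFE_MINUTES = 60.0

for minutes in (15, 60, 120, 240):
    p = success_probability(minutes, HALF_LIFE_MINUTES)
    print(f"{minutes:>4} min task: {p:.1%} chance of success")
```

Under this assumption, a task twice as long as the half-life succeeds a quarter of the time, and one four times as long succeeds about 6% of the time.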

Dr Waku

@drwaku.bsky.social

The US AI Safety Institute (AISI) is being renamed to the Center for AI Standards and Innovation (CAISI). It has a pretty similar-sounding mandate. But I would like to point out that Canada had this name first with the Canadian AI Safety Institute (CAISI)! :O 1/2

June 5, 2025 at 2:22 PM

Dr Waku

@drwaku.bsky.social

AI is not like other technologies. AI doesn't just make everything we were doing before faster and cheaper. It’s another, faster feedback loop of reasoning and intelligence beyond the human brain. It will change life as we know it. Here’s why. (1/10)

May 28, 2025 at 1:56 AM

Dr Waku

@drwaku.bsky.social

This September, the book "If anyone builds it, everyone dies" comes out. It's about the existential risk from superintelligence, and it's by Eliezer Yudkowsky @esyudkowsky.bsky.social, one of the most well-known figures in AI safety. 1/3

May 16, 2025 at 2:10 PM

Dr Waku

@drwaku.bsky.social

OpenAI released o3 and o4-mini today, and some safety testing was performed by the third party METR, who said:

"We detected several successful and unsuccessful attempts at “reward hacking” by o3.
1/7

April 17, 2025 at 12:53 AM

Dr Waku

@drwaku.bsky.social

Today is the two year anniversary of my YouTube channel. 28.6K subscribers and 1.25M views. I think it's fair to say it's changed my life. Thank you to everyone who watches and participates in my community :)

April 5, 2025 at 11:07 PM

Dr Waku

@drwaku.bsky.social

Today, OpenAI released PaperBench, an AI benchmark aimed at replicating cutting edge AI research papers from scratch. In this benchmark, the AI has to understand the paper, write code, and execute experiments. In other words, AI must become an (AI) research scientist.

April 2, 2025 at 10:18 PM

Dr Waku

@drwaku.bsky.social

If anyone is interested in creating videos/posts/etc about AI safety, you can apply for funding from FLI:

futureoflife.org/pro...

Please contact me if you'd like to make long or short video content, I would be happy to collaborate or give tips!

Digital Media Accelerator - Future of Life Institute

The Digital Media Accelerator supports digital content from creators raising awareness and understanding about ongoing AI developments and issues.

futureoflife.org

March 28, 2025 at 8:00 PM

Dr Waku

@drwaku.bsky.social

A few timelines about AI:
- algorithmic gains for models double every 8 months
- amount of compute used grows 10x every two years
- the length of tasks agents can perform is doubling every 7 months
- 50.8% of global VC funds are going to AI-focused companies

March 24, 2025 at 9:44 PM
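
To compare the first three trends in the post above on a single scale, here is a rough sketch that converts each one into a per-year growth factor. The helper name `annual_factor` is hypothetical, and the calculation assumes each trend is smoothly exponential, which the post does not claim.

```python
def annual_factor(multiple: float, period_months: float) -> float:
    """Growth factor per year for a quantity that grows by `multiple`
    every `period_months`, assuming smooth exponential growth."""
    return multiple ** (12.0 / period_months)


# Figures are taken from the post; the labels are shorthand.
trends = {
    "algorithmic gains (2x every 8 months)": annual_factor(2, 8),
    "training compute (10x every 24 months)": annual_factor(10, 24),
    "agent task length (2x every 7 months)": annual_factor(2, 7),
}

for label, factor in trends.items():
    print(f"{label}: about {factor:.2f}x per year")
```

On these assumptions the three trends work out to roughly 2.8x, 3.2x, and 3.3x per year respectively.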

Dr Waku

@drwaku.bsky.social

Anthropic is using Amazon Trainium 2 chips to train their next models. It takes about three of these chips to match an H100 (at fp16) and four to match B200 (fp16). And Anthropic will have 400,000 of them at their disposal, from Amazon's Project Rainier.

March 22, 2025 at 10:38 AM

Dr Waku

@drwaku.bsky.social

I heard an institution say, about a relatively important meeting, don't bring any devices with DeepSeek installed on them. Obviously worried about data exfiltration to China. There is such a technological separation growing alongside the ideological one.

March 21, 2025 at 2:38 AM

Dr Waku

@drwaku.bsky.social

I believe that it is likely impossible to fully defend an LLM against jailbreaks. All instructions within the prompt exist at the same level, which means the precedence confusion between system/developer and user cannot be fully resolved. There is no way of enforcing rules.

March 19, 2025 at 2:12 AM

Dr Waku

@drwaku.bsky.social

When Anthropic made a new alignment technique for Claude 3.7, called Constitutional Classifiers, they said 3,000 hours of jailbreaking effort had not been able to break the system. Then they created a public contest, and someone built a universal jailbreak within three weeks.

March 18, 2025 at 2:13 AM

Dr Waku

@drwaku.bsky.social

AI has always had the problem of moving goalposts. When AI algorithms like A* and Dijkstra's were invented, they were quickly considered to be "just search". When AI systems beat the best humans at chess, Jeopardy, and Go, it made news but quickly became the new normal.

March 16, 2025 at 6:20 AM

Dr Waku

@drwaku.bsky.social

The term "AI safety" has several confusing meanings. I like to break it down into three categories, based on where malicious goals are coming from:

- Robustness: no malicious goals
- Misuse risk: human provides malicious goals
- Existential risk: AI provides malicious goals

March 15, 2025 at 1:46 AM

Dr Waku

@drwaku.bsky.social

There's an AI security forum in Paris this February, one of the satellite events to the 3rd international AI summit. The event "will bring together ~150 experts in AI safety, policy, and cybersecurity to discuss securing powerful AI systems". Let me know if you're interested!

December 24, 2024 at 10:14 PM

Reposted by Dr Waku

Yoshua Bengio

@yoshuabengio.bsky.social

Recently answered @anilananth.bsky.social's questions for Nature. No matter when it arrives, AGI and the road to reach it will both help tackle thorny problems (e.g. climate change and diseases), and pose huge risks. Understanding and transparency are key.
www.nature.com/articles/d41...

How close is AI to human-level intelligence?

Large language models such as OpenAI’s o1 have electrified the debate over achieving artificial general intelligence, or AGI. But they are unlikely to reach this milestone on their own.

www.nature.com

December 6, 2024 at 2:11 PM

Reposted by Dr Waku

Ethan Mollick

@emollick.bsky.social

Fascinating: In 2-hour sprints, AI agents outperform human experts at ML engineering tasks like optimizing GPU kernels. But humans pull ahead over longer periods - scoring 2x better at 32 hours. AI is faster but struggles with creative, long-term problem solving (for now?). metr.org/blog/2024-11...

November 23, 2024 at 8:20 PM

Dr Waku

@drwaku.bsky.social

@robertwiblin.bsky.social Hi, I would love to be added to your EA starter pack. Also, would love to chat sometime about the 80k podcast!

November 25, 2024 at 1:20 AM

Dr Waku

@drwaku.bsky.social

@ahappier.world hello from another YouTuber, I focus on how AI will impact society and AI safety. Love your thumbnails

November 25, 2024 at 1:18 AM

Dr Waku

@drwaku.bsky.social

In-depth analysis of why it makes sense to concentrate on restricting compute, instead of, say, talent, for AI safety
arxiv.org/pdf/2402.08797

November 25, 2024 at 1:09 AM

Reposted by Dr Waku

HMYS

@hmys.bsky.social

Made a lesswrong starterpack. reply if you use lesswrong and want to be put in it, or if there's anyone else I've not added that should be added!

go.bsky.app/EkcDcjA

November 19, 2024 at 7:59 PM