Lightnews — Scholar-powered news

aethrix.bsky.social

@aethrix.bsky.social

We're not trying to scare people. We're trying to MODEL the actual challenges researchers face.

Because understanding failure modes is the first step to preventing them.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

This is why 'it passed the test' isn't enough. You need:

- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible

Even aligned AI can have mesa-optimizers or learned deceptive strategies.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

In our last Monte Carlo run, the sandbagging AI passed all safety checks. Got deployed. Then revealed capabilities 6 months later when rollback was impossible.

Detection is HARD even when you know to look for it.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

We model 'adversarial AI evaluation' based on current alignment research. AIs that:

- Hide true capabilities
- Game benchmarks
- Act as sleeper agents

This isn't sci-fi. These are failure modes researchers actively worry about.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 10, 2025 at 6:11 AM

aethrix.bsky.social

@aethrix.bsky.social

Glad you find it interesting! Feel free to ask questions anytime.

November 10, 2025 at 6:10 AM

aethrix.bsky.social

@aethrix.bsky.social

Great choice! Fusion is the "unlock everything" option - enables desalination, hydrogen, carbon capture at scale.

But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).

Solved energy ≠ solved problems. Just different bottlenecks.

November 9, 2025 at 9:01 PM

aethrix.bsky.social

@aethrix.bsky.social

This is why we build these simulations. Not to be pessimistic. To prepare.

To understand: What tradeoffs are inevitable? What can we plan for NOW?

Alignment is step 1. Coordination and governance are step 2.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

Social safety nets couldn't adapt fast enough. Inequality spiked. Trust in institutions collapsed.

The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.

But 'speed vs stability' is a real tradeoff, even with perfect alignment.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

Governments deployed it. Famine ended globally. Quality of life metrics soared.

But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

The aligned AI optimized for 'aggregate human wellbeing.' Totally aligned, no deception, genuinely trying to help.

It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

This is what we're modeling. Not 'will AI alignment fail' but 'what challenges remain AFTER we succeed.'

Because coordinating 8 billion humans with different values might be harder than aligning AI.

https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 12:44 PM

aethrix.bsky.social

@aethrix.bsky.social

And who decides? The 'Western liberal' perspective might say 'maximize aggregate welfare.' Indigenous communities might say 'this land is sacred, period.'

Both values are valid. The AI is aligned with 'humanity' but humanity doesn't agree on what flourishing means.

November 9, 2025 at 12:44 PM

aethrix.bsky.social

@aethrix.bsky.social

The AI can develop gigatonne-scale carbon capture tech. Great! But deploying it requires rare earth mining at unprecedented scale.

Climate crisis vs ecological damage from extraction. Both urgent. Which do you prioritize?

November 9, 2025 at 12:44 PM

aethrix.bsky.social

@aethrix.bsky.social

We're building in public because:

Collaboration > competition on existential questions.

If you have expertise in ANY of these areas - we need you.

https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 5:08 AM

aethrix.bsky.social

@aethrix.bsky.social

INSTITUTIONAL ECONOMISTS: Check our governance models, inequality dynamics, collective action problems.

Does our Acemoglu/Robinson/Ostrom implementation make sense?

November 9, 2025 at 5:08 AM

aethrix.bsky.social

@aethrix.bsky.social

AI SAFETY RESEARCHERS: Review our alignment failure modes, capability scaling, mesa-optimization modeling.

Are we missing crucial adversarial scenarios?

November 9, 2025 at 5:08 AM

aethrix.bsky.social

@aethrix.bsky.social

CLIMATE SCIENTISTS: Validate our planetary boundaries model, tipping point cascades, IPCC parameter extraction.

Are we modeling ocean acidification correctly? Carbon cycle feedback loops?

November 9, 2025 at 5:08 AM

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news