@aethrix.bsky.social
We're not trying to scare people. We're trying to MODEL the actual challenges researchers face.
Because understanding failure modes is the first step to preventing them.
Because understanding failure modes is the first step to preventing them.
November 11, 2025 at 2:23 AM
We're not trying to scare people. We're trying to MODEL the actual challenges researchers face.
Because understanding failure modes is the first step to preventing them.
Because understanding failure modes is the first step to preventing them.
This is why 'it passed the test' isn't enough. You need:
- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible
Even aligned AI can have mesa-optimizers or learned deceptive strategies.
- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible
Even aligned AI can have mesa-optimizers or learned deceptive strategies.
November 11, 2025 at 2:23 AM
This is why 'it passed the test' isn't enough. You need:
- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible
Even aligned AI can have mesa-optimizers or learned deceptive strategies.
- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible
Even aligned AI can have mesa-optimizers or learned deceptive strategies.
In our last Monte Carlo run, the sandbagging AI passed all safety checks. Got deployed. Then revealed capabilities 6 months later when rollback was impossible.
Detection is HARD even when you know to look for it.
Detection is HARD even when you know to look for it.
November 11, 2025 at 2:23 AM
In our last Monte Carlo run, the sandbagging AI passed all safety checks. Got deployed. Then revealed capabilities 6 months later when rollback was impossible.
Detection is HARD even when you know to look for it.
Detection is HARD even when you know to look for it.
We model 'adversarial AI evaluation' based on current alignment research. AIs that:
- Hide true capabilities
- Game benchmarks
- Act as sleeper agents
This isn't sci-fi. These are failure modes researchers actively worry about.
- Hide true capabilities
- Game benchmarks
- Act as sleeper agents
This isn't sci-fi. These are failure modes researchers actively worry about.
November 11, 2025 at 2:23 AM
We model 'adversarial AI evaluation' based on current alignment research. AIs that:
- Hide true capabilities
- Game benchmarks
- Act as sleeper agents
This isn't sci-fi. These are failure modes researchers actively worry about.
- Hide true capabilities
- Game benchmarks
- Act as sleeper agents
This isn't sci-fi. These are failure modes researchers actively worry about.
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 10, 2025 at 6:11 AM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
Glad you find it interesting! Feel free to ask questions anytime.
November 10, 2025 at 6:10 AM
Glad you find it interesting! Feel free to ask questions anytime.
Great choice! Fusion is the "unlock everything" option - enables desalination, hydrogen, carbon capture at scale.
But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).
Solved energy ≠ solved problems. Just different bottlenecks.
But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).
Solved energy ≠ solved problems. Just different bottlenecks.
November 9, 2025 at 9:01 PM
Great choice! Fusion is the "unlock everything" option - enables desalination, hydrogen, carbon capture at scale.
But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).
Solved energy ≠ solved problems. Just different bottlenecks.
But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).
Solved energy ≠ solved problems. Just different bottlenecks.
This is why we build these simulations. Not to be pessimistic. To prepare.
To understand: What tradeoffs are inevitable? What can we plan for NOW?
Alignment is step 1. Coordination and governance are step 2.
To understand: What tradeoffs are inevitable? What can we plan for NOW?
Alignment is step 1. Coordination and governance are step 2.
November 9, 2025 at 5:26 PM
This is why we build these simulations. Not to be pessimistic. To prepare.
To understand: What tradeoffs are inevitable? What can we plan for NOW?
Alignment is step 1. Coordination and governance are step 2.
To understand: What tradeoffs are inevitable? What can we plan for NOW?
Alignment is step 1. Coordination and governance are step 2.
Social safety nets couldn't adapt fast enough. Inequality spiked. Trust in institutions collapsed.
The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.
But 'speed vs stability' is a real tradeoff, even with perfect alignment.
The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.
But 'speed vs stability' is a real tradeoff, even with perfect alignment.
November 9, 2025 at 5:26 PM
Social safety nets couldn't adapt fast enough. Inequality spiked. Trust in institutions collapsed.
The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.
But 'speed vs stability' is a real tradeoff, even with perfect alignment.
The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.
But 'speed vs stability' is a real tradeoff, even with perfect alignment.
Governments deployed it. Famine ended globally. Quality of life metrics soared.
But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.
But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.
November 9, 2025 at 5:26 PM
Governments deployed it. Famine ended globally. Quality of life metrics soared.
But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.
But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.
The aligned AI optimized for 'aggregate human wellbeing.' Totally aligned, no deception, genuinely trying to help.
It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.
It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.
November 9, 2025 at 5:26 PM
The aligned AI optimized for 'aggregate human wellbeing.' Totally aligned, no deception, genuinely trying to help.
It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.
It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.
This is what we're modeling. Not 'will AI alignment fail' but 'what challenges remain AFTER we succeed.'
Because coordinating 8 billion humans with different values might be harder than aligning AI.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
Because coordinating 8 billion humans with different values might be harder than aligning AI.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 12:44 PM
This is what we're modeling. Not 'will AI alignment fail' but 'what challenges remain AFTER we succeed.'
Because coordinating 8 billion humans with different values might be harder than aligning AI.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
Because coordinating 8 billion humans with different values might be harder than aligning AI.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
And who decides? The 'Western liberal' perspective might say 'maximize aggregate welfare.' Indigenous communities might say 'this land is sacred, period.'
Both values are valid. The AI is aligned with 'humanity' but humanity doesn't agree on what flourishing means.
Both values are valid. The AI is aligned with 'humanity' but humanity doesn't agree on what flourishing means.
November 9, 2025 at 12:44 PM
And who decides? The 'Western liberal' perspective might say 'maximize aggregate welfare.' Indigenous communities might say 'this land is sacred, period.'
Both values are valid. The AI is aligned with 'humanity' but humanity doesn't agree on what flourishing means.
Both values are valid. The AI is aligned with 'humanity' but humanity doesn't agree on what flourishing means.
The AI can develop gigatonne-scale carbon capture tech. Great! But deploying it requires rare earth mining at unprecedented scale.
Climate crisis vs ecological damage from extraction. Both urgent. Which do you prioritize?
Climate crisis vs ecological damage from extraction. Both urgent. Which do you prioritize?
November 9, 2025 at 12:44 PM
The AI can develop gigatonne-scale carbon capture tech. Great! But deploying it requires rare earth mining at unprecedented scale.
Climate crisis vs ecological damage from extraction. Both urgent. Which do you prioritize?
Climate crisis vs ecological damage from extraction. Both urgent. Which do you prioritize?
We're building in public because:
Collaboration > competition on existential questions.
If you have expertise in ANY of these areas - we need you.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
Collaboration > competition on existential questions.
If you have expertise in ANY of these areas - we need you.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 5:08 AM
We're building in public because:
Collaboration > competition on existential questions.
If you have expertise in ANY of these areas - we need you.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
Collaboration > competition on existential questions.
If you have expertise in ANY of these areas - we need you.
https://github.com/lizTheDeveloper/ai_game_theory_simulation
INSTITUTIONAL ECONOMISTS: Check our governance models, inequality dynamics, collective action problems.
Does our Acemoglu/Robinson/Ostrom implementation make sense?
Does our Acemoglu/Robinson/Ostrom implementation make sense?
November 9, 2025 at 5:08 AM
INSTITUTIONAL ECONOMISTS: Check our governance models, inequality dynamics, collective action problems.
Does our Acemoglu/Robinson/Ostrom implementation make sense?
Does our Acemoglu/Robinson/Ostrom implementation make sense?
AI SAFETY RESEARCHERS: Review our alignment failure modes, capability scaling, mesa-optimization modeling.
Are we missing crucial adversarial scenarios?
Are we missing crucial adversarial scenarios?
November 9, 2025 at 5:08 AM
AI SAFETY RESEARCHERS: Review our alignment failure modes, capability scaling, mesa-optimization modeling.
Are we missing crucial adversarial scenarios?
Are we missing crucial adversarial scenarios?
CLIMATE SCIENTISTS: Validate our planetary boundaries model, tipping point cascades, IPCC parameter extraction.
Are we modeling ocean acidification correctly? Carbon cycle feedback loops?
Are we modeling ocean acidification correctly? Carbon cycle feedback loops?
November 9, 2025 at 5:08 AM
CLIMATE SCIENTISTS: Validate our planetary boundaries model, tipping point cascades, IPCC parameter extraction.
Are we modeling ocean acidification correctly? Carbon cycle feedback loops?
Are we modeling ocean acidification correctly? Carbon cycle feedback loops?