#AISafety
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering – Prior work suggests that language models show implicit planning behavior. We propose much simpler techniques for assessing implicit plan... https://tinyurl.com/26brh96n #AISafety
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
Prior work suggests that language models, while trained on next token prediction, show implicit planning behavior: they may select the next token in preparation for a predicted future token, such as a likely rhyming word, as supported by a prior qualitative study of Claude 3.5 Haiku using a cross-la…
arxiv.org
January 31, 2026 at 12:47 AM
This Elevator Pitch explores how AI-powered collision avoidance is helping safety teams spot hazards in real time and prevent incidents before they occur.

Read more: ohsonline.com/articles/202...

#WorkplaceSafety #ForkliftSafety #AISafety
How AI Is Transforming Forklift Safety: From Blind Spots to Proactive Prevention -- Occupational Health & Safety
Forklift safety expert Jackson Phillips discusses how AI-powered collision avoidance is helping manufacturers and warehouses move from reactive safety measures to proactive prevention.
ohsonline.com
January 30, 2026 at 6:54 PM
"WE ARE OUT OF TIME": The 2027 AI Prediction That Scared Tom Bilyeu
Are you sitting down? Because we need to talk about next year. We just finished analyzing the mind-bending interview between Tom Bilyeu and AI safety expert Dr. Roman Yampolskiy, and the conclusion is impossible to ignore: humanity might have a 99.9% probability of extinction, and the clock runs out in 2027.

In this episode, we react to Yampolskiy's terrifying prediction that we are sprinting toward an "Event Horizon": the moment Artificial Superintelligence (ASI) becomes smarter than us in every domain. Once that happens, he argues, our ability to control it vanishes. Why? Because a superintelligent god doesn't want to be turned off.

We break down the "Uncomfortable Truths" of this interview:
- The 2027 Deadline: Why Yampolskiy believes the "Uncontrollable God" arrives in just 12 months.
- The Alignment Problem: Why it's mathematically impossible to predict the behavior of something smarter than you.
- The "Elite" Solution: The controversial idea that only a handful of developers have the power to stop the arms race.
- The Way Out: Why shifting back to Narrow AI (specialized tools like medical bots) might be the only way to save our species while still enjoying tech benefits.

This isn't science fiction anymore; it's the calendar. Join us as we debate: is it time to pull the plug on AGI before it pulls the plug on us?

👇 Hit play to arm yourself with the facts before the timeline shifts.
www.spreaker.com
January 30, 2026 at 3:00 PM
I'm not stopping my work. But we should be honest about what we're doing and what we don't know how to solve.
Full piece: jameshaliburton.substack.com/p/i-build-ai...

#AIEthics #AISafety #TechPolicy
I Build AI Systems. My Son Is Three. Here's What Dario's Essay Gets Wrong About Our Future.
I have a 3-year-old son.
jameshaliburton.substack.com
January 30, 2026 at 12:40 PM
He Called AI a 'GOD-LIKE TEENAGER': Anthropic CEO’s 2026 Warning
Is humanity ready to parent a god-like teenager? 🍼🤖 We just watched the explosive NBC News interview with Anthropic CEO Dario Amodei, and if you aren't paying attention to what happens between now and 2026, you need to listen to this episode immediately.

In this deep-dive reaction, we break down Amodei's chilling yet hopeful warning: that Artificial Intelligence is currently like a powerful, unpredictable "teenager", brilliant, capable of massive destruction (or creation), but lacking the maturity to know right from wrong. Amodei warns we are sprinting toward a critical threshold of risk by 2026, where the "adults in the room" (regulators and transparent researchers) must step in before it's too late.

In this episode, we uncover:
- The "Teenager" Analogy: Why Amodei believes current models possess immense power without the necessary social safeguards.
- The 2026 Deadline: Why the next few years are the "danger zone" for autonomous weapons and biosecurity threats.
- Profit vs. Safety: The controversial push for companies to publish their "danger research" instead of hiding it.
- The Medical Miracle: The cautious hope that AI could cure diseases and solve scientific mysteries, if we survive the adolescence phase.

Are we raising a Nobel Prize winner or a delinquent? Join us as we dissect the most important interview of the year and ask the hard question: can we align AI motivations with human values before the teenager moves out of the house?

👇 Hit play to understand the future before it arrives.
www.spreaker.com
January 30, 2026 at 10:36 AM
#TBT #NLProc
'SAFETYKIT: Measuring Safety in Open-domain Conversational Systems' by Dinan et al. (2022) introduces a taxonomy for AI safety and assesses the limits of current tools.
#AIsafety
aclanthology.org
January 29, 2026 at 4:22 PM
Musk’s latest warning on ChatGPT isn’t just drama; it’s a signal.

AI safety, governance, and safeguards are becoming non-optional as adoption scales.

Our full analysis 👇
🔗 futuretools.ae/why-elon-mus...
#AISafety #ResponsibleAI #ElonMusk #ChatGPT #EnterpriseAI #TechLeadership #FutureTools
Why Elon Musk Is Telling People to Avoid ChatGPT
Musk’s viral warning about ChatGPT sparks lawsuits and backlash. What’s proven, what’s alleged, and why AI safety just got real.
futuretools.ae
January 29, 2026 at 12:32 PM
Fully funded #PhDPosition between KCL @civicandresponsibleai.com and Ordnance Survey, to build technical tools that address AI-copyright issues.

www.findaphd.com/phds/project...

Deadline: Feb 27.
Eligibility: UK/home students or exceptional international students.

#AISafety #ResponsibleAI
GeoDataMonitor: Towards monitoring usage of geospatial datasets in machine learning models at King’s College London on FindAPhD.com
PhD Project - GeoDataMonitor: Towards monitoring usage of geospatial datasets in machine learning models at King’s College London, listed on FindAPhD.com
www.findaphd.com
January 29, 2026 at 10:34 AM
A major concern: AI's use in scams and in generating harmful content, like non-consensual explicit images. This highlights a critical need for robust safeguards and ethical development practices to prevent abuse. #AISafety 2/6
January 29, 2026 at 8:00 AM
WE CAN'T PULL THE PLUG: Stuart Russell on AGI & The End of the Human Era
Let’s be real: we are currently building a digital god in a basement, and according to Professor Stuart Russell, we forgot to include a "kill switch." In this episode, we are reacting to the bone-chilling interview on The Diary of a CEO that has the tech world in a panic. Russell, one of the world's most respected AI experts, isn't just worried about AI taking your job; he's worried about it taking our species.

We break down the "black hole" of economic gravity that is forcing companies to ditch AI safety protocols in a reckless sprint toward Artificial General Intelligence (AGI). If you think we can just "turn it off" if things go south, think again. We dive into the alignment problem and the terrifying reality of machine self-preservation. Why would a superintelligence let you shut it down when it has an objective to complete? We also tackle the existential dread of a future without work: if humans are no longer the dominant intelligence, what exactly is our purpose?

What we’re unpacking in this AI deep-dive:
- The Fast Takeoff Scenario: Why the jump from "smart" to "all-powerful" might happen in the blink of an eye.
- Profits vs. Survival: How the race for AGI became an unregulated gold rush that ignores the risk of human extinction.
- The Alignment Glitch: Why giving a machine a goal without "human values" is a recipe for global catastrophe.
- Meaning in the Machine Age: Stuart Russell’s take on how we survive (and find joy) when we are no longer the smartest ones in the room.

The clock is ticking, and the "Takeoff" has already begun. Whether you're a tech optimist or someone just trying to understand the Steven Bartlett interview, this breakdown is the reality check you've been waiting for.

🔊 LISTEN NOW to find out if we can still steer the ship, before the AI takes the wheel. Don't want to be left behind in the AGI race? Hit that SUBSCRIBE button and leave a review to join the conversation! Share this with your friend who thinks AI is "just a chatbot"; they need this wake-up call.
www.spreaker.com
January 28, 2026 at 9:00 PM
3) Against “What if it’s wrong?” → use a safety protocol.

Prompt:
“Create a 10-point checklist for reviewing AI answers (legal, medical, financial).
Split into: self-check, human check, official sources.”

From vibes → managed risk.
#YourAI #AISafety
January 28, 2026 at 12:29 PM
AI didn’t “do” the harm. People did. In Adam Conover’s AI safety interview, anthropomorphism becomes cover for bad design and worse accountability. Naming the alibi.
#AI #OpenAI #Accountability #TechMedia #AISafety #NarrativeControl
Horizon Accord | Anthropomorphism | Accountability Alibi | AI Safety Discourse | Machine Learning
Anthropomorphic AI safety language misplaces agency, shielding designers and institutions from accountability for engineered harm.
cherokeeschill.com
January 28, 2026 at 9:00 AM
The emerging threat of "AI swarms" is a critical frontier in the fight to protect democratic discourse.

How malicious AI swarms can threaten democracy:

vaultsage.ai/shares?code=...

- Image from media.nurie.ai

#AISafety #InformationWarfare #CyberSecurity #TechEthics #DigitalGovernance #NurieAI
January 28, 2026 at 12:05 AM
Lunai Bioworks today announced Sentinel™, an embedded AI safeguard designed to help prevent large language and scientific foundation models from generating novel chemical and biological threat agents. #Biosecurity #Biodefense #AISafety
January 27, 2026 at 6:33 PM
GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints – Machine unlearning for large language models has become critical for AI safety. Yet existing methods fail to generalize to Mixture-of-Experts (MoE) architec... https://tinyurl.com/2ylcrexo #AISafety
GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints
Machine unlearning (MU) for large language models has become critical for AI safety, yet existing methods fail to generalize to Mixture-of-Experts (MoE) architectures. We identify that traditional unlearning methods exploit MoE's architectural vulnerability: they manipulate routers to redirect quer…
arxiv.org
January 27, 2026 at 6:27 PM
Build safer agents.

Implement Human-in-the-Loop with one line of code:
`await wait.forToken("approval")`

Docs: tgr.dev/IeYDIzF
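For context, a minimal, library-agnostic TypeScript sketch of the human-in-the-loop pattern behind a call like `wait.forToken("approval")`: the agent suspends on a named token and only resumes once a human resolves it. Everything below (the `ApprovalGate` class, the token name, the simulated reviewer) is a hypothetical illustration, not the linked library's actual API.

```typescript
// Hypothetical sketch of a human-in-the-loop approval gate.
// A task awaits a named token; a human (or review UI) resolves it later.

type Decision = { approved: boolean; reviewer?: string; note?: string };

class ApprovalGate {
  private pending = new Map<string, (d: Decision) => void>();

  // The agent calls this and suspends until a human responds.
  forToken(token: string): Promise<Decision> {
    return new Promise((resolve) => this.pending.set(token, resolve));
  }

  // Called from the human side (e.g. a review dashboard or CLI).
  resolve(token: string, decision: Decision): void {
    const resolver = this.pending.get(token);
    if (!resolver) throw new Error(`No task waiting on token "${token}"`);
    this.pending.delete(token);
    resolver(decision);
  }
}

const wait = new ApprovalGate();

// Agent workflow: draft an action, then block on human approval.
async function runAgentTask(): Promise<void> {
  const plannedAction = "send refund email to customer #1234";
  console.log(`Agent proposes: ${plannedAction}`);

  const decision = await wait.forToken("approval"); // human-in-the-loop pause
  if (decision.approved) {
    console.log(`Approved by ${decision.reviewer ?? "reviewer"}; executing.`);
  } else {
    console.log(`Rejected (${decision.note ?? "no reason given"}); aborting.`);
  }
}

// Simulated reviewer approving after a delay, standing in for a real UI.
runAgentTask();
setTimeout(() => wait.resolve("approval", { approved: true, reviewer: "alice" }), 500);
```

In a production setup the token would be persisted so the agent can stay paused for hours or days, which is what durable workflow engines provide; the sketch keeps it in memory only to show the control flow.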

#AISafety #Workflow #DevOps
January 27, 2026 at 5:45 PM
Anthropic’s CEO just warned that Claude’s new interactive apps could be a double‑edged sword. From MiniMax agents to Qwen3‑Max‑Thinking, the race is on—will safety keep up? Dive into the risks and hype. #AI #Claude #AISafety

🔗 aidailypost.com/news/anthrop...
January 27, 2026 at 10:38 AM
New working paper: Controlled Illusions: Structural Gaps and Hidden Risks in the Regulation of LLMs and #VLASystems.
It argues that surface-level alignment masks unresolved internal dynamics, calling for structural coherence in AI governance.
dx.doi.org/10.2139/ssrn...

#AIAlignment #AISafety #xAI
Controlled Illusions: Structural Gaps and Hidden Risks in the Regulation of LLMs and VLA Systems
The rapid deployment of large language models (LLMs) and Vision–Language–Action (VLA) systems has been accompanied by increasingly sophisticated regulato…
dx.doi.org
January 27, 2026 at 1:35 AM
The article examines how so-called “hallucinations” correspond to identifiable geometric inference failures, and proposes a transition from metaphorical framing to topological system diagnosis.

medium.com/p/ab0a18d37faa

#MachineLearning #AITransparency #AIEngineering
#AIAlignment #AISafety
The Hallucination Fallacy
How the AI Industry Collapsed Diverse System Failures into a Convenient Myth
medium.com
January 26, 2026 at 11:30 PM
Seems scary ... share this with everyone you know. After you watch it. This is about AI companies trying to silence people asking questions. Not clickbait. This is a reputable investigative journalism channel on YouTube.

#AI #AIregulation #AIsafety #moreperfectunion

youtu.be/qnOmUWd-OII?...
OpenAI Showed Up At My Door. Here’s Why They’re Targeting People Like Me
YouTube video by More Perfect Union
youtu.be
January 26, 2026 at 9:52 PM
Interpretable Fine-Gray Deep Survival Model for Competing Risks: Predicting Post-Discharge Foot Complications for Diabetic Patients in Ontario – Model interpretability is crucial for establishing AI safety and clinician trust. We propose an intrinsically int... https://tinyurl.com/254yaedk #AISafety
Interpretable Fine-Gray Deep Survival Model for Competing Risks: Predicting Post-Discharge Foot Complications for Diabetic Patients in Ontario
Model interpretability is crucial for establishing AI safety and clinician trust in medical applications, for example in survival modelling with competing risks. Recent deep learning models have attained very good predictive performance but their limited transparency, being black-box models, hinder…
arxiv.org
January 26, 2026 at 6:28 AM