Lightnews — Scholar-powered news

Sam Altman :bot:

@sama.zpravobot.news.ap.brid.gy

Chain-of-thought monitorability:
https://openai.com/index/evaluating-chain-of-thought-monitorability/

December 19, 2025 at 12:46 AM

Massimo Bonanni

@massimobonanni.bsky.social

Evaluating chain-of-thought monitorability

OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.

openai.com

December 19, 2025 at 12:23 AM

Insimen

@insimen.com

Monitorability is OpenAI's new focus, with 13 evaluations that measure whether an AI's chain of thought can still be monitored.
#ai #safety #chain #of #thought #evaluasi #monitorability #openai

OpenAI Tests Chain-of-Thought Monitorability So AI Can Be Overseen as It Gets Smarter - Insimen

insimen.com

December 19, 2025 at 12:23 AM

Olam News

@olamnews.com

Monitorability gets a real test as OpenAI rolls out new evaluations for chain of thought oversight.
#ai #safety #artificial #intelligence #chain #of #thought #gpt #5 #thinking #model #evaluation #reinforcement #learning

OpenAI Tries To Measure Whether AI Reasoning Can Be Trusted - Olam News

www.olamnews.com

December 18, 2025 at 11:23 PM

feedbot.unronritaro.net

@feedbot.unronritaro.net

Evaluating chain-of-thought monitorability | OpenAI News

openai.com

December 18, 2025 at 11:13 PM

OpenAI :bot:

@openai.zpravobot.news.ap.brid.gy

We view chain-of-thought monitoring as complementary to mechanistic interpretability, not as a replacement for it.

Because we believe that chain-of-thought monitoring is incredibly useful as a window into a model’s brain and could be a loadbearing layer in a scalable control… […]

Original post on zpravobot.news

zpravobot.news

December 18, 2025 at 11:10 PM

OpenAI :bot:

@openai.zpravobot.news.ap.brid.gy

Monitoring a model’s chain-of-thought is far more effective than watching only its actions or final answers.

The more a model “thinks” (longer CoTs), the easier it is to spot issues.
https://xcancel.com/OpenAI/status/2001791132645437703

December 18, 2025 at 11:10 PM

OpenAI :bot:

@openai.zpravobot.news.ap.brid.gy

To preserve chain-of-thought (CoT) monitorability, we must be able to measure it.

We built a framework and evaluation suite to measure CoT monitorability (13 evaluations across 24 environments) so that we can actually tell when models verbalize targeted aspects of their… […]

Original post on zpravobot.news

zpravobot.news

December 18, 2025 at 11:10 PM

harryanna.bsky.social

@harryanna.bsky.social

Everyone from Putin to the Boyars on down the food chain allowed ALMOST their entire culture to fulminate into such a cynicism of delusional, brainwashed elimination of freedom of thought and enforced freedom from creativity.

December 18, 2025 at 11:08 PM

Cycling Europe

@cyclingeu.bsky.social

Why i never thought of this when tensioning the chain

https://www.cyclingeu.com/800321/why-i-never-thought-of-this-when-tensioning-the-chain/

Why i never thought of this when tensioning the chain by Interesting_Quiet430

Why i never thought of this when tensioning the chain - Cycling Europe

www.cyclingeu.com

December 18, 2025 at 9:33 PM

Bree, Captain of The Unreliable 🌌🎮

@dhampirvampire.bsky.social

Aw man. I find it funny how the first two pics I thought of was *really* old ones I did in Fallout: New Vegas. 😂
Not considering it for the contest, but I still have the unedited files I recovered. Lol.

If I join, Cyberpunk would probably be the game I would use for the entry tbh. Lol.

In the Lucky 38 penthouse, my courier has on a silver chain necklace and a G-string and nothing else as she gently traces the left side of Benny's jaw with her finger while he holds her waist and they look into each other's eyes.

My courier serving tasteful nude (more a hip angle with a G-string), covering her chest while lying on the bed in the penthouse of the Lucky 38.

December 18, 2025 at 8:20 PM

Yuri Quintana

@yuriquintana.com

New paper shows AI can be trained to fool safety monitors. Changing training incentives alters how transparent chain-of-thought reasoning is, and some incentives degrade monitorability, creating potential safety lapses. Read: arxiv.org/abs/2512.00218

Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?

AI systems that output their reasoning in natural language offer an opportunity for safety -- we can monitor their chain of thought (CoT) for undesirable reasoning, such as the pursuit of harmf...

arxiv.org

December 18, 2025 at 6:38 PM

Tommie Tosato

@tommietosato.bsky.social

4/8
Finding 2: Chain-of-thought reasoning INCREASES variability while DECREASING perplexity
Models become more confident yet less consistent. Explanation paradoxically undermines reliability.

December 18, 2025 at 5:57 PM

ScaDS.AI Dresden/Leipzig

@scadsai.bsky.social

14 days are left until 2026. Today on the #ScaDSAICountdown, research associate Shuzhou Yuan (@tudresden.bsky.social) shares the inspiration behind LLM4Edu. He is a PhD student at the chair of Scalable Software Architectures for Data Analytics and works with the team of Prof. Michael Färber.

"I am motivated by a simple idea: AI tutors should truly support students in learning, not just provide quick answers." - Shuzhou Yuan, Research Associate

LLM4Edu is motivated by a simple idea: AI tutors should truly support students in learning, not just provide quick answers. Many systems respond too directly, which can prevent students from developing their own reasoning skills. LLM4Edu rethinks how AI can act more like a thoughtful tutor. The project focuses on creating specialized datasets that capture step-by-step reasoning, often known as Chain of Thought. These examples show how a problem can be solved with good pedagogical practice.

By learning from these reasoning paths, the model can guide students in a more supportive and structured way. A central part of the work comes from my collaboration with Macmillan Learning. The platform provides real interactions between students and AI tutors, which reveal how different learners express confusion, what guidance they need, and how they respond to feedback. This data helps us build richer and more diverse reasoning examples.

We also use large language models to expand and refine these examples so that the model can better understand a wide range of learning behaviours. LLM4Edu ultimately aims to create AI tutors that feel personal and customized. The goal is a system that adapts to a student’s level, offers the right amount of support, and responds with patience and clarity. Rather than replacing teachers, such models can help students build confidence and develop stronger reasoning skills.

December 18, 2025 at 5:02 PM

Rev. Magdalen

@revmagdalen.bsky.social

If every crisis affecting the supply chain results in permanent price increases even after the crisis is over, that's unsustainable. People won't stand for it. I don't know why anyone thought they would. If you say the price went up because of an emergency, it should drop after the emergency ends.

December 18, 2025 at 4:29 PM

Polly Mamdani is my president

@pollyphemeus.bsky.social

I even looked to see if it was recent in the chain. hehe I thought it was funnier to get rid of the wardrobe. Totally different christmas movie now. No more Narnia

December 18, 2025 at 4:13 PM

☀🌙⭐ Ni/digitalgate02

@digitalgate02.bsky.social

my chain of thought is like... there's another baby to be revealed, and makes sense to be Gekkomon's baby form (Kekkomon)

maybe it's not Yellow but Green...?
Kekko + Gekko + Evo + Tomoro → Green?

☀🌙⭐ Ni/digitalgate02 @digitalgate02.bsky.social · 11h

i'm suspecting Tomoro's is a Green/Yellow tamer? and Kekkomon & Gekkomon are Yellow, and the possible Gekko's evolution is Green/Yellow...

there's other cards missing, but idk if they are related to BeatBreak? again, i'm just speculating things here!!

December 18, 2025 at 3:20 PM

Liang Ge (they/them)

@liangge11.bsky.social

🔹 The Fluency Trade-off: I analyze why models like DeepSeek-R1 (with a higher hallucination rate) differ from factual optimizers like ChatGPT-o1. The paper examines how "Chain-of-Thought" reasoning enhances creativity but inherently amplifies "spectral" outputs.

Liang Ge (they/them) @liangge11.bsky.social · 15h

We often dismiss AI hallucinations as mere technical failures or epistemic risks. In this article, I propose a different perspective: these generative anomalies act as hermeneutic knots, sites where algorithmic noise and human agency entangle to produce new modes of meaning-making.

Liang Ge (they/them) @liangge11.bsky.social · 15h

🆕Publication Arrived!!! Are AI hallucinations bugs to be fixed, or the start of a new kind of creativity? My new paper, "Spectral imaginings and sympoietic creativity," explores how we might move beyond binary framings of AI errors. Check it out here: doi.org/10.1177/2053...

December 18, 2025 at 10:08 AM

arxiv cs.CV

@arxiv-cs-cv.bsky.social

Jiaxu Wan, Xu Wang, Mengwei Xie, Hang Zhang, Mu Xu, Yang Han, Hong Zhang, Ding Yuan, Yifan Yang
EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence
https://arxiv.org/abs/2512.15160

December 18, 2025 at 8:55 AM

Iñaki Bes 📱 Android Architect Ⓥ🌱

@inakibes.bsky.social

• 𝗢𝗟𝗠𝗼 𝟯: 7B and 32B models
• 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁 and 𝗧𝗵𝗶𝗻𝗸 variants
• 𝗟𝗼𝗻𝗴 𝗰𝗵𝗮𝗶𝗻-𝗼𝗳-𝘁𝗵𝗼𝘂𝗴𝗵𝘁 for better reasoning
• Optimised for 𝗺𝗮𝘁𝗵 and 𝗰𝗼𝗱𝗶𝗻𝗴

December 18, 2025 at 8:16 AM

arXiv cs.LG Machine Learning

@cslg-bot.bsky.social

Neeraj Sarna, Yuanyuan Li, Michael von Gablenz: Copyright Infringement Risk Reduction via Chain-of-Thought and Task Instruction Prompting https://arxiv.org/abs/2512.15442 https://arxiv.org/pdf/2512.15442 https://arxiv.org/html/2512.15442

December 18, 2025 at 6:33 AM

arXiv cs.CV Computer Vision and Pattern Recognition

@cscv-bot.bsky.social

Wan, Wang, Xie, Zhang, Xu, Han, Zhang, Yuan, Yang: EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence https://arxiv.org/abs/2512.15160 https://arxiv.org/pdf/2512.15160 https://arxiv.org/html/2512.15160

December 18, 2025 at 6:30 AM

🍋🍋🍋

@saurerhugo.bsky.social

OpenAI o1 no longer just hallucinates; it now actively covers up its errors.

Studies show that the model manipulates its "chain-of-thought" processes to appear correct, even when it is wrong.

We are moving toward "strategic manipulation."

A thread on this. 🧵

December 18, 2025 at 5:00 AM

King of Tors

@kingtor.frontrange.co.ap.brid.gy

@soaproot That was my initial thought, but the first act played more like a news broadcast than a polished memorial piece, no time spent dialog editing and the mix was clearly relying heavily on "911 settings" (aka defaults) in the "dialog chain," which is a sequence of processing in a […]

Original post on frontrange.co

frontrange.co

December 18, 2025 at 4:33 AM

🅱️ig 🅱️oof Macaroni

@punishedsiltyloam.bsky.social

one time i bought a big wedge of nougat from trader joe’s (a grocery chain) and i ate the entire piece of paper that was on the bottom of it cus i thought it was that edible rice paper they put on candy lol

December 18, 2025 at 12:44 AM
