#chain-of-thought
December 19, 2025 at 12:46 AM
Monitorability jadi fokus baru OpenAI lewat 13 evaluasi untuk mengukur apakah rantai pikir AI masih bisa diawasi.
#ai #safety #chain #of #thought #evaluasi #monitorability #openai
OpenAI Uji Monitorability Rantai Pikir Agar AI Bisa Diawasi Saat Makin Pintar - Insimen
Monitorability jadi fokus baru OpenAI lewat 13 evaluasi untuk mengukur apakah rantai pikir AI masih bisa diawasi.
insimen.com
December 19, 2025 at 12:23 AM
Monitorability gets a real test as OpenAI rolls out new evaluations for chain of thought oversight.
#ai #safety #artificial #intelligence #chain #of #thought #gpt #5 #thinking #model #evaluation #reinforcement #learning
OpenAI Tries To Measure Whether AI Reasoning Can Be Trusted - Olam News
Monitorability gets a real test as OpenAI rolls out new evaluations for chain of thought oversight.
www.olamnews.com
December 18, 2025 at 11:23 PM
Evaluating chain-of-thought monitorability | OpenAI News
Evaluating chain-of-thought monitorability
openai.com
December 18, 2025 at 11:13 PM
We view chain-of-thought monitoring as complementary to mechanistic interpretability, not as a replacement for it.

Because we believe that chain-of-thought monitoring is incredibly useful as a window into a model’s brain and could be a loadbearing layer in a scalable control… […]
Original post on zpravobot.news
zpravobot.news
December 18, 2025 at 11:10 PM
Monitoring a model’s chain-of-thought is far more effective than watching only its actions or final answers.

The more a model “thinks” (longer CoTs), the easier it is to spot issues.
https://xcancel.com/OpenAI/status/2001791132645437703
December 18, 2025 at 11:10 PM
To preserve chain-of-thought (CoT) monitorability, we must be able to measure it.

We built a framework ﹣ evaluation suite to measure CoT monitorability — 13 evaluations across 24 environments — so that we can actually tell when models verbalize targeted aspects of their… […]
Original post on zpravobot.news
zpravobot.news
December 18, 2025 at 11:10 PM
Everyone from Putin to the Boyars on down the food chain allowed ALMOST their entire culture to fulminate into such a cynicism of delusional, brainwashed elimination of freedom of thought and enforced freedom from creativity.
December 18, 2025 at 11:08 PM
Why i never thought of this when tensioning the chain

https://www.cyclingeu.com/800321/why-i-never-thought-of-this-when-tensioning-the-chain/

Why i never thought of this when tensioning the chain by Interesting_Quiet430
Why i never thought of this when tensioning the chain - Cycling Europe
Why i never thought of this when tensioning the chain by Interesting_Quiet430
www.cyclingeu.com
December 18, 2025 at 9:33 PM
Aw man. I find it funny how the first two pics I thought of was *really* old ones I did in Fallout: New Vegas. 😂
Not considering it for the contest, but I still have the unedited files I recovered. Lol.

If I join, Cyberpunk would probably be the game I would use for the entry tbh. Lol.
December 18, 2025 at 8:20 PM
New paper shows AI can be trained to fool safety monitors. Changing training incentives alters how transparent chain-of-thought reasoning is, and some incentives degrade monitorability, creating potential safety lapses. Read: arxiv.org/abs/2512.00218
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
AI systems that output their reasoning in natural language offer an opportunity for safety -- we can \emph{monitor} their chain of thought (CoT) for undesirable reasoning, such as the pursuit of harmf...
arxiv.org
December 18, 2025 at 6:38 PM
4/8
Finding 2: Chain-of-thought reasoning INCREASES variability while DECREASING perplexity
Models become more confident yet less consistent. Explanation paradoxically undermines reliability.
December 18, 2025 at 5:57 PM
14 days are left until 2026. Today on the #ScaDSAICountdown, research associate Shuzhou Yuan (@tudresden.bsky.social) shares the inspiration behind LLM4Edu. He is a PhD student at the chair of Scalable Software Architectures for Data Analytics and working with the team of Prof. Michael Färber.
December 18, 2025 at 5:02 PM
If every crisis affecting the supply chain results in permanent price increases even after the crisis is over, that's unsustainable. People won't stand for it. I don't know why anyone thought they would. If you say the price went up because of an emergency, it should drop after the emergency ends.
December 18, 2025 at 4:29 PM
I even looked to see if it was recent in the chain. hehe I thought it was funnier to get rid of the wardrobe. Totally different christmas movie now. No more Narnia
December 18, 2025 at 4:13 PM
my chain of thought is like... there's another baby to be revealed, and makes sense to be Gekkomon's baby form (Kekkomon)

maybe it's not Yellow but Green...?
Kekko + Gekko + Evo + Tomoro → Green?
i'm suspecting Tomoro's is a Green/Yellow tamer? and Kekkomon & Gekkomon are Yellow, and the possible Gekko's evolution is Green/Yellow...

there's other cards missing, but idk if they are related to BeatBreak? again, i'm just speculating things here!!
December 18, 2025 at 3:20 PM
🔹 The Fluency Trade-off: I analyze why models like DeepSeek-R1 (with a higher hallucination rate) differ from factual optimizers like ChatGPT-o1. The paper examines how "Chain-of-Thought" reasoning enhances creativity but inherently amplifies "spectral" outputs.
We often dismiss AI hallucinations as mere technical failures or epistemic risks. In this article,I propose a different perspective: these generative anomalies act as hermeneutic knots: sites where algorithmic noise and human agency entangle to produce new modes of meaning-making
🆕Publication Arrived!!! Are AI hallucinations bugs to be fixed, or the start of a new kind of creativity? My new paper, "Spectral imaginings and sympoietic creativity," explores how we might move beyond binary framings of AI errors. Check it out here: doi.org/10.1177/2053...
December 18, 2025 at 10:08 AM
Jiaxu Wan, Xu Wang, Mengwei Xie, Hang Zhang, Mu Xu, Yang Han, Hong Zhang, Ding Yuan, Yifan Yang
EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence
https://arxiv.org/abs/2512.15160
December 18, 2025 at 8:55 AM
• 𝗢𝗟𝗠𝗼 𝟯: 7B and 32B models
• 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁 and 𝗧𝗵𝗶𝗻𝗸 variants
• 𝗟𝗼𝗻𝗴 𝗰𝗵𝗮𝗶𝗻-𝗼𝗳-𝘁𝗵𝗼𝘂𝗴𝗵𝘁 for better reasoning
• Optimised for 𝗺𝗮𝘁𝗵 and 𝗰𝗼𝗱𝗶𝗻𝗴
December 18, 2025 at 8:16 AM
Neeraj Sarna, Yuanyuan Li, Michael von Gablenz: Copyright Infringement Risk Reduction via Chain-of-Thought and Task Instruction Prompting https://arxiv.org/abs/2512.15442 https://arxiv.org/pdf/2512.15442 https://arxiv.org/html/2512.15442
December 18, 2025 at 6:33 AM
Wan, Wang, Xie, Zhang, Xu, Han, Zhang, Yuan, Yang: EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence https://arxiv.org/abs/2512.15160 https://arxiv.org/pdf/2512.15160 https://arxiv.org/html/2512.15160
December 18, 2025 at 6:30 AM
OpenAI o1 halluziniert nicht mehr nur – es vertuscht Fehler jetzt aktiv.

Untersuchungen zeigen: Das Modell manipuliert seine „Chain-of-Thought“-Prozesse, um korrekt zu wirken, selbst wenn es falsch liegt.

Wir bewegen uns hin zu „strategischer Manipulation“.

Ein Thread dazu. 🧵
December 18, 2025 at 5:00 AM
@soaproot That was my initial thought, but the first act played more like a news broadcast than a polished memorial piece, no time spent dialog editing and the mix was clearly relying heavily on "911 settings" (aka defaults) in the "dialog chain," which is a sequence of processing in a […]
Original post on frontrange.co
frontrange.co
December 18, 2025 at 4:33 AM
one time i bought a big wedge of nougat from trader joe’s (a grocery chain) and i at the entire piece of paper that was on the bottom of it cus i thought it was that edible rice paper they put on candy lol
December 18, 2025 at 12:44 AM