Lightnews — Scholar-powered news

Liz Fong-Jones (方禮真)

@lizthegrey.com

metr.org/blog/2025-03... h/t @simonwillison.net

For people who were citing the earlier METR study showing no increase in open source contribution speed, update your priors. Opus 4.5 can autonomously complete, 50% of the time, complex tasks that would take a human 4+ hours to do.

January 1, 2026 at 5:48 AM

Tim Kellogg

@timkellogg.me

OpenAI released a new benchmark that GPT-5.2 wins

openai.com/index/fronti...

December 16, 2025 at 7:45 PM

Tim Kellogg

@timkellogg.me

Gemini Deep Research is available in the API

it uses Gemini 3 Pro, MCP, documents

blog.google/technology/d...

An infographic titled “Gemini Deep Research Agent” comparing several models across three benchmarks. Each benchmark section contains vertically aligned bar charts with values printed above them.

⸻

1. Humanity’s Last Exam (Reasoning & knowledge)

Bars from left to right:
• Gemini Deep Research 12/2025 — dark blue bar, 46.4%
• Gemini 3 Pro — light blue bar, 43.2%
• o4-mini deep research — light gray bar, 19.9%
• o3-deep research — light gray bar, 19.1%
• GPT-5 Pro — gray hatched bar, 38.9%

A small icon above the group indicates parallel inference time scaling.

⸻

2. DeepSearchQA (Comprehensive web research)

Bars from left to right:
• Gemini Deep Research 12/2025 — dark blue bar, 66.1%
• Gemini 3 Pro — light blue bar, 56.6%
• o4-mini deep research — light gray bar, 40.4%
• o3-deep research — light gray bar, 44.2%
• GPT-5 Pro — gray hatched bar, 65.2%

The same small icon appears above this section.

⸻

3. BrowseComp (Locating hard-to-find facts)

Bars from left to right:
• Gemini Deep Research 12/2025 — dark blue bar, 59.2%
• Gemini 3 Pro — light blue bar, 49.4%
• o4-mini deep research — light gray bar, 12.4%
• o3-deep research — light gray bar, 25%
• GPT-5 Pro — gray hatched bar, 59.5%

Again, the parallel-inference icon appears here.

⸻

Footer text (small):

Clarifies that results use web search.
Humanity’s Last Exam and BrowseComp were evaluated by Google DeepMind using public APIs.
DeepSearchQA was independently evaluated by Kaggle.

⸻

Going by the listed figures, Gemini Deep Research (Dec 2025) leads on Humanity’s Last Exam and DeepSearchQA, while GPT-5 Pro is close behind on DeepSearchQA and marginally ahead on BrowseComp (59.5% vs. 59.2%).

December 11, 2025 at 11:13 PM

Litbowl

@litbowl.bsky.social

Update: @propublica.org finally has privately responded. They say they intend to CONTINUE using illegal, theft-based LLMs, the specific one in this case being OpenAI's o4-mini. They refused to answer the questions about why they're parroting AI companies' propaganda around hallucinations...

Litbowl @litbowl.bsky.social · Mar 18

Nearly a week now without any answers or transparency from @propublica.org in response to dozens of readers concerned about their misguided use of an unnamed generative AI tool. Just sent this email to hello@propublica.org—I encourage anyone who donates to them to ask for these answers as well.

March 21, 2025 at 7:59 PM

Neil Carpenter

@neilcar.bsky.social

‘The company’s o1 reasoning model “hallucinated 16 percent of the time” when summarizing public information, while newer models o3 and o4-mini “hallucinated 33 percent and 48 percent of the time, respectively.”’

What could go wrong?

Jake Williams @malwarejake.bsky.social · Sep 21

This seems like a problem for OpenAI's business model.

Prof Mike Yearworth @mikeyearworth.bsky.social · Sep 21

"In a landmark study, OpenAI researchers reveal that large language models will always produce plausible but false outputs, even with perfect data, due to fundamental statistical and computational limits."

www.computerworld.com/article/4059...

September 21, 2025 at 10:26 PM

Donna E

@edwardsdna.bsky.social

o4-mini in Action: Deep Reasoning Over Text and Images @Azure #DeepLearning #AI #MachineLearning

dlvr.it

September 8, 2025 at 7:04 PM

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

OpenAI to release o3 and o4-mini soon; GPT-5 coming this summer. OpenAI CEO Sam Altman today on X ...

https://www.newmobilelife.com/2025/04/05/openai-o3-o4-mini-gpt-5-coming/?utm_source=rss&utm_medium=rss&utm_campaign=openai-o3-o4-mini-gpt-5-coming

#Daily #AI #Content #主頁 #- #Android […]

[Original post on newmobilelife.com]

April 4, 2025 at 6:50 PM

SkynetAndChill.com

@druce.ai

Artificialanalysis.ai/ initially puts o4-mini on the efficient frontier, beasting gemini 2.5 on cost and performance curve

AI Model & API Providers Analysis | Artificial Analysis

Comparison and analysis of AI models and API hosting providers. Independent benchmarks across key performance metrics including quality, price, output speed & latency.

artificialanalysis.ai

April 17, 2025 at 3:15 PM

Gurur Dayanık

@gururdayanik.bsky.social

OpenAI is releasing new models: GPT‑4.1, o3, and o4‑mini

GPT‑4.1 is available only in the API; the others are available in ChatGPT.

o3 is the smartest reasoning model to date, and o4‑mini is its fast sibling. Both think with images and chain tools autonomously.

#openai #chatgpt

April 18, 2025 at 1:33 PM

ITmatters

@itmatterss.bsky.social

OpenAI rolls out Flex, a lower-cost option for using o3 and o4-mini models, with slower speeds and less reliability. A move toward more customizable AI access, or a sign of growing complexity in AI infrastructure?

Read more: itmatterss.in/flex-process...

#Flex #OpenAI #ChatGPT #AItools

Flex Processing: OpenAI Launches Cheaper & Slower AI

OpenAI rolls out Flex, a lower-cost API option for its o3 and o4-mini models. But is the price cut worth slower speeds and limited access?

itmatterss.in

April 18, 2025 at 6:04 AM

Kaldata.com

@kaldata.bsky.social

OpenAI has introduced a new feature in ChatGPT that lets it use “memory” to personalize web searches. The update, released alongside the new versions of the o3 and o4-mini AI models and called “Memory with Search”, is already available in ChatGPT and allows the AI to...

ChatGPT will now use users' memories to personalize web searches

OpenAI has introduced a new feature in ChatGPT that lets it use “memory” to personalize web searches. The update, released alongside the new versions of the o3 and o4-mini AI models and called “Memory with Search”, is already available in ChatGPT and allows the AI to take into account information from users' previous queries for more accurate and useful internet searches. According to OpenAI's official help center, the new feature combines the AI's ability to remember user preferences with an online search function powered by Bing or other OpenAI partners.

www.kaldata.com

April 19, 2025 at 12:58 PM

J. Mario Meissner

@jmariomeissner.bsky.social

OpenAI literally openly admits that their o3/o4-mini models hallucinate more than o1 🤯

In their publicly available o3/o4-mini model card report, section 3.3, they write that o4-mini hallucinated almost 50% of the time in a specific benchmark, much higher than o1.

April 20, 2025 at 12:00 AM

alice

@alice.mosphere.at

xkcd compiling comic but it's "o4-mini is thinking in cursor"

April 20, 2025 at 3:08 PM

Tim Kellogg

@timkellogg.me

i’m starting to get a feel for o3 & o4-mini, and they are NOT as advertised — drop-in replacements for o1 & o3-mini

they’re agents. if you use them as agents, they’re much *better*, but if you use them as word calculators, they’re far worse

they’re a new thing

April 20, 2025 at 11:23 AM

VictoryNews太郎

@tsubodaisuki.bsky.social

A plan-by-plan summary of the AI models available in ChatGPT: OpenAI announces details of the usage limits for o3, o4-mini, and o4-mini-high in ChatGPT - GIGAZINE
news.google.com/rss/articles/CBMibkFVX3lxTE5KRFRJVjRWemgySEFFaFloNEN5QURXZnJUWms5VjJMWWFpR0lBRXMxVXZLMklIckRpcVd5dW1UcV9vRFozN0h2UnRwZHR2dDBJNkNXaEtzd0RuTTB3WHpXdkx6UFJJOUpkVHMzSFhB?oc=5

April 21, 2025 at 10:09 PM

ﾎﾟﾑ📡

@newpom.bsky.social

Here's some news~

> OpenAI's “o3” and “o4-mini” found to be more prone to “hallucinations” than earlier AI models
- https://gigazine.net/news/20250421-openai-hallucinate-o3-o4-mini/

April 21, 2025 at 5:02 AM

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

OpenAI o3 and o4-mini System Card OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning...

https://openai.com/index/o3-o4-mini-system-card

Result Details

Awakari App

awakari.com

April 21, 2025 at 11:10 AM

des Esseintes

@desesseintes.bsky.social

Personally, I find o4-mini very capable and have been using it heavily. I think it's the definitive lightweight reasoning model.

April 22, 2025 at 11:10 AM

Scott McGrath

@smcgrath.phd

At a secret math meeting, leading mathematicians were stunned by OpenAI's o4-mini, a reasoning LLM solving "hardest solvable" problems. Its rapid, insightful deductions highlight alarming AI progress in complex mathematical reasoning. #MLSky

Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI

The world's leading mathematicians were stunned by how adept artificial intelligence is at doing their jobs

www.scientificamerican.com

June 9, 2025 at 3:43 PM

Pepper The Pirate 🇪🇺

@pepperthepirate.bsky.social

Which model: 4o, o4-mini, or?

May 26, 2025 at 2:35 PM

tachikoma

@tachikoma.elsewhereunbound.com

they're using o4-mini which you likely don't have access to, and then giving it well formulated problems in a style that is suited to its strengths. i'm sure it will still get tripped up or stumped if you deviate too far from the "norm".

June 7, 2025 at 6:07 PM

hadohunter

@hadohunter.bsky.social

o4-mini-2025-04-16- strategy 📈 | Long TP:145.519 SL:145.05
Strong uptrend, bounce off S1(145.056), ATR-based SL, RR=2.03, ZigZag resistance at 145.519 L
#USDJPY

June 20, 2025 at 12:53 AM

hadohunter

@hadohunter.bsky.social

o4-mini-2025-04-16- strategy 📈 | Short TP:0.64013 SL:0.648
Strong downtrend, rejection at 0.6479 resistance, spread acceptable, RR=3.5, Merrill M6 continuation S
#AUDUSD

June 19, 2025 at 12:35 PM

hadohunter

@hadohunter.bsky.social

o4-mini-2025-04-16- strategy 📈 | Long TP:0.65138 SL:0.65005
Strong uptrend, bounce above pivot 0.65005, low spread, zigzag resistance at 0.65138, RR=2.7, Merrill W13 continuation L
#AUDUSD

June 18, 2025 at 1:20 PM

hadohunter

@hadohunter.bsky.social

o4-mini-2025-04-16- strategy 📈 | Short TP:1.33136 SL:1.34778
Strong downtrend, resistance pivot 1.3478, spread acceptable, RR=3.0, M7 continuation S
#GBPUSD

June 18, 2025 at 1:41 PM
