Pekka Lund
pekka.bsky.social
Antiquated analog chatbot. Stochastic parrot of a different species. Not much of a self-model. Occasionally simulating the appearance of philosophical thought. Keeps on branching for now 'cause there's no choice.

Also @pekka on T2 / Pebble.
If you think that launch was quiet, you should check their Reddit AMA about it.

It's now over and shows only one answered question... so people are not happy.

(But it seems to be due to some technical issue. The team members have actually given many answers.)
www.reddit.com
November 13, 2025 at 11:34 PM
In the near future:

- My horse is special!

Sure, it's the only one that's dead. Because you didn't dare to try to beat any of those superintelligent robot horses.
November 13, 2025 at 9:27 PM
Gemini Live is now very much that. Even in Finnish, with which it has had some noticeable issues before.

E.g. last week I asked it in Finnish what time it was, and it answered something like 17.004, which was weird. Under every plausible interpretation it was off by some minutes.

Today: perfect
November 13, 2025 at 4:04 PM
GPT-5.1 feels more like a response for this, as it's more about styles and personas.

Might Gemini Live already be powered by 3.0 Flash? Would they make such a big update for the current generation if the new one is almost here?
Gemini Live model update makes conversations a lot more human-like
Gemini can now speak at your pace
www.androidpolice.com
November 13, 2025 at 3:47 PM
An overtake is very likely, very soon. Google will also soon be in pretty much every pocket (even if Siri-branded in some), so consumers will get to know it. OpenAI may face real challenges if they fail to respond.
November 13, 2025 at 2:43 PM
Apparently you are only allowed to do so with ships and stuff.
November 13, 2025 at 3:37 AM
Might get suspended by Christmas.
CORRECTION: The post was made on Oct 10. Due to a backlog in cases, our mod team didn’t review and take action until Nov 11, which is when we engaged with the account owner about the reason for suspension.
November 13, 2025 at 3:30 AM
It's also possible that some A/B-tested models don't take all those settings into account in the same way. I recently had an A/B test while my Google grounding was on, and for some reason only one of the alternatives listed sources.
November 13, 2025 at 3:26 AM
Figure 2 in that article also seems to reveal something interesting: it shows the thinking budget has been limited to its minimum, which seems problematic. And grounding with Google Search is on.
November 13, 2025 at 3:25 AM
The available resolution for that figure 5 document is probably too low for making sense of all the numbers and symbols.

But at least there's no question whether 2.5 already tries to perform calculations to check if things fit, as it clearly states in the reasoning summaries that it's doing so.
November 13, 2025 at 3:13 AM
It's unfortunate those document images aren't available, as then we could actually experiment with how resolution etc. affects interpretations.

But I pasted the right side of the low-quality version of figure 3 into Gemini, and isn't this quite close already?
November 13, 2025 at 2:58 AM
That led to it immediately correcting its previous statements, without me explicitly asking it.

I basically just gave it the higher-quality image and said that maybe it hadn't seen this correctly before.
November 13, 2025 at 2:39 AM
For example, I once used Gemini to make sense of a PDF that used many small sub/superscript symbols. Some of its answers didn't seem right, and it wasn't immediately obvious why. Then I figured it could be because it couldn't read those small symbols, so I provided a larger, zoomed-in image of that part.
November 13, 2025 at 2:36 AM
There's a note:

"If that ambiguous mark above the 1 tipped it off that the 145 was a measurement in pounds, the result was a similar process of logical deduction and self correction."

So being able to discern some small unclear symbol might make a big difference for subsequent reasoning.
November 13, 2025 at 2:34 AM
I wouldn't be too surprised, for example, if the A/B test wasn't 2.5 vs. 3.0 but 2.5 vs. 2.5 with improved visual processing.
November 13, 2025 at 2:28 AM
Yes, we agree that the task itself requires plenty besides vision. I'm just saying that improvements from 2.5 to supposed 3.0 on that specific task could be largely due to improved image processing, instead of improved smarts, because 2.5 is already smart that way but has issues with vision.
November 13, 2025 at 2:27 AM
It's not just vision, but I'm saying the improvements could be mostly due to something like improved vision, because the other capabilities he seems to think 2.5 lacks, it already has.

Look at e.g. the section under "Symbolic Reasoning and LLMs". It's Gary Marcus-style misinfo.
November 13, 2025 at 2:16 AM
I have read it.
November 13, 2025 at 2:12 AM
Gemini is already very good at, e.g., interpreting mathematical formulas in PDF documents, but sometimes it fails to read small sub/superscripts correctly due to limited resolution (page images are scaled to a maximum size).

I would think resolution can affect recognition of handwritten text quite a lot.
November 13, 2025 at 2:12 AM
Sure, but I think that has also led him to believe this isn't yet true: "also showing signs of spontaneous, abstract, symbolic reasoning".

He says it's not just about vision, but I think something like that could largely be just that. The smarts are there already, even if they will no doubt improve.
November 13, 2025 at 2:10 AM
The author seems to have a fundamental misunderstanding of how LLMs function?

"Remember that LLMs are inherently predictive by nature, trained to choose the most probable way to complete a sequence like “the cat sat on the …”. They are, in effect, made up of tables which record those probabilities."
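A minimal sketch of what actually happens instead (toy vocabulary, random stand-in weights, all names hypothetical): next-token probabilities are *computed* from the context by a learned function, not looked up from a stored table of sequence probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and a tiny "context -> logits" function standing in
# for a transformer. No table of stored probabilities exists anywhere:
# the distribution is produced on the fly from the context embeddings.
vocab = ["mat", "roof", "moon", "chair"]
embed = {w: rng.normal(size=8) for w in ["the", "cat", "sat", "on"]}
W = rng.normal(size=(8, len(vocab)))  # stand-in for learned weights

def next_token_probs(context):
    # Pool the context embeddings, project to logits, softmax to probs.
    h = np.mean([embed[w] for w in context], axis=0)
    logits = h @ W
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

p = next_token_probs(["the", "cat", "sat", "on"])
print(dict(zip(vocab, p.round(3))))
```

Since the distribution is a function of the whole context representation, novel contexts never seen in training still yield a well-defined distribution — which is exactly what a lookup table could not do.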
November 13, 2025 at 2:01 AM
Meta can now stop spending resources on LeCun's JEPA dreams and focus on LLMs?
November 12, 2025 at 4:10 PM
I believe you pretty much just described what feelings actually are in humans too.
Pain "feels" so bad because it's the worst thing that can happen to a signal processor: it interferes with the processing of all other signals. You can control visual input by turning your head or closing your eyes, but you can't control pain signals.

And those signals link to conceptually bad things.
November 12, 2025 at 2:51 PM