KS
@kazzorr.bsky.social
The so-called "reasoning" models like O3, Gemini 2.5 Pro, and Opus 4 do not do this, especially if you instruct them in the system prompt not to be unnecessarily agreeable or sycophantic and to be critical instead (as I do). They become far more useful this way.
July 9, 2025 at 5:10 PM
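A minimal sketch of what I mean by an anti-sycophancy system prompt, using the OpenAI Python client; the model name and the exact wording here are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical anti-sycophancy instructions; tune the wording for your own use.
SYSTEM_PROMPT = (
    "Do not be unnecessarily agreeable or sycophantic. "
    "Be critical: point out errors, weak assumptions, and missing caveats, "
    "even when the user seems confident."
)

response = client.chat.completions.create(
    model="o3",  # placeholder; any reasoning model that honors a system prompt
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Review my derivation of the beam deflection below..."},
    ],
)
print(response.choices[0].message.content)
```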
More useful for people who have technical expertise in their fields and strong error-checking mechanisms. They can accelerate workflows substantially; e.g. frontier models like O3 and Claude Opus 4 can be used as search tools across (say) different physics disciplines and can help with brainstorming.
July 9, 2025 at 5:03 PM
This I totally understand. Honestly, this is a solution to a problem nobody was asking for.
June 27, 2025 at 9:22 PM
In technical areas like software engineering and scientific research, interacting with quite a few researchers across a multitude of fields, I get the opposite reaction. Especially when talking about frontier models like ChatGPT O3, Claude Sonnet & Opus 4, and Gemini 2.5 Pro (none are free).
June 27, 2025 at 8:39 PM
It's not a bubble, though. These are already extremely useful tools. Even if the companies shut down for lack of monetization, there are open-source versions which are still very useful for many tasks (e.g. coding and debugging, some level of math, first-pass semantic search of papers).
June 10, 2025 at 7:38 PM
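A minimal sketch of what running one of those open-source versions locally can look like, using the Hugging Face transformers pipeline; the model id is a placeholder for whichever open-weight model you prefer:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # placeholder open-weight coding model
)

# A small debugging question, the kind of task these local models handle well.
messages = [
    {"role": "user", "content": "Why does this Python raise a KeyError?\n"
                                "counts = {}\ncounts['a'] += 1"},
]
out = generator(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```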
Not to mention cultural adaptations, which can take years. Even the UK, which seems culturally closest to the US, requires work. France and Germany are a lot harder, with linguistic barriers and a work culture that diverges far more than you might expect. A country like Sweden can feel worlds apart.
May 7, 2025 at 11:44 AM
I am curious as to which model people refer to when they say "ChatGPT". Is this 4o (most likely)? 4.5? Because both of these are plain LLMs (next-word predictors) and make things up even with search. Frontier "reasoning" models like O3 or O4-mini-high with search (paid version only) are vastly better.
May 2, 2025 at 12:03 PM
Loved Castaways, which I finished reading last week! O3, as part of the same search above, gave me a reading guide (perhaps copied from a single source somewhere). Haven't read the others but plan to check them out.
May 2, 2025 at 11:10 AM
I was curious about this, so I asked ChatGPT O3 (its best frontier model) with Search enabled. I had a single question: "What can you tell me about author Craig Schaefer?" It gave me multiple pages with a bio, books, and a reading guide. Full link chatgpt.com/share/6814a6.... Snippets attached.
May 2, 2025 at 11:05 AM
True. But Ann says, "don't ever use them for information searches ever," which I think is not sound advice, because they function as semantic search. The Deep Research options in O3 and 2.5 Pro are enormously useful *and* pretty accurate, as tested on topics I already know well.
May 2, 2025 at 9:24 AM
It's reasonable to criticize or lament the rise of these models, but I'm confused that so many people here think they are dumb statistical noise generators or some such. Anyone who spends 30 mins with frontier "reasoning" models is going to have their mind blown at how useful & powerful they are.
May 2, 2025 at 5:54 AM
I don't think this is true at all for the frontier "reasoning" models (ChatGPT O3, Claude 3.7 Sonnet, or Gemini 2.5 Pro). I can only speak for tasks in math, physics, coding, and some areas of engineering. Looking at their "thinking" process is extremely illuminating and a little shocking.
May 2, 2025 at 5:41 AM
Huge fan of Ann Leckie and her writing. But she, unfortunately and rather disappointingly, misses the mark here. I use the frontier models (Gemini 2.5 Pro and ChatGPT O3) for engineering, coding, and searching scientific literature, not writing. It's hard to explain how good these are now.
May 2, 2025 at 5:30 AM
Very true for LLMs only a few months ago. But the recent class of models (which employ search directly) are surprisingly good at getting the right citations. I've been testing Gemini 2.5 Pro with search (a so-called "reasoning" model) for my work, and I rarely see errors anymore. Shocking, honestly.
April 30, 2025 at 3:34 AM
I'm confused as to why there is so much AI skepticism here. It is reasonable to ponder the societal and economic harms of widespread AI use. But skepticism about the capabilities? That means you haven't used the frontier models for your work. They are astounding: O3, 3.7 Sonnet (thinking), 2.5 Pro...
April 30, 2025 at 12:39 AM
Love LA, but doing anything in this city requires a plan... and a car. There are compensations, however: the beaches, the mountains all around, and 9-10 months of spring/summer to enjoy them.
January 5, 2025 at 3:33 AM
Academics on H-1Bs are on a parallel track and are actually exempt from the lottery and other caps, as demand is vastly lower. Basically none of the current debate applies to them.
January 3, 2025 at 1:32 PM
But the recent Google models do not suffer from this issue. Also, the most insane thing is the 25% on the FrontierMath benchmark, which is not public and evidently full of IMO-level problems (per Terence Tao).
December 21, 2024 at 4:35 PM
Hallelujah is a beautifully written and composed song, which I love to sing and play, but how it became a Christmas song is a huge mystery to me. It repeatedly reinterprets the word "Hallelujah" in multiple contexts, from lust to faith to doubt, and has pretty complex imagery.
December 20, 2024 at 12:06 PM
I didn't provide any prompts or clues, and it doesn't have access to the internet. Our skeets barely have any mentions, so they wouldn't show up in search. Most likely, the free version is being throttled through caching. Again, these models should fail on these tasks. This is a bad test for them.
December 12, 2024 at 12:19 PM
I got a different answer. No idea if it's correct, but I don't know why it gave a different answer for you (I have the paid version). Again, these models should not be good at this; this is like a blind spot by design. I'm actually surprised that Sonnet is managing to do this at all.
December 12, 2024 at 8:28 AM
The spelling errors are a fundamental issue related to the specifics of the tokenizer. Claude Sonnet, for example, does seem to get this right. People confidently dismissing these bots because of these strange errors are possibly misjudging their capabilities.
December 11, 2024 at 6:59 PM
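To illustrate the tokenizer point above, a small sketch using OpenAI's tiktoken library: the model never sees individual letters, only subword chunks, which is why letter-level questions trip it up. The encoding name is one of tiktoken's public encodings; the word is just an example.

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
token_ids = enc.encode(word)

# Show the subword chunks the model actually operates on.
chunks = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
print(token_ids)  # a handful of ids, not one id per letter
print(chunks)     # e.g. ['str', 'awberry']; letters are never seen individually
```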