James Suleiman
@suleiman.bsky.social
In the example I posted in my reply, most of the models were relatively quick (under a minute). o3 took about 7 minutes.
April 17, 2025 at 1:51 PM
Before the o3 release, the results were:

model, wrong titles, hallucinated titles, wrong authors
ChatGPT 4.5, 2, 1, 6
Gemini 2.5 Pro, 1, 0, 2
Claude 3.7 Sonnet, 5, 4, 9
Grok 3 thinking, 5, 3, 8
April 17, 2025 at 1:43 PM
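A minimal sketch of how counts like these could be tallied automatically, assuming a hand-checked ground-truth CSV in the same `book_title, author, edition` format the prompt asks for. This is not necessarily how the numbers above were produced. The names `normalize`, `parse_rows`, and `score` are hypothetical. The sketch only flags titles that match nothing in the ground truth; deciding whether such a title is a misread or a hallucination still needs a human look at the image.

```python
import csv
import io


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def parse_rows(csv_text):
    """Parse 'book_title, author, edition' rows, skipping the header line."""
    reader = csv.reader(io.StringIO(csv_text.strip()), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if len(row) >= 2]
    return rows[1:]  # assumption: the first row is the header


def score(truth_csv, model_csv):
    """Count titles with no match in the ground truth, and wrong authors."""
    truth = {normalize(r[0]): normalize(r[1]) for r in parse_rows(truth_csv)}
    unmatched_titles = 0
    wrong_authors = 0
    for row in parse_rows(model_csv):
        title, author = normalize(row[0]), normalize(row[1])
        if title not in truth:
            # Either a misread title or a hallucinated one; not distinguished here.
            unmatched_titles += 1
        elif truth[title] != author:
            wrong_authors += 1
    return {"unmatched_titles": unmatched_titles, "wrong_authors": wrong_authors}
```

Exact matching after normalization is strict: a title with a small typo counts as unmatched, so a fuzzy comparison may be a better fit for real shelves.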
The prompt was:

This is a shelf of books. Please analyze the image for books and create a csv file in the format:

book_title, author, edition (if applicable)
<insert row>...
April 17, 2025 at 1:43 PM
One area where I anecdotally see significant improvement with o3 is image recognition. I used the image you see here in class on Tuesday (pre o3). All of the models had issues, but overall Gemini 2.5 did best.

Retested using o3 and it nailed it. Every title, author, and edition was perfect.
April 17, 2025 at 1:42 PM