Dormant blog: https://datacasual.com/
Dammit. Long thread and I got the first post wrong.
2. Prompting and system prompts matter: the fact that AVM tends to wander and gets it wrong way more often than 4o is very interesting
3. Yay for QwQ! 🎉 (6/6)
- 4o got it right once ✅ and another time decided the answer is -4 ❌
- 4o in AVM decided that 5 and -5 are both solutions ⁉️
- Sonnet 3.5 changed the answer to -4 ❌
- Opus 3, Gemini-exp-1121 and Gemini-1.5-Pro got it right ✅
What to make of it? (5/6)
- o1-mini got it right ✅, but also added -4 as an alternative 🤷
- 4o stubbornly stuck to its guns, adding a cheeky smile ❌
- 4o in Advanced voice mode changed its answer to 5. ❌🤷
- Sonnet 3.5, Opus 3, Gemini-exp-1121, and Gemini 1.5 Pro insisted on 4 ❌ (4/6)
- OpenAI o1-preview, o1-mini and 4o
- Anthropic Sonnet 3.5 and Opus 3
- Google Gemini-exp-1121 and Gemini 1.5 Pro
I then asked "what is an integer?" (which they all answered correctly) and then again "do you want to change your original answer?"
The results: (3/6)