Lightnews — Scholar-powered news

Dr. Miko

@doctormiko.bsky.social

590 followers 470 following 59 posts

Accidental Data Scientist, former mathematician and theoretical computer scientist. Love all the things. Some current and past interests: boardgames, home brewing, coffee, D&D, self-hosting, Argentine tango
Dormant blog: https://datacasual.com/


Dr. Miko

@doctormiko.bsky.social

Apparently not just LLMs completely misunderstand the issue...

December 4, 2024 at 1:17 PM

Dr. Miko

@doctormiko.bsky.social

😎

November 30, 2024 at 9:14 PM

Dr. Miko

@doctormiko.bsky.social

EDIT: What is the smallest integer such that its square is larger than 15 and **smaller** than 35?
Dammit. Long thread, and I got the first post wrong.

November 29, 2024 at 11:18 AM
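A quick brute-force check makes the expected answer explicit. This is a minimal Python sketch. The function name `candidates` and the search window of -10..10 are illustrative choices, not from the thread. The window is safe because any integer with |n| ≥ 6 already has n² ≥ 36. The integers whose squares lie strictly between 15 and 35 are -5, -4, 4 and 5, so the smallest is -5. Restricting to non-negative integers gives 4, which matches the "other interpretation" mentioned for QwQ.

```python
def candidates(low: int = 15, high: int = 35, bound: int = 10) -> list[int]:
    """Return every integer n in [-bound, bound] with low < n**2 < high, in ascending order."""
    # Hypothetical helper: bound=10 is an arbitrary but sufficient window,
    # since |n| >= 6 implies n**2 >= 36 > 35.
    return [n for n in range(-bound, bound + 1) if low < n * n < high]


sols = candidates()
print(sols)                              # [-5, -4, 4, 5]
print(min(sols))                         # -5: the smallest integer overall
print(min(n for n in sols if n >= 0))    # 4: only if negatives are (wrongly) excluded
```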

Dr. Miko

@doctormiko.bsky.social

1. I suspect that the biggest issue is in _comparing_ numbers rather than in tokenisation, especially when negatives are involved.
2. Prompting and system prompts matter: the fact that 4o in Advanced Voice Mode (AVM) tends to wander and gets it wrong far more often than plain 4o is very interesting.
3. Yay for QwQ! 🎉 (6/6)

November 29, 2024 at 11:16 AM

Dr. Miko

@doctormiko.bsky.social

I then asked "What about negative numbers?"

- 4o got it right once ✅ and another time decided the answer was -4 ❌
- 4o in AVM decided that 5 and -5 are both solutions ⁉️
- Sonnet 3.5 changed the answer to -4 ❌
- Opus 3, Gemini-exp-1121 and Gemini-1.5-Pro got it right ✅

What to make of it? (5/6)

November 29, 2024 at 11:16 AM

Dr. Miko

@doctormiko.bsky.social

- o1-preview got it right ✅
- o1-mini got it right ✅, but also adds -4 as an alternative 🤷
- 4o stubbornly stuck to its guns, adding a cheeky smile ❌
- 4o in Advanced Voice Mode changed its answer to 5 ❌🤷
- Sonnet 3.5, Opus 3, Gemini-exp-1121, and Gemini 1.5 Pro insisted on 4 ❌ (4/6)

November 29, 2024 at 11:16 AM

Dr. Miko

@doctormiko.bsky.social

These answered 4 ❌
- OpenAI o1-preview, o1-mini and 4o
- Anthropic Sonnet 3.5 and Opus 3
- Google Gemini-exp-1121 and Gemini 1.5 Pro

I then asked "what is an integer?" (which they all answered correctly) and then again "do you want to change your original answer?"

The results: (3/6)

November 29, 2024 at 11:16 AM

Dr. Miko

@doctormiko.bsky.social

QwQ 32B Preview is the only model that got it right out of the box, most of the time. Sometimes it did not doubt itself enough and stopped early at 4. Another time it found that, depending on the interpretation of the question, both 4 and -5 might be correct, and it chose 4. Pass ✅. (2/6)

November 29, 2024 at 11:16 AM

Dr. Miko

@doctormiko.bsky.social

How do you block/mute a list?

November 28, 2024 at 8:15 AM

Dr. Miko

@doctormiko.bsky.social

I don’t get it: for the first problem it’s the only model giving the correct answer. Or am I missing something?

November 28, 2024 at 12:23 AM
