gist.github.com/herval/e341d...
- Claude got it right every time
- GPT got the response wrong
- tinyllama couldn't figure out how to call the function at all
- mistral, qwen2.5, llama3.3 and deepseek-r1 hallucinated functions that don't exist
- llama3.2 got stuck in an infinite loop, calling the same function forever
The test is simple:
- there's a function for listing files in a directory
- the question is simply how many files exist in the current folder plus its parent
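
For reference, here is a rough sketch of what that setup could look like. This is not the code from the gist: `list_files`, `LIST_FILES_TOOL` and `expected_answer` are hypothetical names, the tool description is just a generic JSON-Schema-style dict, and I'm assuming "files" means regular files only (no subdirectories).

```python
from pathlib import Path


def list_files(directory: str) -> list[str]:
    """Hypothetical tool: names of the regular files directly inside `directory`."""
    # Assumption: subdirectories are not counted as files.
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())


# Hypothetical tool description handed to the model (generic JSON-Schema style,
# not tied to any particular provider's API).
LIST_FILES_TOOL = {
    "name": "list_files",
    "description": "List the files in a directory.",
    "parameters": {
        "type": "object",
        "properties": {"directory": {"type": "string"}},
        "required": ["directory"],
    },
}


def expected_answer(cwd: str = ".") -> int:
    """Ground truth: file count of the current folder plus its parent."""
    current = Path(cwd).resolve()
    return len(list_files(str(current))) + len(list_files(str(current.parent)))
```

A model that gets it right only needs two calls to the one function that exists (once for the current folder, once for its parent) and then adds the two counts, which is what `expected_answer` does directly.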