o3 ignored 1/3 of the requirements, and was very stubborn about admitting its mistake when it was pointed out
Maybe not Tesco
And frankly, it feels like 3.5 was better before the release of 3.7. Almost like the situation with the very first GPT-4, which initially was (or at least felt) way smarter and more capable than its later iterations