6. o3 still fails on some very easy tasks
7. o3 is expected to score under 30% on an upcoming benchmark (humans: 95%)
8. We don't know if these capabilities will extend to other domains
2. It's a genuine breakthrough in adaptability and generalization
3. o3 is capable of adapting to tasks it has never encountered before
4. This generality is too expensive to be economically feasible today
statmodeling.stat.columbia.edu/2023/06/22/h...