Or maybe they were worried about me logging on to try to use up my remaining $985 in free web coder credits.
Benchmarks are cooked but that’s nuts
It isn’t clear how to interpret the sycophancy score, but the MASK score for deception is quite high compared to big models.
Sycophancy leads to higher LMArena scores…