You could probably set up an interesting evaluation workflow: perform the steps in the video, then at the end compare the output of 3.5 + reasoning against DeepSeek's final output.