Veselin Raychev
veselinr.bsky.social
INSAIT, Co-founder LogicStar AI, ex: Snyk Code, founder DeepCode, PhD @ ETH Zurich - ML for code, Google Maps directions
We evaluated Claude 3.7 with 64k thinking tokens on BaxBench and found that it now tops our leaderboard with a 38% correct-and-secure generation rate. But when the models are instructed with security specifications, OpenAI o1 is again the best model.
February 25, 2025 at 7:33 PM