Veselin Raychev
veselinr.bsky.social
INSAIT, Co-founder LogicStar AI, ex: Snyk Code, founder DeepCode, PhD @ ETH Zurich - ML for code, Google Maps directions
We evaluate Claude 3.7 with 64k thinking tokens on BaxBench, and find that it now tops our leaderboard with a 38% correct-and-secure generation rate. But when the models are instructed with security specifications, OpenAI o1 is again the best model.
February 25, 2025 at 7:33 PM
LLMs are great at generating code, but the real test is creating production-ready applications. With BaxBench we tried to answer two questions: how often are functionally correct app backends generated, and how often do they contain security vulnerabilities?
BaxBench.com - led by @markvero.bsky.social
BaxBench: Can LLMs Generate Secure and Correct Backends?
We introduce a novel benchmark to evaluate LLMs on secure and correct code generation, showing that even flagship LLMs are not ready for coding automation, frequently generating insecure or incorrect ...
BaxBench.com
February 20, 2025 at 2:00 PM
Reposted by Veselin Raychev
How to effectively fix vulnerabilities in code:

1. Have the scanner confirm the vulnerability is fixed — not just trust LLM hallucinations.
2. Have a fast scanner that can be used in delta debugging to check which lines affect the result.
3. Have all of this work at IDE speed.
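Step 2 can be sketched with a ddmin-style delta debugging loop: repeatedly drop chunks of the candidate patch and re-run the fast scanner to find a minimal subset of changed lines that still makes the issue go away. This is a minimal illustration, not Snyk's implementation; `is_fixed` is a hypothetical stand-in for invoking the fast scanner on a candidate subset of the patch.

```python
# Hypothetical sketch: ddmin-style minimization of a candidate fix.
# `is_fixed(subset)` stands in for running a fast scanner with only
# `subset` of the patch's changed lines applied.

def ddmin(changes, is_fixed):
    """Shrink `changes` to a smaller subset for which is_fixed still holds."""
    n = 2  # number of chunks to split into
    while len(changes) >= 2:
        chunk = max(1, len(changes) // n)
        subsets = [changes[i:i + chunk] for i in range(0, len(changes), chunk)]
        reduced = False
        for subset in subsets:
            complement = [c for c in changes if c not in subset]
            if is_fixed(complement):  # the remaining lines alone still fix it
                changes = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(changes):
                break  # tried single-line granularity; done
            n = min(len(changes), n * 2)  # refine the split
    return changes

# Toy example: pretend only line 3 of an 8-line patch actually matters.
patch_lines = [1, 2, 3, 4, 5, 6, 7, 8]
minimal = ddmin(patch_lines, lambda subset: 3 in subset)  # -> [3]
```

Because each iteration re-runs the scanner, the scanner's speed directly bounds how interactive this loop can be — which is why points 2 and 3 go together.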

snyk.co/uhJ48
How does Snyk DCAIF Work under the hood? | Snyk
Read our technical deep-dive into how Snyk's DCAIF works. To start, with Snyk's Deep Code AI Fix, simply register for a Snyk account here, enable DeepCode AI Fix in your Snyk settings, and start relia...
snyk.co
November 22, 2024 at 8:56 AM
The new BgGPT is here, based on Gemma 2. The large 27B model is on par with GPT-4o, with GPT-4o used as a judge.

models.bggpt.ai/blog/
State-of-the-art Bulgarian LLMs
State-of-the-art generative AI created for the Bulgarian government, users, public and private organizations
models.bggpt.ai
November 19, 2024 at 12:38 PM
Our continuous pretraining method for LLMs, which reduces forgetting from the base model, was presented last week at EMNLP. Soon, some really strong models are coming.

arxiv.org/abs/2407.08699
November 18, 2024 at 7:37 AM