Projects: dylancastillo.co/projects
It's a quick test designed to assess your estimation skills: estimator.dylancastillo.co/
This is inspired by @codinghorror's great posts: blog.codinghorror.com/how-good-an...
archive.is/qDc0v
It's a quick test designed to assess your estimation skills: estimator.dylancastillo.co/
This is inspired by @codinghorror's great posts: blog.codinghorror.com/how-good-an...
archive.is/qDc0v
After a bit of digging, I realized that it was just due to people misspelling "DeepSeek."
There are now people out there who think that China's top AI is a 💩 that makes charts.
After a bit of digging, I realized that it was just due to people misspelling "DeepSeek."
There are now people out there who think that China's top AI is a 💩 that makes charts.
Sounds easy, but happens to everyone.
Here's OpenAI breaking the CoT reasoning of an LLM judge.
Sounds easy, but happens to everyone.
Here's OpenAI breaking the CoT reasoning of an LLM judge.
But JSON-Schema performed worse than NL in 5 out of 6 tasks in my tests. Plus, in Shuffled Objects, it did so with a huge delta: 97.15% for NL vs. 86.18% for JSON-Schema.
But JSON-Schema performed worse than NL in 5 out of 6 tasks in my tests. Plus, in Shuffled Objects, it did so with a huge delta: 97.15% for NL vs. 86.18% for JSON-Schema.
Tweaked the prompts and improved all LMSF metrics except for NL in GSM8k.
GSM8k and Last Letter looked as expected (no diff).
But in Shuffled Obj. unstructured outputs clearly surpassed structured ones.
Tweaked the prompts and improved all LMSF metrics except for NL in GSM8k.
GSM8k and Last Letter looked as expected (no diff).
But in Shuffled Obj. unstructured outputs clearly surpassed structured ones.
I was able to reproduce the results and, after tweaking a few minor prompt issues, achieved a slight improvement in most metrics.
I was able to reproduce the results and, after tweaking a few minor prompt issues, achieved a slight improvement in most metrics.
I replicated @willkurt.bsky.social / @dottxtai.bsky.social rebuttal of Let Me Speak Freely? (LMSF) using gpt-4o-mini
The rebuttal correctly highlights many flaws with the original study, but ironically, LMSF's conclusion still holds
I replicated @willkurt.bsky.social / @dottxtai.bsky.social rebuttal of Let Me Speak Freely? (LMSF) using gpt-4o-mini
The rebuttal correctly highlights many flaws with the original study, but ironically, LMSF's conclusion still holds