sshell
@sshell.co
propane and propane accessories
ai + security research
ccdc red team
hahaha, never thought about it, but the post should have come with a warning for anyone with service indicator-related ptsd.

also, being at nationals back-to-back years is impressive!
August 3, 2025 at 6:44 PM
Yup, same result set across all tests! A lot of it was deduplicating requests, removing feature bloat, smart tuning based on internet speeds, and being much more efficient with memory.
December 19, 2024 at 8:13 PM
Note: this is as much an indictment of default settings on tools as it is of feature bloat. Even painstaking optimization of the original tool didn't approach these numbers.
December 19, 2024 at 6:53 PM
truly believe pompeii/herculaneum graffiti should be required reading in school to really emphasize this point
December 2, 2024 at 3:24 PM
yeah, even the best models in general are pretty fragile with wording when it comes to tool use.

scale.com/leaderboard/...
Tool Use | Scale Leaderboards
Explore ToolComp, Scale AI's SEAL leaderboard evaluating large language model agents on their ability to plan, reason, and orchestrate complex, dependent tool calls. Discover the latest results and in...
scale.com
November 30, 2024 at 8:58 PM