mattorb.bsky.social
@mattorb.bsky.social
Stay curious.

https://mattorb.com
That’s a fun example for MCP! Maybe let people text it (?) 😀
March 17, 2025 at 12:25 AM
note: was pointed at www.swebench.com
SWE-bench
SWE-bench: Evaluate Language Models on Open Source Software Tasks
March 16, 2025 at 11:18 PM
Whether that applies to anyone else's configuration... well, I'm looking for a reliable source of information that is not just one or two people's stories, and coming up empty.
March 16, 2025 at 2:11 PM
Everyone seems to have a story of 'I found x works better', which was probably true for them given a particular configuration (model, parameters, system prompt, targeted codebase language/size/shape, context size, assistant tool).
March 16, 2025 at 2:10 PM
Does it require screen recording permission?

That is one of my current nits with the Bartender app, which does something similar.
February 6, 2025 at 11:45 PM
Like any metric, it can be gamed... but this particular one is _even more opaque_ than most, since contributions to private repos are shown in that graph and you can't drill into the details.

I mean.... this one looks real, right?
February 6, 2025 at 2:22 PM
Note: the icon is from their media kit, which can be found here: drive.google.com/drive/folder...
Bluesky Media Kit - Google Drive
February 6, 2025 at 1:45 PM