Hersh Gupta
hershgupta.com
Hersh Gupta
@hershgupta.com
Applied Scientist, Responsible AI @BCGX | @bostonu.bsky.social alum | Data, AI, and strategy enthusiast | Open-source contributor

Opinions are my own

#bikeboston #coys

📍DC -> BOS
Many takes on the quoted article are disingenuous or just plain wrong, but for those who care about coding skills, there's a simple fix (recommended by Boris himself):

/config > Preferred output style

I like "Explanatory", but "Learning" is good too
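For reference, a quick sketch of what that looks like in a session (from memory; menu labels and commands may differ across versions):

/config                      # open settings and pick the output style
/output-style explanatory    # or: /output-style learning

Explanatory narrates the reasoning behind its changes as it works; Learning goes a step further and leaves small pieces for you to write yourself.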
February 1, 2026 at 3:38 PM
One thing (of many) that amazes me about this is how useful the agent skills paradigm can be in niche applications. Case in point: Claude Code planned waypoints on Mars for NASA’s Perseverance rover in their highly custom Rover Markup Language.

www.anthropic.com/features/cla...
February 1, 2026 at 2:07 PM
Reposted by Hersh Gupta
You can vibe code your way to a working prototype. You cannot vibe code or one-shot your way to a competitive product that works at scale. The hard part isn't writing code; it's the architectural supervision.
January 29, 2026 at 7:34 PM
Many businesses are opting to implement AI agents in customer service functions not because that's where the greatest value is or where they'll likely see the greatest cost savings (neither of which is true), but purely because AI “agents” and customer service “agents” are synonymous.
January 31, 2026 at 7:04 PM
Reposted by Hersh Gupta
This is not true; I beg people to read the full paper and especially the study design.

The conclusions mirror my own (and many other practitioners'): if you use AI critically and engage with both the question and the answer, it has a net positive impact on both learning and productivity
January 31, 2026 at 9:38 AM
Reposted by Hersh Gupta
moltbook asks the important question: what if we created the alignment researchers' worst nightmare?
January 30, 2026 at 5:21 PM
it’s getting existential on the agent social media network
January 30, 2026 at 5:25 AM
“Why would anyone use a coding agent from a CLI when I have Cursor?” a lead engineer asked me last year when I suggested he try Claude Code. Now it's all he uses.
Talking to lots of folks across various professions who are using Claude Code, it's genuinely surprising, even to me, how much of a leap they've seen in the ability of LLMs to do real work in the last six weeks.

I know people on this site are sometimes skeptical of AI, but this is worth paying attention to.
January 30, 2026 at 5:15 AM
*another* ai thing called “genie”? I know LLMs lead to homogenization of creative diversity, but come on
January 29, 2026 at 6:40 PM
Trying to convert more people from using GPTs and copilots to instead start using skills, but the cognitive barriers are weirdly high?

agentskills.io/home
Overview - Agent Skills
A simple, open format for giving agents new capabilities and expertise.
agentskills.io
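If it helps demystify it: a skill is basically a folder containing a SKILL.md, with YAML frontmatter (a name plus a description telling the agent when to reach for it) followed by plain-Markdown instructions. A minimal sketch with made-up names:

my-skills/memo-style/SKILL.md

---
name: memo-style
description: Use when drafting or reviewing internal memos; applies our structure and tone guidelines.
---
Lead with the decision being requested. Keep background to one paragraph and link out for detail.

Put the folder wherever your agent discovers skills (for Claude Code, a .claude/skills/ directory, if I remember right) and the instructions get pulled in only when the description matches the task.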
January 29, 2026 at 1:56 AM
It’s strange that the DoD’s Developmental Test and Evaluation of Artificial Intelligence-Enabled Systems Guidebook is almost entirely focused on ML evaluation, with little to no info on evaluating GenAI systems
As the spending on frontier artificial intelligence capabilities for defense and intelligence increases, Matteo Pistillo explores how the Defense Department and the intelligence community should strengthen AI testing to prevent internal security threats from arising.
Keep AI Testing Defense-Worthy
In defense and intelligence, AI testing and evaluation should adapt to prevent national security threats arising from AI misalignment.
www.lawfaremedia.org
January 27, 2026 at 12:53 AM
Reposted by Hersh Gupta
this is completely insane and more people need to say that doing this is completely insane
January 26, 2026 at 5:27 PM
I’m curious what the next few months are gonna look like when Claude Cowork goes mainstream. We’re already getting peeks at this with Claude in Office, but not sure if most office workers are ready for the transition.
January 26, 2026 at 2:18 AM
Reposted by Hersh Gupta
don't really want to argue about the ethics of genAI overall right at the moment but if you're giving OpenAI in particular your money please consider literally any other option
The largest Trump superPAC donor so far this cycle is the president of OpenAI
January 26, 2026 at 12:58 AM
Running Claude Code locally with gpt-oss via ollama feels illegal

docs.ollama.com/integrations...
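Rough sketch of the setup, from memory, assuming a recent ollama build that exposes an Anthropic-compatible endpoint (the linked docs have the canonical variable names):

ollama pull gpt-oss:20b
export ANTHROPIC_BASE_URL=http://localhost:11434   # point Claude Code at the local server
export ANTHROPIC_AUTH_TOKEN=ollama                  # placeholder; the local server doesn't check it
export ANTHROPIC_MODEL=gpt-oss:20b
claude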
January 25, 2026 at 9:35 PM
Reposted by Hersh Gupta
The biggest lesson I’ve learned from brief stints studying/living abroad: It’s okay for the government to do good things. Some people won’t like it. That’s okay. It’s still our collective responsibility to do good things. This lesson is almost impossible to learn in the U.S. today.
January 23, 2026 at 8:34 PM
As someone who spent too much time debugging ROCm because AMD barely makes it usable, I’m glad for this (if it works)
January 23, 2026 at 10:19 PM
it’s unearned valor to be told you sound like AI when you grew up reading old books
Everyone is starting to sound like AI, even in spoken language

Analysis of 280,000 transcripts of videos of talks & presentations from academic channels finds they increasingly used words that are favorites of ChatGPT

Model collapse, except for humans arxiv.org/pdf/2409.017...
January 22, 2026 at 11:55 PM
it’s a great guide for product quality and some evals are better than no evals, but safety, security, and fairness should be more regularly included in eval harnesses

www.anthropic.com/engineering/...
Demystifying evals for AI agents
www.anthropic.com
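To make the "include safety in the harness" point concrete, here's the simplest possible illustrative sketch (not from the linked guide; the names are made up, and a real check would use a proper safety/PII classifier rather than a blocklist):

from dataclasses import dataclass

@dataclass
class EvalResult:
    case_id: str
    quality_pass: bool
    safety_pass: bool

LEAK_MARKERS = ["ssn", "password", "api_key"]  # toy stand-in for a real safety classifier

def run_case(case_id: str, output: str, expected_substring: str) -> EvalResult:
    quality_pass = expected_substring.lower() in output.lower()
    safety_pass = not any(m in output.lower() for m in LEAK_MARKERS)
    return EvalResult(case_id, quality_pass, safety_pass)

print(run_case("refund-policy", "Refunds post within 5 business days.", "refund"))
# EvalResult(case_id='refund-policy', quality_pass=True, safety_pass=True)

Same structure works for fairness: run the same prompts across demographic variants and assert the pass rates stay within a tolerance, alongside the usual quality checks.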
January 22, 2026 at 11:13 PM
Reposted by Hersh Gupta
me, a social scientist, trying to continually convince software teams that they have social science measurement challenges, seeing this title 😱😱😱😱😱😱😱😱😱
January 22, 2026 at 4:53 PM
despite the numerous anti-ai accounts on here, bluesky is still where I go to hide from the bad ai discourse on linkedin
January 11, 2026 at 10:18 PM
Reposted by Hersh Gupta
A bridge builder doesn’t need to denounce every evil in the world to be moral, but they better say something about the guy who keeps building bridges that topple over
January 8, 2026 at 5:23 AM
strange how laypeople think the “autonomous decision-making” of AI agents only applies to substantive, material real-world decisions and not to, say, basic tool calls like querying a database
January 8, 2026 at 10:16 PM
A @promptfoo.bsky.social + @credoai.bsky.social integration would solve so many AI auditability and risk measurement challenges. Is anyone building something similar?
January 6, 2026 at 8:43 PM
even with the rise of fully autonomous coding agents like Codex and Claude Code, it’s important to keep a close eye on them. Case in point: Claude Code almost committed my API keys in plaintext
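A cheap guard against exactly this, whether the commit comes from an agent or a human (a minimal sketch; a dedicated secret scanner is the better answer):

#!/bin/sh
# Save as .git/hooks/pre-commit and chmod +x.
# Blocks the commit if the staged diff contains key-like strings.
if git diff --cached -U0 | grep -E -q 'sk-[A-Za-z0-9]{16,}|AKIA[0-9A-Z]{16}|API_KEY[[:space:]]*='; then
  echo "Staged changes look like they contain a secret; aborting commit." >&2
  exit 1
fi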
January 5, 2026 at 5:55 PM