Co-author of: Malware Data Science
🎧 www.heroku.com/podcasts/cod...
www.youtube.com/watch?v=01I4...
docs.google.com/document/d/1...
Incredibly impressive person.
arxiv.org/pdf/2401.05566
So many AI safety issues get worse, & become harder to combat, the larger and more advanced your model gets:
"The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning"
TLDR: LLMs tend to generate sycophantic responses.
Human feedback & preference models encourage this behavior.
I also think this is just the nature of training on internet writing... We write in social clusters:
But I do think that inner misalignment (~learned features) tends to act as a protective mechanism against the implications of outer misalignment.
I, er, really hope.