Lightnews — Scholar-powered news

Ben Edelman

@benedelman.bsky.social

Thinking about how/why AI works/doesn't, and how to make it go well for us.

Currently: AI Agent Security @ US AI Safety Institute

benjaminedelman.com

Posts Replies Media Videos

Ben Edelman

@benedelman.bsky.social

This is a big-tent workshop, welcoming many areas of ML. The emphasis is scientific progress, not SOTA—science that can be demonstrated on free-tier Colab. I'm looking forward to playing with and learning from the notebooks that appear in the workshop!

May 8, 2025 at 1:51 PM

Ben Edelman

@benedelman.bsky.social

What if there were a workshop dedicated to *small-scale*, *reproducible* experiments? What if this were at ICML 2025? What if your submission (due May 22nd) could literally be a Jupyter notebook?? Pretty excited this is happening. Spread the word! sites.google.com/view/moss202...

May 8, 2025 at 1:51 PM

Ben Edelman

@benedelman.bsky.social

6/ We also explored, among other questions, what happens when we measure pass@k attack success
rates, because real world attackers may be able to attempt attacks multiple times at little cost.

January 17, 2025 at 9:41 PM

Ben Edelman

@benedelman.bsky.social

5/ Here are results for several specific malicious tasks of varying harmfulness and complexity, including new scenarios we added to the framework (more details in the blog post on our improvements to AgentDojo):

January 17, 2025 at 9:41 PM

Ben Edelman

@benedelman.bsky.social

3/ To find out, we organized a red teaming exercise. The resulting attack is much more effective than the pre-packaged attacks. In a majority of cases, the agent follows the hijacker’s instructions:

January 17, 2025 at 9:41 PM

Ben Edelman

@benedelman.bsky.social

For years, this mysterious undulating loop has lived at the top of my personal homepage.

December 8, 2024 at 11:04 PM

Ben Edelman

@benedelman.bsky.social

My favorite "ordinary life" example of this notion of singular limits: (from mecheng.iisc.ac.in/lamfip/me304...)

December 7, 2024 at 2:43 PM

Ben Edelman

@benedelman.bsky.social

I'll end this thread with the parable that opens the dissertation (my conference will require a parable section in every submission). Tag yourself :)

December 2, 2024 at 12:21 AM

Ben Edelman

@benedelman.bsky.social

The bulk of the thesis is a series of case studies from my research. But first, in Chapter 3 ("Deep Learning Preliminaries") I try to define some terms from first principles—above these footnotes, you can find my idiosyncratic definition of neural nets in terms of arithmetic circuits.

December 2, 2024 at 12:21 AM

Ben Edelman

@benedelman.bsky.social

2. Transferability: insights learned from the system need to transfer to settings of interest. This can happen because of *low-level* commonalities (think cell cultures) or *high-level* commonalities (think macroeconomic models).

December 2, 2024 at 12:21 AM

Ben Edelman

@benedelman.bsky.social

...Specifically, two conditions I propose in the thesis:
1. Productivity: A model system needs to be exceptionally fertile ground for producing scientific insights.

December 2, 2024 at 12:21 AM

Ben Edelman

@benedelman.bsky.social

It's a tribute to a kind of science I love (and reviews sometimes hate), where in order to understand a complicated system (e.g. training a transformer on internet text), you instead study a different system (e.g. training an MLP to solve parity problems).

December 2, 2024 at 12:21 AM

Ben Edelman

@benedelman.bsky.social

I defended my PhD dissertation back in May. I didn't have time to share it widely then (newborn baby), but I think some of you might enjoy it, especially the opening chapters: benjaminedelman.com/assets/disse...

December 2, 2024 at 12:21 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news