Models can learn the right concepts but still be wrong in how they relate them.
✨Abstraction Alignment✨ evaluates whether models learn human-aligned conceptual relationships.
It reveals misalignments in LLMs💬 and medical datasets🏥.
🔗 arxiv.org/abs/2407.12543
Human reasoning is built on abstractions: relationships between concepts that help us generalize (wheels 🛞 → car 🚗).
To measure alignment, we must test whether models learn human-like concepts AND abstractions.
By propagating the model's uncertainty through an abstraction graph, we can see how well it aligns with human knowledge.
E.g., confusing oaks🌳 with palms🌴 is more aligned than confusing oaks🌳 with sharks🦈.
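To make the propagation idea concrete, here is a minimal sketch over a toy two-level hierarchy; the class names, hierarchy, and probabilities are invented for illustration, and this is not the paper's implementation.

```python
# Minimal sketch (not the paper's code): propagate a model's softmax output
# up a toy two-level abstraction graph and check how much probability mass
# lands under the correct parent concept. The hierarchy and probabilities
# below are invented for illustration.

# Leaf class -> parent concept.
PARENT = {"oak": "tree", "palm": "tree", "shark": "fish", "trout": "fish"}

def propagate(leaf_probs: dict[str, float]) -> dict[str, float]:
    """Sum each leaf's probability into its parent concept."""
    node_probs = dict(leaf_probs)  # leaves keep their own mass
    for leaf, p in leaf_probs.items():
        node_probs[PARENT[leaf]] = node_probs.get(PARENT[leaf], 0.0) + p
    return node_probs

# True label: "oak". Both predictions get the leaf wrong, but the first keeps
# its mass inside the "tree" subtree (aligned), while the second leaks mass
# into the "fish" subtree (misaligned).
aligned = {"oak": 0.45, "palm": 0.40, "shark": 0.10, "trout": 0.05}
misaligned = {"oak": 0.45, "palm": 0.05, "shark": 0.40, "trout": 0.10}

for name, probs in [("aligned", aligned), ("misaligned", misaligned)]:
    print(f"{name}: mass on 'tree' = {propagate(probs)['tree']:.2f}")
# aligned: mass on 'tree' = 0.85 vs. misaligned: mass on 'tree' = 0.50
```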
Abstraction Alignment reveals that the concepts an LM considers are often abstraction-aligned, even when its answer is wrong.
This helps separate surface-level errors from deeper conceptual misalignment.
Medical experts used Abstraction Alignment to analyze a clinical dataset's abstractions, uncovering issues like overuse of unspecified diagnoses.
These findings mirror real-world updates to medical abstractions, showing how models can help us rethink human knowledge.
🔗 https://vis.mit.edu/abstraction-alignment/