Angie Boggust
@angieboggust.bsky.social
MIT PhD candidate in the VIS group working on interpretability and human-AI alignment
Abstraction Alignment reframes alignment around conceptual relationships, not just concepts.

It helps us audit models, datasets, and even human knowledge.

I'm excited to explore ways to 🏗 extract abstractions from models and 👥 align them to individual users' perspectives.
April 14, 2025 at 3:48 PM
Abstraction Alignment works on datasets too!

Medical experts analyzed clinical dataset abstractions, uncovering issues like overuse of unspecified diagnoses.

This mirrors real-world updates to medical abstractions — showing how models can help us rethink human knowledge.
April 14, 2025 at 3:48 PM
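
Roughly how such a dataset audit could look in code, as a minimal sketch with a made-up diagnosis hierarchy and label counts (the clinical data and codes in the paper are different): compare how often a label sits at a non-specific "unspecified" node versus its more specific siblings.

```python
from collections import Counter

# Toy diagnosis hierarchy: a parent concept and its child codes (illustrative only).
CHILDREN = {
    "diabetes": ["type 1 diabetes", "type 2 diabetes", "diabetes, unspecified"],
}

def unspecified_rate(labels, parent, unspecified_child):
    """Fraction of a parent's labeled examples that use its 'unspecified' child code."""
    counts = Counter(labels)
    total = sum(counts[child] for child in CHILDREN[parent])
    return counts[unspecified_child] / total if total else 0.0

# Hypothetical dataset labels.
labels = (["type 2 diabetes"] * 40 + ["type 1 diabetes"] * 10
          + ["diabetes, unspecified"] * 50)
print(unspecified_rate(labels, "diabetes", "diabetes, unspecified"))  # 0.5 -> worth expert review
```
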
Language models often prefer specific answers even at the cost of performance.

But Abstraction Alignment reveals that the concepts an LM considers are often abstraction-aligned, even when it’s wrong.

This helps separate surface-level errors from deeper conceptual misalignment.
April 14, 2025 at 3:48 PM
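
One way to read that distinction, as a hedged sketch with a hypothetical QA answer hierarchy (not the paper's code): if the model's answer and the correct answer meet at a low-level ancestor, the error is surface-level; if they only meet near the root, the misalignment is conceptual.

```python
# Toy child -> parent hierarchy over possible answers (an assumption for illustration).
PARENT = {
    "Paris": "city in France", "Lyon": "city in France",
    "city in France": "location",
    "1789": "date",
    "location": "entity", "date": "entity",
}

def ancestors(node):
    """Return the node plus its chain of ancestors up to the root."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def lowest_shared_ancestor(answer, truth):
    """The most specific concept the two answers have in common."""
    truth_chain = set(ancestors(truth))
    return next(n for n in ancestors(answer) if n in truth_chain)

# Wrong but abstraction-aligned: the LM answered "Lyon" when "Paris" was correct.
print(lowest_shared_ancestor("Lyon", "Paris"))   # "city in France"
# Wrong and conceptually misaligned: it answered "1789" instead of a place.
print(lowest_shared_ancestor("1789", "Paris"))   # "entity"
```
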
And we packaged Abstraction Alignment and its metrics into an interactive interface so YOU can explore it!

🔗 https://vis.mit.edu/abstraction-alignment/
April 14, 2025 at 3:48 PM
Aggregating Abstraction Alignment helps us understand a model’s global behavior.

We developed metrics to support this:
↔️ Abstraction match – most aligned concepts
💡 Concept co-confusion – frequently confused concepts
🗺️ Subgraph preference – preference for abstraction levels
April 14, 2025 at 3:48 PM
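
A minimal sketch, under assumed definitions, of the concept co-confusion idea above: given per-example propagated concept weights (the propagation step itself is sketched after a later post), count how often pairs of concepts are both substantially active on the same example. The 0.1 threshold and the toy weights are illustrative assumptions, not the paper's exact metric.

```python
from collections import Counter
from itertools import combinations

def co_confusion(propagated_batch, threshold=0.1):
    """Count how often each pair of concepts is jointly active across examples."""
    counts = Counter()
    for weights in propagated_batch:  # one dict of propagated concept weights per example
        active = sorted(c for c, w in weights.items() if w >= threshold)
        for pair in combinations(active, 2):
            counts[pair] += 1
    return counts

# Illustrative propagated weights for two examples whose true label is "oak".
batch = [
    {"oak": 0.50, "palm": 0.45, "trees": 0.95},
    {"oak": 0.50, "shark": 0.45, "trees": 0.55, "fish": 0.45},
]
print(co_confusion(batch).most_common(3))
# e.g. [(('oak', 'trees'), 2), (('oak', 'palm'), 1), ...]
```
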
Abstraction Alignment compares model behavior to human abstractions.

By propagating the model's uncertainty through an abstraction graph, we can see how well it aligns with human knowledge.

E.g., confusing oaks🌳 with palms🌴 is more aligned than confusing oaks🌳 with sharks🦈.
April 14, 2025 at 3:48 PM
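
A minimal sketch of that propagation step, assuming a toy child-to-parent hierarchy and made-up probabilities (not the paper's data or implementation): each leaf concept's probability is summed into all of its ancestors, so mass that stays within one subtree indicates abstraction-aligned behavior.

```python
from collections import defaultdict

# Child -> parent edges of a small abstraction graph (a tree for simplicity).
PARENT = {
    "oak": "trees", "palm": "trees",       # specific concepts roll up to "trees"
    "shark": "fish",                       # ...and "fish"
    "trees": "plants", "fish": "animals",  # higher-level abstractions
}

def propagate(leaf_probs):
    """Sum each leaf concept's probability into all of its ancestors."""
    weights = defaultdict(float, leaf_probs)
    for concept, p in leaf_probs.items():
        node = concept
        while node in PARENT:
            node = PARENT[node]
            weights[node] += p
    return dict(weights)

# Two hypothetical model outputs for an image whose true label is "oak".
confused_with_palm = {"oak": 0.50, "palm": 0.45, "shark": 0.05}
confused_with_shark = {"oak": 0.50, "palm": 0.05, "shark": 0.45}

# Oak/palm confusion keeps ~0.95 of the mass under "trees" (abstraction-aligned);
# oak/shark confusion splits it between "trees" and "fish" (misaligned).
print(propagate(confused_with_palm)["trees"])   # 0.95
print(propagate(confused_with_shark)["trees"])  # 0.55
```
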
Interpretability identifies models' learned concepts (wheels 🛞).

But human reasoning is built on abstractions — relationships between concepts that help us generalize (wheels 🛞→ car 🚗).

To measure alignment, we must test if models learn human-like concepts AND abstractions.
April 14, 2025 at 3:48 PM
Hey Julian — thank you so much for putting this together! My research is on interpretability and I’d love to be added.
November 24, 2024 at 2:21 PM