Angie Boggust
@angieboggust.bsky.social
MIT PhD candidate in the VIS group working on interpretability and human-AI alignment
Abstraction Alignment works on datasets too!

Medical experts analyzed clinical dataset abstractions, uncovering issues like overuse of unspecified diagnoses.

This mirrors real-world updates to medical abstractions — showing how models can help us rethink human knowledge.
April 14, 2025 at 3:48 PM
Abstraction Alignment compares model behavior to human abstractions.

By propagating the model's uncertainty through an abstraction graph, we can see how well it aligns with human knowledge.

E.g., confusing oaks🌳 with palms🌴 is more aligned than confusing oaks🌳 with sharks🦈.
April 14, 2025 at 3:48 PM
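
A minimal sketch of the idea in the post above, assuming a toy two-level abstraction graph (oak/palm → tree, shark/trout → fish). The class names, probabilities, and helper functions are hypothetical illustrations, not the paper's implementation:

# Sketch (not the authors' code): propagate a model's class probabilities
# up a small, hypothetical abstraction graph and measure how much
# probability mass stays inside the true class's parent abstraction.

ABSTRACTION = {
    "oak": "tree",
    "palm": "tree",
    "shark": "fish",
    "trout": "fish",
}

def propagate(leaf_probs):
    """Sum leaf-level probabilities into their parent abstractions."""
    parent_probs = {}
    for leaf, p in leaf_probs.items():
        parent = ABSTRACTION[leaf]
        parent_probs[parent] = parent_probs.get(parent, 0.0) + p
    return parent_probs

def alignment_score(leaf_probs, true_leaf):
    """Probability mass landing in the true class's abstraction."""
    return propagate(leaf_probs)[ABSTRACTION[true_leaf]]

# Confusing oak with palm keeps the mass inside "tree" (more aligned) ...
print(alignment_score({"oak": 0.5, "palm": 0.4, "shark": 0.05, "trout": 0.05}, "oak"))  # ~0.9
# ... while confusing oak with shark leaks it into "fish" (less aligned).
print(alignment_score({"oak": 0.5, "palm": 0.05, "shark": 0.4, "trout": 0.05}, "oak"))  # ~0.55
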
Interpretability identifies models' learned concepts (wheels 🛞).

But human reasoning is built on abstractions — relationships between concepts that help us generalize (wheels 🛞 → car 🚗).

To measure alignment, we must test if models learn human-like concepts AND abstractions.
April 14, 2025 at 3:48 PM