📄Paper: arxiv.org/abs/2407.12543
💻Demo: vis.mit.edu/abstraction-...
🎥Video: www.youtube.com/watch?v=cLi9...
🔗Project: vis.mit.edu/pubs/abstrac...
With Hyemin (Helen) Bang, @henstr.bsky.social, and @arvind.bsky.social
It helps us audit models, datasets, and even human knowledge.
I'm excited to explore ways to 🏗 extract abstractions from models and 👥 align them to individual users' perspectives.
Medical experts analyzed clinical dataset abstractions, uncovering issues like overuse of unspecified diagnoses.
This mirrors real-world updates to medical abstractions — showing how models can help us rethink human knowledge.
But Abstraction Alignment reveals that the concepts an LM considers are often abstraction-aligned, even when it’s wrong.
This helps separate surface-level errors from deeper conceptual misalignment.
🔗https://vis.mit.edu/abstraction-alignment/
We developed metrics to support this (rough sketch after the list):
↔️ Abstraction match – most aligned concepts
💡 Concept co-confusion – frequently confused concepts
🗺️ Subgraph preference – preference for abstraction levels
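To give a feel for the computation, here is a minimal Python sketch in the spirit of the concept co-confusion metric. The names, threshold, and toy data are my own invention for illustration, not the paper's exact definition or code:

```python
# Rough sketch (my simplification, not the paper's exact metric):
# count how often two concepts both receive non-trivial probability
# mass on the same input, across a batch of model outputs.
from collections import Counter
from itertools import combinations

def co_confusion(batch_of_probs, threshold=0.2):
    """batch_of_probs: list of {concept: probability} dicts, one per input."""
    counts = Counter()
    for probs in batch_of_probs:
        # Concepts the model assigns meaningful weight to on this input.
        active = sorted(c for c, p in probs.items() if p >= threshold)
        for a, b in combinations(active, 2):
            counts[(a, b)] += 1  # a and b were confused together
    return counts

batch = [
    {"oak": 0.5, "palm": 0.4, "shark": 0.1},
    {"oak": 0.6, "palm": 0.3, "shark": 0.1},
    {"oak": 0.5, "shark": 0.45, "palm": 0.05},
]
print(co_confusion(batch))
# Counter({('oak', 'palm'): 2, ('oak', 'shark'): 1})
```

Pairs that co-occur often under this kind of count point to concepts the model systematically treats as interchangeable.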
By propagating the model's uncertainty through an abstraction graph, we can see how well it aligns with human knowledge.
E.g., confusing oaks🌳 with palms🌴 is more aligned than confusing oaks🌳 with sharks🦈.
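For readers who want the mechanics, here is a minimal, self-contained sketch of that propagation step. The toy hierarchy and probabilities are invented for illustration; this is not the released implementation:

```python
# Sketch: propagate a model's output distribution over leaf classes
# up an abstraction graph and see where the mass concentrates.
from collections import defaultdict

# Toy abstraction graph (child -> parent), loosely CIFAR-100-style.
PARENTS = {
    "oak": "tree", "palm": "tree", "shark": "fish",
    "tree": "plant", "fish": "animal",
    "plant": "living thing", "animal": "living thing",
}

def propagate(leaf_probs):
    """Push each leaf's probability mass up to all of its ancestors."""
    mass = defaultdict(float, leaf_probs)
    for leaf, p in leaf_probs.items():
        node = leaf
        while node in PARENTS:
            node = PARENTS[node]
            mass[node] += p
    return dict(mass)

# Two hypothetical model outputs for an image of an oak.
oak_vs_palm  = {"oak": 0.55, "palm": 0.40, "shark": 0.05}
oak_vs_shark = {"oak": 0.55, "palm": 0.05, "shark": 0.40}

for label, probs in [("oak vs palm", oak_vs_palm), ("oak vs shark", oak_vs_shark)]:
    # When the confused classes share a nearby ancestor ("tree"), the
    # uncertainty concentrates there: the mistake is abstraction-aligned.
    print(label, "-> mass on 'tree':", round(propagate(probs)["tree"], 2))
# oak vs palm  -> mass on 'tree': 0.95   (aligned confusion)
# oak vs shark -> mass on 'tree': 0.6    (less aligned confusion)
```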
But human reasoning is built on abstractions — relationships between concepts that help us generalize (wheels 🛞→ car 🚗).
To measure alignment, we must test if models learn human-like concepts AND abstractions.