Donatella Genovese
donatellag.bsky.social
PhD Student | Works on Explainable AI | https://donatellagenovese.github.io/
3/ Interleaving Concepts with Token Embeddings

🔹 Predicted concepts are compressed into a continuous vector 🎯
🔹 They are then inserted into hidden states alongside token embeddings 🧩
February 14, 2025 at 1:12 PM
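A minimal sketch of step 3/, under loud assumptions: the thread does not say how the compression or the insertion is done, so the weighted sum of concept embeddings and the strict token/concept alternation below are illustrative guesses. The names `compress_concepts`, `interleave`, and `concept_embeddings` are hypothetical.

```python
# Hypothetical sketch: compress predicted concepts into one continuous
# vector, then interleave that vector with the token hidden states.

def compress_concepts(concept_weights, concept_embeddings):
    """Weighted sum of concept embeddings (assumed compression scheme)."""
    dim = len(concept_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(concept_weights, concept_embeddings))
        for d in range(dim)
    ]

def interleave(token_states, concept_vectors):
    """Alternate token state and concept vector: t1, c1, t2, c2, ..."""
    mixed = []
    for token_state, concept_vector in zip(token_states, concept_vectors):
        mixed.append(token_state)
        mixed.append(concept_vector)
    return mixed

# Toy example: two concepts with 2-dimensional embeddings.
concept_embeddings = [[1.0, 0.0], [0.0, 1.0]]
concept_vector = compress_concepts([0.5, 0.5], concept_embeddings)

token_states = [[1.0, 1.0], [2.0, 2.0]]
mixed = interleave(token_states, [concept_vector, concept_vector])
# The mixed sequence is twice as long as the token sequence.
```

In this toy version every position gets its own concept vector, so the sequence length doubles; whether the real model does this per position is not stated in the thread.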
2/ Training the Model with Dual Objectives

🔹 Next-token prediction – the standard LLM training objective.
🔹 Concept prediction – the model learns to reproduce extracted concepts from its hidden state.
February 14, 2025 at 1:11 PM
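A rough sketch of the dual objective in step 2/, assuming both terms are cross-entropy losses combined with a weighting factor. The weight `lam`, its default value, and the function names are assumptions for illustration; the thread only states that the two objectives exist.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the target index."""
    return -log_softmax(logits)[target]

def dual_loss(token_logits, next_token, concept_logits, target_concepts, lam=0.1):
    """Next-token loss plus a weighted concept-prediction loss.

    `lam` is a hypothetical weighting factor, not a value from the thread.
    """
    next_token_loss = cross_entropy(token_logits, next_token)
    # Average the loss over the extracted concepts the model should reproduce.
    concept_loss = sum(
        cross_entropy(concept_logits, c) for c in target_concepts
    ) / len(target_concepts)
    return next_token_loss + lam * concept_loss, next_token_loss, concept_loss

# Toy example: uniform logits over 2 tokens and 4 concepts.
total, ntp, concept = dual_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0, 0.0], [1, 3])
```

With uniform logits, each term reduces to the log of the number of classes, which makes the toy example easy to check by hand.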
1/ Concept Extraction with SAE

🔹 A Sparse Autoencoder (SAE) extracts high-level concepts from the hidden states of a pretrained LLM.
🔹 Only the most important concepts are selected based on their attribution score (impact on model output).
February 14, 2025 at 1:10 PM
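A small sketch of step 1/, assuming a ReLU encoder for the SAE and an activation-times-gradient attribution score. Both choices are assumptions; the thread only says that concepts are ranked by their impact on the model output. The names `sae_encode`, `select_concepts`, and the toy weights are hypothetical.

```python
# Hypothetical sketch: encode a hidden state with a sparse autoencoder,
# score each concept, and keep the k highest-scoring ones.

def sae_encode(hidden, w_enc, b_enc):
    """Concept activations: ReLU(W_enc @ hidden + b_enc)."""
    return [
        max(0.0, sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(w_enc, b_enc)
    ]

def select_concepts(activations, output_grads, k):
    """Rank concepts by |activation * gradient| (assumed attribution score)."""
    scores = [a * g for a, g in zip(activations, output_grads)]
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k], scores

# Toy example: a 2-dimensional hidden state and 4 candidate concepts.
hidden = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]

activations = sae_encode(hidden, w_enc, b_enc)   # [1.0, 0.0, 0.0, 2.0]
# Gradients of the model output w.r.t. each concept; made-up values here.
output_grads = [0.5, 1.0, 1.0, -1.0]
selected, scores = select_concepts(activations, output_grads, k=2)
# selected == [3, 0]
```

Inactive concepts (activation 0) get a score of 0 and fall to the bottom of the ranking, which matches the idea of keeping only the concepts that matter for the output.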