🔹 A Sparse Autoencoder (SAE) extracts high-level concepts from the hidden states of a pretrained LLM.
🔹 Only the most important concepts are selected, based on their attribution score (their impact on the model's output) — see the sketch below 👇
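To make this concrete, here is a minimal PyTorch sketch of the extraction step. It assumes a toy ReLU SAE and an activation-times-gradient attribution score; the names `SparseAutoencoder` and `top_concepts`, the top-k size, and the toy dimensions are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Toy SAE: hidden state -> sparse, overcomplete concept space -> hidden state."""
    def __init__(self, d_model: int, n_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_concepts)
        self.decoder = nn.Linear(n_concepts, d_model)

    def encode(self, h):
        return torch.relu(self.encoder(h))  # ReLU keeps activations non-negative and sparse

    def decode(self, c):
        return self.decoder(c)

def top_concepts(sae, lm_head, h, targets, k=8):
    """Rank concepts by |activation * d(loss)/d(activation)| and keep the top-k."""
    c = sae.encode(h)
    c.retain_grad()  # c is a non-leaf tensor, so ask autograd to keep its gradient
    loss = F.cross_entropy(lm_head(sae.decode(c)), targets)
    loss.backward()
    scores = (c * c.grad).abs().sum(dim=0)  # attribution, aggregated over the batch
    return scores.topk(k).indices

# Toy usage: 4 positions, 64-dim hidden states, 512 candidate concepts, 100-word vocab.
sae, lm_head = SparseAutoencoder(64, 512), nn.Linear(64, 100)
h, targets = torch.randn(4, 64), torch.randint(0, 100, (4,))
print(top_concepts(sae, lm_head, h, targets))
```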
🔹 Next-token prediction – the standard LLM training objective.
🔹 Concept prediction – the model learns to reproduce the extracted concepts from its own hidden state (sketch below 👇)
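A hedged sketch of how the two objectives could be combined into a single loss. Treating concept prediction as multi-label classification over the selected concepts, and the weight `lam`, are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dual_objective_loss(hidden, lm_head, concept_head,
                        next_tokens, concept_targets, lam=0.1):
    """Standard next-token loss plus a concept-prediction loss.

    hidden:          (batch, d_model) hidden states from the LM
    next_tokens:     (batch,) gold next-token ids
    concept_targets: (batch, n_selected) multi-hot labels for the selected concepts
    lam:             illustrative weight balancing the two objectives
    """
    lm_loss = F.cross_entropy(lm_head(hidden), next_tokens)
    concept_loss = F.binary_cross_entropy_with_logits(
        concept_head(hidden), concept_targets)
    return lm_loss + lam * concept_loss

# Toy usage: 4 positions, 64-dim hidden states, 100-word vocab, 8 selected concepts.
lm_head, concept_head = nn.Linear(64, 100), nn.Linear(64, 8)
hidden = torch.randn(4, 64)
next_tokens = torch.randint(0, 100, (4,))
concept_targets = torch.randint(0, 2, (4, 8)).float()
loss = dual_objective_loss(hidden, lm_head, concept_head, next_tokens, concept_targets)
loss.backward()
```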
🔹 Predicted concepts are compressed into a continuous vector 🎯
🔹 It is then inserted into the hidden states alongside the token embeddings 🧩 (sketch below 👇)
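Finally, a sketch of the mixing step under stated assumptions: a learned linear layer compresses the predicted concept activations into one continuous vector per position, which is then interleaved with the token hidden states. `ConceptMixer` and the interleaving layout are illustrative, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class ConceptMixer(nn.Module):
    """Compress predicted concept activations into one continuous vector per
    position, then interleave it with the token hidden states."""
    def __init__(self, n_concepts: int, d_model: int):
        super().__init__()
        self.compress = nn.Linear(n_concepts, d_model)  # concepts -> continuous vector

    def forward(self, hidden, concept_acts):
        # hidden:       (batch, seq, d_model)    token hidden states
        # concept_acts: (batch, seq, n_concepts) predicted concept activations
        concept_vec = self.compress(concept_acts)        # (batch, seq, d_model)
        # Interleave token states and concept vectors: t1, c1, t2, c2, ...
        mixed = torch.stack([hidden, concept_vec], dim=2)
        return mixed.reshape(hidden.size(0), -1, hidden.size(-1))  # (batch, 2*seq, d_model)

# Toy usage: the next transformer layer now sees twice as many positions.
mixer = ConceptMixer(n_concepts=512, d_model=64)
out = mixer(torch.randn(2, 5, 64), torch.rand(2, 5, 512))
print(out.shape)  # torch.Size([2, 10, 64])
```

Note the design choice: interleaving doubles the sequence length seen by downstream layers; an additive variant (`hidden + concept_vec`) would keep the length unchanged at the cost of a dedicated concept position.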