Andreas Madsen
@andreasmadsen.bsky.social
Ph.D. in NLP Interpretability from Mila. Previously: independent researcher, freelancer in ML, and Node.js core developer.
FMMs (Faithfulness Measurable Models) are models designed so that measuring the faithfulness of an explanation is cheap and precise, which makes it possible to optimize explanations toward maximum faithfulness.
November 28, 2024 at 2:02 PM
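To make the idea concrete, here is a minimal sketch in Python of the two ingredients the post describes: an erasure-based faithfulness measurement that is cheap because masked inputs stay in-distribution, and a search that optimizes an explanation toward that score. The classifier `model(tokens)` and `MASK_ID` are hypothetical stand-ins, not the paper's actual API.

```python
import numpy as np

MASK_ID = 0  # hypothetical mask-token id; assumes masking was trained in-distribution


def faithfulness(model, tokens, importance, label, steps=10):
    """Erasure-based faithfulness: mask tokens from most to least important
    and measure how quickly the probability of `label` drops. Because the
    model supports masked inputs in-distribution, the drop is attributable
    to the explanation rather than to out-of-distribution artifacts."""
    order = np.argsort(importance)[::-1]  # most important tokens first
    p0 = model(tokens)[label]
    masked = np.array(tokens)
    drops = []
    for i in range(0, len(order), max(1, len(order) // steps)):
        masked[order[:i + 1]] = MASK_ID
        drops.append(p0 - model(masked)[label])
    return float(np.mean(drops))  # higher = more faithful


def optimize_explanation(model, tokens, label, iters=100, rng=None):
    """Because faithfulness is cheap to evaluate, we can search directly for
    the explanation that maximizes it. Random local search over token
    importance scores is used here purely for illustration; it is not the
    paper's optimization procedure."""
    rng = rng or np.random.default_rng(0)
    best = rng.random(len(tokens))
    best_score = faithfulness(model, tokens, best, label)
    for _ in range(iters):
        candidate = best + 0.1 * rng.standard_normal(len(tokens))
        score = faithfulness(model, tokens, candidate, label)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```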
Self-explanations are explanations an LLM produces of its own behavior. Current models are not capable of this, but we suggest how that could be changed.
[Image: Diagram of self-explanations, showing the input going in, then the regular output and the explanation coming out.]
November 28, 2024 at 2:02 PM