Epistemology of AI.
link.springer.com/article/10.1...
I conclude that deep learning models are capable of such understanding!
It doubles as an accessible introduction to the field of mechanistic interpretability! (9/9)
Philosophy of AI now needs to forge conceptions that fit them. (8/9)
LLMs exhibit the phenomenon of parallel mechanisms: instead of relying on a single unified process, they solve problems by deploying many distinct heuristics in parallel. This approach stands in stark contrast to the parsimony typical of human understanding. (7/9)
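To make "many heuristics in parallel" concrete, here's a toy sketch (mine, not from the paper): no single heuristic below implements a rule for "looks like an English word," but summing their independent votes does reasonably well.

```python
# Toy "parallel mechanisms" sketch: several crude heuristics, none of
# which is correct alone, vote in parallel; the verdict is their sum.

WORDS = ["bridge", "golden", "gate", "model", "answer"]
NONWORDS = ["qzxv", "brrrg", "ngkp", "ptkz"]

def h_vowel_ratio(s):          # heuristic 1: plausible share of vowels
    v = sum(c in "aeiou" for c in s) / len(s)
    return 1.0 if 0.2 <= v <= 0.6 else -1.0

def h_no_rare_letters(s):      # heuristic 2: penalize rare letters
    return -1.0 if any(c in "qxz" for c in s) else 1.0

def h_no_long_consonant_runs(s):  # heuristic 3: 3+ consonants in a row is un-English
    run = 0
    for c in s:
        run = 0 if c in "aeiou" else run + 1
        if run >= 3:
            return -1.0
    return 1.0

HEURISTICS = [h_vowel_ratio, h_no_rare_letters, h_no_long_consonant_runs]

def looks_english(s):
    # Each heuristic fires independently ("in parallel"); no unified
    # rule exists anywhere, only the sign of the summed votes.
    return sum(h(s) for h in HEURISTICS) > 0

for s in WORDS + NONWORDS:
    print(f"{s:>8}: {looks_english(s)}")
```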
At this last tier, LLMs can grasp the underlying principles that connect and unify a diverse array of facts.
Research on tasks like modular addition provides cases where LLMs move beyond memorizing examples to internalizing general rules. (6/9)
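A minimal sketch of the modular-addition setup (my own toy reconstruction; the architecture and hyperparameters are placeholders, not the cited work's): train on a fraction of all pairs mod p, and read "internalized rule vs. memorized examples" off held-out accuracy.

```python
import torch
import torch.nn as nn

# Toy modular-addition setup: learn f(a, b) = (a + b) mod p from a
# subset of pairs. All (a, b) combinations and their labels:
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# Held-out pairs: accuracy on these is what separates an internalized
# general rule from memorized training examples.
perm = torch.randperm(len(pairs))
split = int(0.7 * len(pairs))
train_idx, test_idx = perm[:split], perm[split:]

def encode(batch):
    # Concatenated one-hot encodings of a and b.
    return torch.cat([nn.functional.one_hot(batch[:, 0], p),
                      nn.functional.one_hot(batch[:, 1], p)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
# Heavy weight decay matters: grokking-style generalization is usually
# reported with strong regularization and long training runs.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(5001):
    opt.zero_grad()
    loss = loss_fn(model(encode(pairs[train_idx])), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (model(encode(pairs[test_idx])).argmax(-1)
                   == labels[test_idx]).float().mean()
        print(f"step {step}: loss {loss.item():.3f}, held-out acc {acc.item():.2%}")
```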
OthelloGPT, a GPT-2 model trained on legal Othello moves, encodes the board state in internal representations that update as the game unfolds, as shown by linear probes. (5/9)
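The probing technique in sketch form: fit a linear classifier from cached activations to the state of one board square. The arrays `acts` and `square_state` below are random placeholders for what you'd extract from an actual OthelloGPT run, so accuracy here stays at chance; on real activations, high linear-probe accuracy is the evidence for an encoded board state.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Linear-probe sketch. `acts` stands in for cached residual-stream
# activations (one row per move position) and `square_state` for the
# true state of one square at that move: 0 = empty, 1 = mine, 2 = theirs.
rng = np.random.default_rng(0)
n_positions, d_model = 5000, 512
acts = rng.normal(size=(n_positions, d_model))       # placeholder
square_state = rng.integers(0, 3, size=n_positions)  # placeholder

train, test = slice(0, 4000), slice(4000, None)
probe = LogisticRegression(max_iter=1000).fit(acts[train], square_state[train])

# A *linear* probe succeeding on held-out positions would show the
# board state is linearly readable from the activations; on this
# random placeholder data it sits near chance (~33%).
print("probe accuracy:", probe.score(acts[test], square_state[test]))
```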
LLMs can encode factual associations in the linear projections of their MLP layers.
For instance, they can ensure that a strong activation of the “Golden Gate Bridge” feature leads to a strong activation of the “in SF” feature. (4/9)
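One common way to picture this (a sketch with made-up vectors, not extracted model weights): treat the MLP as a key-value memory whose input projection detects the "Golden Gate Bridge" direction and whose output projection writes the "in SF" direction whenever that key fires.

```python
import numpy as np

# Key-value sketch of an MLP storing "Golden Gate Bridge -> in SF".
# Both directions are random placeholders for learned feature vectors.
rng = np.random.default_rng(0)
d = 64
ggb_dir = rng.normal(size=d); ggb_dir /= np.linalg.norm(ggb_dir)
in_sf_dir = rng.normal(size=d); in_sf_dir /= np.linalg.norm(in_sf_dir)

# One hidden neuron: W_in reads the "Golden Gate Bridge" direction,
# W_out writes the "in SF" direction.
W_in = ggb_dir[None, :]        # (1, d): key detector
W_out = in_sf_dir[:, None]     # (d, 1): value writer

def mlp(x):
    h = np.maximum(W_in @ x, 0.0)   # ReLU: fires only on the key
    return W_out @ h                # adds the value direction

x = 3.0 * ggb_dir  # residual stream strongly activates the GGB feature
print("in-SF activation:", float(in_sf_dir @ mlp(x)))           # ~3.0
print("on unrelated input:", float(in_sf_dir @ mlp(rng.normal(size=d))))
```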
Attention layers are key. They retrieve relevant information from earlier tokens and integrate it into the current token’s representation, making the model context-aware. (3/9)
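For reference, single-head causal attention in a few lines of numpy (the textbook formulation, not any particular model's code): each position builds a query, scores it against the keys of earlier positions, and mixes their values into its own representation.

```python
import numpy as np

def causal_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over a token sequence.

    X: (T, d) token representations; W_q/W_k/W_v: (d, d_head) projections.
    Each token's query scores the keys of positions <= its own, and the
    softmax-weighted values are mixed into its updated representation.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    T = X.shape[0]
    # Causal mask: a token may only attend to itself and earlier tokens.
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # context-aware token representations

rng = np.random.default_rng(0)
T, d, d_head = 5, 16, 8
X = rng.normal(size=(T, d))
out = causal_attention(X, *(rng.normal(size=(d, d_head)) for _ in range(3)))
print(out.shape)  # (5, 8): each row blends information from earlier tokens
```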
This tier emerges when a model forms “features” as directions in latent space, allowing it to recognize and unify diverse manifestations of an entity or property.
E.g., LLMs subsume “SF’s landmark” or “orange bridge” under a “Golden Gate Bridge” feature.
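In miniature, with made-up vectors rather than real model activations: different surface descriptions project onto the same latent direction, and a dot product with that direction reads the feature off.

```python
import numpy as np

# "Features as directions" in miniature. `ggb` is a made-up latent
# direction standing in for a learned "Golden Gate Bridge" feature.
rng = np.random.default_rng(0)
d = 32
ggb = rng.normal(size=d); ggb /= np.linalg.norm(ggb)

def embed(strength):
    # Stand-in for the model's representation of a phrase: the feature
    # direction scaled by how strongly the phrase evokes it, plus noise.
    return strength * ggb + 0.1 * rng.normal(size=d)

phrases = {"SF's landmark": embed(2.0),
           "orange bridge": embed(1.5),
           "the weather": embed(0.0)}

for phrase, vec in phrases.items():
    # Reading the feature is a dot product with its direction: both
    # bridge descriptions score high, the unrelated phrase near zero.
    print(f"{phrase:>15}: GGB activation = {ggb @ vec:+.2f}")
```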