Epistemology of AI.
In my latest Synthese article I tackle this question!
At this last tier, LLMs can grasp the underlying principles that connect and unify a diverse array of facts.
Research on tasks like modular addition provides cases where LLMs move beyond memorizing examples to internalizing general rules. (6/9)
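A toy contrast makes the point: a lookup table built from training examples fails off-distribution, while the internalized rule covers every case. (Illustrative sketch only; real grokking experiments train small transformers on this task, and the modulus here is an arbitrary choice.)

```python
P = 7  # modulus (hypothetical choice for the sketch)

# "Memorizer": a lookup table built from a limited training set
# (here, only pairs whose sum is even).
train_pairs = [(a, b) for a in range(P) for b in range(P) if (a + b) % 2 == 0]
lookup = {(a, b): (a + b) % P for a, b in train_pairs}

# "Generalizer": the underlying rule itself.
def rule(a, b):
    return (a + b) % P

unseen = (1, 2)            # 1 + 2 is odd, so not in the training set
print(unseen in lookup)    # False: the memorizer has no answer
print(rule(*unseen))       # 3: the rule covers every pair
```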
OthelloGPT, a GPT-2 model trained on legal Othello moves, encodes the board state in internal representations that update as the game unfolds, as shown by linear probes. (5/9)
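A linear probe is just a linear map fitted from hidden activations to a label. A minimal sketch, with synthetic "activations" standing in for real model states and a least-squares fit where published probes typically use logistic regression:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                # hypothetical hidden size
w_true = rng.normal(size=d)           # direction encoding "this square is occupied"

X = rng.normal(size=(200, d))         # fake hidden activations
y = (X @ w_true > 0).astype(float)    # label linearly readable from the state

# Fit the probe by least squares on centered labels.
w_probe, *_ = np.linalg.lstsq(X, y - 0.5, rcond=None)
acc = ((X @ w_probe > 0) == (y > 0.5)).mean()
print(f"probe accuracy: {acc:.2f}")   # high accuracy => linearly decodable
```

High probe accuracy is the evidence that the board state is encoded linearly in the representations, which is exactly the OthelloGPT finding.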
LLMs can encode factual associations in the linear projections of their MLP layers.
For instance, these weights can map a strong activation of the “Golden Gate Bridge” feature to a strong activation of the “in SF” feature. (4/9)
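On the key-value-memory view of MLPs, the input projection detects a "key" feature and the output projection writes a "value" feature. A hand-built sketch (the feature directions here are made up for illustration):

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)
golden_gate = rng.normal(size=d); golden_gate /= np.linalg.norm(golden_gate)
in_sf       = rng.normal(size=d); in_sf       /= np.linalg.norm(in_sf)

W_in  = golden_gate[None, :]   # one hidden unit keyed to "Golden Gate Bridge"
W_out = in_sf[:, None]         # its output writes the "in SF" direction

def mlp(x):
    h = np.maximum(W_in @ x, 0.0)   # ReLU: unit fires when the key is present
    return W_out @ h

out = mlp(golden_gate)              # key present -> value written
print(np.dot(out, in_sf))           # strong "in SF" activation
print(np.linalg.norm(mlp(-golden_gate)))  # key absent -> nothing written
```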
Attention layers are key. They retrieve relevant information from earlier tokens and integrate it into the current token’s representation, making the model context-aware. (3/9)
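The retrieval step can be sketched as a minimal single-head causal attention in NumPy. Shapes and weights are arbitrary; real transformers add multiple heads, output projections, and residual connections:

```python
import numpy as np

def causal_attention(X, W_q, W_k, W_v):
    """X: (seq, d) token states; returns one retrieved update per token."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf        # attend only to self and earlier tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                  # each row mixes earlier tokens' values

rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(3, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_attention(X, W_q, W_k, W_v)
print(out.shape)                  # one context-aware update per token
```

Note that the first token can only attend to itself, so its output is exactly its own value vector — the "context" accumulates as the sequence unfolds.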
Emerges when a model forms “features” as directions in latent space, allowing it to recognize and unify diverse manifestations of an entity or a property.
E.g., LLMs subsume “SF’s landmark” or “orange bridge” under a “Golden Gate Bridge” feature.
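The "features as directions" picture can be sketched with hand-built vectors: two surface descriptions land near each other in latent space because both load on the same feature direction. (A real model learns these directions; everything below is constructed for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32
ggb_feature = rng.normal(size=d)
ggb_feature /= np.linalg.norm(ggb_feature)

def embed(strength, noise=0.05):
    # hypothetical latent vector: shared feature plus unrelated variation
    return strength * ggb_feature + noise * rng.normal(size=d)

sf_landmark   = embed(1.0)   # "SF's landmark"
orange_bridge = embed(1.0)   # "orange bridge"
unrelated     = embed(0.0)   # text with no Golden Gate Bridge content

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(sf_landmark, ggb_feature))    # high: loads on the feature
print(cos(orange_bridge, ggb_feature))  # high: same feature direction
print(cos(unrelated, ggb_feature))      # near zero
```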