@nikhil07prakash.bsky.social
We found that the LM generates a Visibility ID at the visibility sentence, which serves as the source info. Its address copy stays in place, while a pointer copy flows to later lookback tokens. There, a QK circuit dereferences the pointer to fetch info about the observed character as the payload.
June 24, 2025 at 5:15 PM
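The visibility lookback above can be sketched as toy code (all names here are hypothetical illustrations, not actual model internals): the visibility sentence stores an in-place address copy of the Visibility ID alongside its payload, and a later lookback token's pointer copy is matched against it, QK-style, to fetch the payload.

```python
# Toy sketch of the QK-circuit dereference for the visibility lookback.
# The visibility sentence keeps the address copy of the Visibility ID
# in place, together with the payload about the observed character.
visibility_sentence = {"vis_id": "vis-7", "payload": "observed: char-1"}

def qk_dereference(pointer_vis_id, source):
    """Query (pointer copy) matches key (in-place address copy),
    so attention fetches the payload stored at the source."""
    if pointer_vis_id == source["vis_id"]:
        return source["payload"]
    return None

# A later lookback token carries the pointer copy of the same ID.
print(qk_dereference("vis-7", visibility_sentence))  # -> observed: char-1
```

The point of the duplication is that neither copy alone suffices: the in-place copy makes the source addressable, and the forwarded copy gives the later token something to match against it.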
Step 3: The LM now uses the state OI at the last token as a pointer and its in-place copy as an address to look back to the correct state token. This time it fetches the token value (e.g., "beer") as the payload, which is predicted as the final output.
June 24, 2025 at 5:14 PM
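Step 3 reduces to one more dereference, sketched below in toy Python (identifiers like "state-1" are illustrative, not real model features): the state OI fetched in Step 2 is the pointer, its in-place copy at the state token is the address, and the payload is now the literal token value.

```python
# Toy Step 3: each state token holds its in-place OI copy (address)
# and its literal token value, which becomes the final prediction.
state_tokens = {"state-1": "beer", "state-2": "wine"}

def dereference(state_oi, state_tokens):
    """Look back to the state token whose in-place OI copy matches
    the pointer, and fetch its token value as the payload."""
    return state_tokens.get(state_oi)

print(dereference("state-1", state_tokens))  # -> beer
```

Note the payload changes across steps: the first lookback returns an abstract OI, this one returns the concrete token to output.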
Step 2: The LM binds the character-object-state triplet by copying their OIs (source) to the state token. These OIs also flow to the last token via the corresponding tokens in the query (pointer). Next, the LM uses both copies to attend from the last token to the correct state token and fetch its state OI (payload).
June 24, 2025 at 5:14 PM
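Under the same toy setup (structure and names are assumptions for illustration), Step 2 looks like this: the character and object OIs copied to each state token form its address, the OIs arriving at the last token form the pointer, and matching them retrieves that state token's own OI as the payload.

```python
# Toy Step 2: each bound triplet lives at a state-token position as a
# (character OI, object OI) address plus that state token's own OI.
bindings = [
    {"address": ("char-1", "obj-1"), "state_oi": "state-1"},
    {"address": ("char-2", "obj-2"), "state_oi": "state-2"},
]

def fetch_state_oi(pointer, bindings):
    """Attend from the last token: match the pointer OIs against each
    state token's in-place address copy, return its state OI."""
    for b in bindings:
        if b["address"] == pointer:
            return b["state_oi"]
    return None

# The query carries the OIs of the asked-about character and object.
print(fetch_state_oi(("char-2", "obj-2"), bindings))  # -> state-2
```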
Step 1: The LM maps each vital token (character, object, state) to an abstract Ordering ID (OI), a reference that marks it as the first or second of its type, regardless of the actual token.
June 24, 2025 at 5:14 PM
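A minimal sketch of Step 1 (the representation below is a hypothetical stand-in for whatever the model actually encodes): each entity token gets an Ordering ID saying only whether it is the first or second of its type, independent of the token's identity.

```python
def ordering_ids(tokens):
    """Map each (type, token) pair to an abstract Ordering ID:
    first or second occurrence of that type, ignoring the token."""
    seen = {}
    ids = []
    for typ, tok in tokens:
        seen[typ] = seen.get(typ, 0) + 1
        ids.append((typ, seen[typ]))  # e.g. ("character", 1)
    return ids

story = [("character", "Anna"), ("object", "bottle"), ("state", "beer"),
         ("character", "Bob"),  ("object", "glass"),  ("state", "wine")]
print(ordering_ids(story))
# Swapping "Anna"/"Bob" for any other names yields the same OIs.
```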
Here is how it works: the model duplicates key info across two tokens, letting later attention heads look back at earlier ones to retrieve it rather than passing it forward directly. Like leaving a breadcrumb trail in context.
June 24, 2025 at 5:14 PM
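This lookback pattern can be sketched in a few lines of pure Python (a toy hard-attention step, not the model's actual weights): the source token writes a tag into its own key (the in-place copy), the same tag reaches a later token's query (the breadcrumb), and attending back returns the payload stored at the source.

```python
# Toy "lookback": a tag duplicated at two positions lets a later
# query attend back to the earlier position and fetch its payload.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lookback(keys, values, query):
    """One hard-attention step: attend to the best-matching key
    and return the value (payload) stored at that position."""
    scores = [dot(query, k) for k in keys]
    best = scores.index(max(scores))
    return values[best]

# The source token writes tag [1, 0] in place (key)...
keys   = [[1, 0], [0, 1]]          # per-position key vectors
values = ["payload@src", "other"]  # info stored at each position

# ...and the same tag flows forward to a later token's query.
pointer = [1, 0]
print(lookback(keys, values, pointer))  # -> payload@src
```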
Since Theory of Mind (ToM) is fundamental to social intelligence, numerous works have benchmarked this capability of LMs. However, the internal mechanics responsible for solving (or failing to solve) such tasks remain unexplored...
June 24, 2025 at 5:13 PM