New Interpretability Roadmap

The result? A roadmap with embedding dimensions on the x-axis, GPT-2 layers on the y-axis, and NMI as the color gradient, showing where and how semantic structure emerges.
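Concretely, the raw material for such a map is a stack of per-layer activations. Below is a minimal sketch of how they can be pulled from GPT-2 with Hugging Face transformers; the example sentence and the overall pipeline are assumptions for illustration, not the project's actual code.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

# Load GPT-2 small and ask it to return the hidden states of every layer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def layer_states(text: str) -> torch.Tensor:
    """Return a (n_layers + 1, seq_len, hidden_size) stack of activations."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states is a tuple: the embedding output plus one tensor per layer.
    return torch.stack(outputs.hidden_states).squeeze(1)

states = layer_states("The quick brown fox jumps over the lazy dog")
print(states.shape)  # (13, seq_len, 768) for gpt2 small
```

Each row of the eventual heatmap summarizes one slice of this stack.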
Reading the map from the earliest layers upward, a clear progression emerges.

The earliest layers capture raw token-level properties. Here, the low NMI values (deep blue on the heatmap) indicate that the model primarily encodes lexical-syntactic details without forming robust abstract representations.

A transition begins in the middle layers as context starts to shape token representations. Notably, Layer 6 marks the first significant rise in NMI (approaching 0.91), suggesting a shift toward meaningful semantic integration.

The upper-middle layers are where the magic happens: Layer 9, in particular, consistently shows the highest semantic coherence (bright yellow on the heatmap).

Beyond that peak, the network continues to process and refine these representations. Although NMI values remain high, they reflect a fine-tuning of conceptual understanding rather than the initial burst of abstraction seen in Layer 9.

The final layer reverts to lower NMI values. This “compression” suggests the model is preparing its internal state for next-token prediction rather than maintaining a rich semantic structure.
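The NMI values behind this narrative come from comparing a layer's emergent cluster structure with reference semantic categories. Here is a minimal sketch under assumed choices (KMeans clustering, scikit-learn's normalized_mutual_info_score, and a toy labeled set standing in for the real data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def layer_nmi(activations: np.ndarray, labels: list[int], n_clusters: int) -> float:
    """Cluster one layer's activations and score the clusters against known labels."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(activations)
    return normalized_mutual_info_score(labels, clusters)

# Illustrative placeholders: 6 token activations at one layer, two semantic categories.
rng = np.random.default_rng(0)
activations = rng.normal(size=(6, 768))  # stand-in for real GPT-2 activations
labels = [0, 0, 0, 1, 1, 1]              # e.g. animal words vs. tool words
print(layer_nmi(activations, labels, n_clusters=2))
```

Sweeping this score over every layer, and over subsets of the embedding dimensions, yields the matrix the heatmap visualizes.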
One of the project’s central achievements is a striking NMI heatmap that visually narrates these transitions. The heatmap uses a non-linear Viridis color scale to highlight regions of high semantic coherence and overlays annotations that mark:

- Functional labels such as “Token Feature Extraction,” “Contextual Integration,” and “Optimal Abstraction Peak,” and
- A horizontal transition boundary, indicating precisely where the network’s representational strategy shifts (see the sketch below).
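To give a feel for how such a figure can be rendered, here is a hedged matplotlib sketch; the NMI matrix, label positions, and boundary row are placeholders rather than the project's actual values, and PowerNorm stands in for the non-linear color scaling.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import PowerNorm

# Placeholder matrix: 13 layers (embeddings + 12 blocks) x 48 embedding-dimension bins.
nmi = np.random.default_rng(1).uniform(0.2, 0.95, size=(13, 48))

fig, ax = plt.subplots(figsize=(10, 6))
# PowerNorm supplies the non-linear scaling that emphasizes high-coherence regions.
im = ax.imshow(nmi, cmap="viridis", norm=PowerNorm(gamma=2.5), aspect="auto", origin="lower")
fig.colorbar(im, ax=ax, label="Normalized Mutual Information")

# Horizontal transition boundary between representational regimes (row is illustrative).
ax.axhline(y=5.5, color="white", linestyle="--", linewidth=1.5)

# Functional labels for the major regimes (positions are illustrative).
ax.text(1, 2, "Token Feature Extraction", color="white", fontsize=9)
ax.text(1, 7, "Contextual Integration", color="white", fontsize=9)
ax.text(1, 9, "Optimal Abstraction Peak", color="black", fontsize=9)

ax.set_xlabel("Embedding dimension (binned)")
ax.set_ylabel("GPT-2 layer")
ax.set_title("NMI across GPT-2 layers and embedding dimensions")
plt.tight_layout()
plt.show()
```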
This investigation not only advances our understanding of how GPT-2 (and, by extension, other large language models) processes language internally, but also opens up new avenues for interpretability research.

Several directions remain open:

- Implement a large-scale simulation to see the effects of varying the number of items per dimension.
- Investigate how adversarial inputs or fine-tuning might shift these representational landscapes.
- Develop interactive dashboards to explore these findings dynamically.