Guy Dar
@guydar.bsky.social
All in all, it's hard to say how practically feasible this kind of localization is to obtain without substantial leakage. Fortunately, there are many free parameters that can be tweaked here, and many variants to consider.
January 12, 2025 at 4:52 PM
This allows an (approximate) causal variant of training data attribution -- understanding which data points contributed to the emergence of a capability!
January 12, 2025 at 4:52 PM
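As a hedged sketch of how that attribution step might look (my illustration, not code from the thread): assuming a per-example log of update masks is kept during training (see the ablation sketch below), score each training example by how much its mask overlaps the parameter set implicated in a capability, where that set is assumed to be found separately, e.g. by ablation. All names here are hypothetical.

```python
import torch

def attribution_scores(mask_log: dict[int, torch.Tensor],
                       capability_params: torch.Tensor) -> dict[int, float]:
    """Illustrative attribution: rank training examples by the overlap between
    their logged update masks and a boolean mask over the same parameter
    tensor marking the entries implicated in a capability."""
    scores = {}
    for ex_id, m in mask_log.items():
        overlap = (m.bool() & capability_params).sum().item()
        scores[ex_id] = overlap / capability_params.sum().item()
    return scores

# Toy usage: two logged examples over a 64x64 parameter tensor.
log = {1: (torch.rand(64, 64) < 0.5).float(),
       2: (torch.rand(64, 64) < 0.1).float()}
cap = torch.rand(64, 64) < 0.05  # hypothetical "capability" parameters
print(attribution_scores(log, cap))  # example 1 should overlap more on average
```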
A major advantage of this method over other methods is that it allows ⏳"time travel"⏳
Because we can trace which params were influenced by a data point, we can ablate or manipulate them!
January 12, 2025 at 4:51 PM
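A toy illustration of what this "time travel" could look like, with hypothetical names and design choices of mine: keep a log of which parameter entries each training example's mask left active (and hence could update), then zero those entries to approximately undo that example's influence.

```python
import torch

# Hypothetical record kept during training: example id -> mask over a parameter
# tensor, marking which entries that example's dropout mask left active.
mask_log: dict[int, torch.Tensor] = {}

def ablate_data_point(param: torch.Tensor, example_id: int) -> None:
    """Zero the parameter entries a given training example was allowed to
    update, approximately removing its contribution after the fact."""
    with torch.no_grad():
        param[mask_log[example_id].bool()] = 0.0

# Toy usage: pretend example 7 was allowed to update a random 10% of a weight.
w = torch.randn(64, 64)
mask_log[7] = (torch.rand(64, 64) < 0.1).float()
ablate_data_point(w, example_id=7)
```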
The idea is related to locality-sensitive hashing (LSH), which maps similar vectors to the same bucket with high probability. We train the model with a dropout mask that depends on the semantics of the input ("semantic dropout masks") to accomplish that.
January 12, 2025 at 4:51 PM
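To make that concrete, here is one possible instantiation I sketched (the function name, the top-k choice, and all parameters are my assumptions): derive the mask from a SimHash-style random projection of the input's embedding, keeping the top-scoring parameter entries, so that semantically close inputs receive heavily overlapping masks.

```python
import torch

def semantic_dropout_mask(emb: torch.Tensor, n_params: int, keep: float = 0.5,
                          seed: int = 0) -> torch.Tensor:
    """Illustrative semantic dropout mask: project the input embedding onto
    fixed random hyperplanes and keep the top-k scoring parameters, so
    similar embeddings yield largely overlapping masks."""
    g = torch.Generator().manual_seed(seed)
    # Random hyperplanes shared across all inputs (fixed by the seed).
    planes = torch.randn(emb.shape[-1], n_params, generator=g)
    scores = emb @ planes               # similar embeddings -> similar scores
    k = int(keep * n_params)
    mask = torch.zeros(n_params)
    mask[scores.topk(k).indices] = 1.0  # keep the top-k scoring parameters
    return mask

e1 = torch.randn(32)
e2 = e1 + 0.05 * torch.randn(32)  # a semantically nearby input
m1 = semantic_dropout_mask(e1, n_params=1024)
m2 = semantic_dropout_mask(e2, n_params=1024)
print((m1 * m2).sum() / m1.sum())  # overlap well above the ~0.5 chance level
```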
In this work, I present a *sketch* of an idea around this. Instead of allocating inputs to rigid groups, we aim for fuzzy membership, such that semantically similar inputs update related subsets of the parameters.
January 12, 2025 at 4:50 PM
For example, gradient routing partitions data points into disjoint groups and updates only a certain region in the network for each group. This method, as well as others, is limited to a predefined set of localizations.
January 12, 2025 at 4:50 PM
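For readers who want something concrete, a minimal sketch of the gradient-routing setup described above, written as an illustration: the toy model, the two groups, and the `region` slices are all hypothetical choices of mine, not the original method's configuration.

```python
import torch
import torch.nn as nn

# Toy model; each data group is routed to a disjoint slice of the hidden layer.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Hypothetical assignment: two disjoint groups, each owning half the hidden units.
region = {0: slice(0, 32), 1: slice(32, 64)}

def train_step(x, y, group):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Route the update: zero gradients for hidden units outside this group's region.
    mask = torch.zeros(64)
    mask[region[group]] = 1.0
    model[0].weight.grad *= mask.unsqueeze(1)  # rows of the first layer's weights
    model[0].bias.grad *= mask
    opt.step()

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
train_step(x, y, group=0)  # only hidden units 0..31 receive an update
```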