➡️ A transformer encoder aggregates these action-conditioned view representations to predict a previously unseen view.
(4/10)
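To make this step concrete, here is a minimal PyTorch sketch of such an aggregator. It is illustrative only, not the authors' code: the class name, dimensions, and the use of a query token carrying the target action are assumptions, and whether the prediction target is the unseen view itself or its representation is not specified in the tweet.

```python
# Minimal sketch, not the paper's implementation: observed view representations
# are fused with the actions that produced them, a transformer encoder aggregates
# the resulting tokens, and a query token carrying the target action is decoded
# into a prediction for the yet-unseen view's representation.
import torch
import torch.nn as nn

class ActionConditionedViewPredictor(nn.Module):  # hypothetical name
    def __init__(self, view_dim=512, action_dim=8, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.ctx_proj = nn.Linear(view_dim + action_dim, d_model)  # (view, action) -> context token
        self.query_proj = nn.Linear(action_dim, d_model)           # target action -> query token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, view_dim)                   # decode query into predicted view repr.

    def forward(self, view_reprs, actions, target_action):
        # view_reprs:    (B, N, view_dim)   representations of the N observed views
        # actions:       (B, N, action_dim) transformation/action attached to each view
        # target_action: (B, action_dim)    transformation leading to the unseen view
        ctx = self.ctx_proj(torch.cat([view_reprs, actions], dim=-1))  # (B, N, d_model)
        query = self.query_proj(target_action).unsqueeze(1)            # (B, 1, d_model)
        out = self.encoder(torch.cat([ctx, query], dim=1))             # aggregate all tokens
        return self.head(out[:, -1])                                   # prediction for the unseen view

# Example: two scenes, three observed views each; predict the view after a new action.
model = ActionConditionedViewPredictor()
pred = model(torch.randn(2, 3, 512), torch.randn(2, 3, 8), torch.randn(2, 8))  # shape (2, 512)
```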
Can we simultaneously learn transformation-invariant and transformation-equivariant representations with self-supervised learning?
TL;DR Yes! This is possible via simple predictive learning & architectural inductive biases – without extra loss terms and predictors!
🧵 (1/10)
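For reference, the standard definitions behind these two terms (textbook definitions, not notation from the thread): a representation $f$ is invariant to a transformation $T$ if $f(T \cdot x) = f(x)$, and equivariant if $f(T \cdot x) = \rho(T)\,f(x)$ for some structured map $\rho(T)$ acting on representation space; invariance is the special case $\rho(T) = \mathrm{id}$.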