Hafez Ghaemi
@hafezghm.bsky.social
Ph.D. Student @mila-quebec.bsky.social and @umontreal.ca, AI Researcher
Interestingly, seq-JEPA exhibits path integration – an important research problem in neuroscience. By observing a sequence of views and their corresponding actions, it can integrate the path connecting the initial view to the final view.

(9/10)
May 14, 2025 at 12:53 PM
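A minimal, hypothetical sketch of such a path-integration readout (not the authors' code): it assumes frozen aggregate representations and the composite first-to-last rotation for each sequence are already available, and fits a linear probe to regress one from the other. All names and shapes are placeholders.

```python
import torch
import torch.nn as nn

def train_path_probe(agg_reps, cum_rotations, epochs=200, lr=1e-2):
    """Hypothetical linear probe: regress the cumulative first-to-last
    transformation from the frozen aggregate representation.

    agg_reps:      (N, d) aggregate representations from the transformer encoder.
    cum_rotations: (N, 4) composite rotations, e.g. quaternions (placeholder format).
    """
    probe = nn.Linear(agg_reps.shape[1], cum_rotations.shape[1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(probe(agg_reps), cum_rotations)
        loss.backward()
        opt.step()
    return probe
```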
Thanks to action conditioning, the visual backbone encodes rotation information, which can be decoded from its representations, while the transformer encoder aggregates the different rotated views, reduces intra-class variation caused by rotations, and produces a semantic object representation.

(8/10)
May 14, 2025 at 12:53 PM
On the 3D Invariant-Equivariant Benchmark (3DIEBench), where each object view has a different rotation, seq-JEPA achieves top performance on both invariance-related object categorization and equivariance-related rotation prediction w/o sacrificing one for the other.

(7/10)
May 14, 2025 at 12:53 PM
seq-JEPA learns invariant-equivariant representations for tasks that involve sequential observations and transformations; e.g., it can learn semantic image representations by seeing a sequence of small image patches across simulated eye movements, w/o hand-crafted augmentations or masking.

(6/10)
May 14, 2025 at 12:53 PM
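A rough, hypothetical sketch of how such a patch sequence and its relative saccade actions could be sampled from a single image; the crop size, number of fixations, and function name are illustrative, not the paper's exact settings.

```python
import torch

def sample_saccade_sequence(image, num_fixations=5, patch=32):
    """Sample small crops at random fixation points and the relative
    displacement (saccade action) between consecutive fixations.

    image: (C, H, W) tensor; returns (num_fixations, C, patch, patch) crops
    and (num_fixations - 1, 2) relative actions normalized by image size.
    """
    _, H, W = image.shape
    ys = torch.randint(0, H - patch, (num_fixations,))
    xs = torch.randint(0, W - patch, (num_fixations,))
    patches = torch.stack([image[:, y:y + patch, x:x + patch]
                           for y, x in zip(ys.tolist(), xs.tolist())])
    # Relative saccade vectors serve as the actions paired with each view.
    actions = torch.stack([(ys[1:] - ys[:-1]) / H, (xs[1:] - xs[:-1]) / W], dim=-1)
    return patches, actions
```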
Inspired by this, we designed seq-JEPA, which processes sequences of views and their relative transformations (actions).

➡️ A transformer encoder aggregates these action-conditioned view representations to predict a yet-unseen view.

(4/10)
May 14, 2025 at 12:53 PM
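A compact, hypothetical PyTorch sketch of the idea in this post: encode each view, concatenate an embedding of its action, aggregate the sequence with a transformer encoder, and predict the representation of an unseen view conditioned on the action leading to it. Module names, dimensions, and the pooling choice are assumptions, not the released seq-JEPA implementation.

```python
import torch
import torch.nn as nn

class SeqJEPASketch(nn.Module):
    """Illustrative seq-JEPA-style forward pass (hypothetical names and sizes)."""

    def __init__(self, backbone, d_rep=512, d_act=7, d_act_emb=64, n_layers=3):
        super().__init__()
        self.backbone = backbone                    # view encoder, e.g. a ResNet
        self.act_emb = nn.Linear(d_act, d_act_emb)  # embeds relative transformations
        layer = nn.TransformerEncoderLayer(d_model=d_rep + d_act_emb, nhead=8,
                                           batch_first=True)
        self.aggregator = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.predictor = nn.Sequential(
            nn.Linear(d_rep + 2 * d_act_emb, d_rep), nn.ReLU(),
            nn.Linear(d_rep, d_rep))

    def forward(self, views, actions, target_action):
        # views: (B, T, C, H, W); actions: (B, T, d_act); target_action: (B, d_act)
        B, T = views.shape[:2]
        reps = self.backbone(views.flatten(0, 1)).view(B, T, -1)   # per-view encodings
        tokens = torch.cat([reps, self.act_emb(actions)], dim=-1)  # action-conditioned
        agg = self.aggregator(tokens).mean(dim=1)                  # aggregate over views
        pred = self.predictor(torch.cat([agg, self.act_emb(target_action)], dim=-1))
        return pred
```

During training, `pred` would presumably be matched (e.g., with an L2 or cosine loss) against a target encoder's embedding of the held-out view, in standard JEPA fashion.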
Current SSL methods face a trade-off: optimizing for transformation invariance in representation space (useful for high-level classification) often reduces equivariance (needed for tasks that depend on details such as object rotation & movement). Our world model, seq-JEPA, resolves this trade-off.

(2/10)
May 14, 2025 at 12:53 PM
Preprint Alert 🚀

Can we simultaneously learn transformation-invariant and transformation-equivariant representations with self-supervised learning?

TL;DR: Yes! This is possible via simple predictive learning & architectural inductive biases – without extra loss terms or predictors!

🧵 (1/10)
May 14, 2025 at 12:53 PM