Mengye Ren
@mengyer.bsky.social
@agentic-ai-lab.bsky.social
mengyeren.com
10/ This @agentic-ai-lab.bsky.social project was led by Alex Wang @alexnwang.bsky.social and Chris Hoang @choang.bsky.social, together with Yuwen Xiong, @yann-lecun.bsky.social, and @mengyer.bsky.social.
April 20, 2025 at 8:31 PM
9/ For more details, please check out our paper and website, or stop by our poster (Fri 10 AM, Hall 3 + Hall 2B #336) at ICLR!
Paper: arxiv.org/abs/2408.11208
Website: agenticlearning.ai/poodle/
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
8/ We also study how data augmentation choices such as crop scale, input resolution, and the time between sampled frames can have a large impact on video pretraining.
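For concreteness, here is a hypothetical configuration listing these knobs; the names and values below are placeholders for illustration, not the settings reported in the paper.

```python
# Hypothetical pretraining configuration illustrating the knobs discussed above.
# All names and values are placeholders, not the paper's reported settings.
video_pretrain_cfg = {
    "global_crop_scale": (0.5, 1.0),   # fraction of the frame covered by the global crop
    "subcrop_scale": (0.10, 0.30),     # smaller pseudo-iconic subcrops
    "input_resolution": 448,           # input size in pixels after resizing
    "frame_gap_seconds": 0.5,          # time between the two sampled video frames
}
```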
7/ These performance differences manifest visually too! IN1K pretraining produces noisy segmentations and FlowE misses small objects, while PooDLe avoids both problems.
6/ Interestingly, we find that dense SSL performance is driven by large classes, whereas ImageNet pretraining does well on small, foreground classes.
PooDLe performs well on both small and large classes!
5/ PooDLe, pretrained on BDD100K and Walking Tours, outperforms prior iconic and dense SSL methods on semantic segmentation and object detection!
We also release WT-Sem, an in-distribution semantic segmentation task for Walking Tours.
4/ We also propose a spatial decoder module that upsamples the top-level features to a higher resolution for the dense loss. These features act as an information bottleneck: they satisfy the high-level invariance loss while remaining compatible with upsampling for the dense objective.
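A minimal sketch of a spatial decoder in this spirit, assuming PyTorch; the layer choices, channel sizes, and upsampling factor are illustrative rather than the paper's exact architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class SpatialDecoder(nn.Module):
    """Upsample pooled top-level features into a higher-resolution map for a dense loss."""
    def __init__(self, in_dim=2048, out_dim=256, scale=4):
        super().__init__()
        self.scale = scale
        self.proj = nn.Conv2d(in_dim, out_dim, kernel_size=1)      # reduce channels
        self.refine = nn.Sequential(                                # light spatial refinement
            nn.Conv2d(out_dim, out_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_dim, out_dim, kernel_size=3, padding=1),
        )

    def forward(self, top_feats):        # top_feats: (B, in_dim, h, w)
        x = self.proj(top_feats)
        x = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return self.refine(x)            # (B, out_dim, h*scale, w*scale)
```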
3/ PooDLe addresses these challenges by unifying a dense flow-equivariance objective over global crops with a view-invariance objective over smaller subcrops that serve as pseudo-iconic views. Crops are sampled from pairs of video frames, with motion acting as a natural augmentation.
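A rough sketch of how the two objectives can be combined, assuming a precomputed optical-flow sampling grid, placeholder feature tensors, and a simple backward-warping helper; the released PooDLe implementation differs in details.

```python
import torch.nn.functional as F

def warp(feat, grid):
    """Backward-warp a feature map (B, C, H, W) with a normalized sampling grid (B, H, W, 2)."""
    return F.grid_sample(feat, grid, align_corners=False)

def combined_loss(dense_t0, dense_t1, flow_grid, pooled_t0, pooled_t1, lam=1.0):
    # Dense flow-equivariance term: frame-t0 features, warped by optical flow,
    # should match frame-t1 features at corresponding spatial locations.
    warped = warp(dense_t0, flow_grid)
    dense_term = F.mse_loss(F.normalize(warped, dim=1), F.normalize(dense_t1, dim=1))
    # Pooled view-invariance term: subcrops act as pseudo-iconic views whose
    # pooled embeddings are pulled together (negative cosine similarity).
    inv_term = -(F.normalize(pooled_t0, dim=-1) * F.normalize(pooled_t1, dim=-1)).sum(-1).mean()
    return dense_term + lam * inv_term
```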
2/ Dense SSL methods account for multiple subjects by computing losses over corresponding spatial regions. However, we identify a new problem: spatial imbalance! Larger background regions like the sky are prioritized over smaller foreground objects like pedestrians.
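A toy illustration of the imbalance (an assumed setup, not code from the paper): averaging a per-location dense loss uniformly over the spatial grid lets a large, easy background region swamp a small, hard foreground region.

```python
import torch

def dense_loss_uniform(per_location_loss):    # per_location_loss: (B, H, W)
    return per_location_loss.mean()           # every location counts equally

# 90% "sky" locations with small loss vs. 10% "pedestrian" locations with large loss.
loss_map = torch.zeros(1, 10, 10)
loss_map[:, :, :9] = 0.1   # 90 background locations
loss_map[:, :, 9:] = 1.0   # 10 foreground locations
print(dense_loss_uniform(loss_map))  # ~0.19: the average mostly reflects the background
```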
1/ Many SSL methods revolve around ImageNet (iconic images with single subjects and balanced classes) and rely on invariance losses between augmented views. These methods can struggle on naturalistic videos, which contain multiple subjects of varying size and imbalanced classes.
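For reference, a minimal sketch of the kind of view-invariance objective these iconic-image methods rely on (a generic BYOL/SimSiam-style loss, not PooDLe's formulation; encoder and projector are placeholder modules).

```python
import torch.nn.functional as F

def invariance_loss(encoder, projector, view_a, view_b):
    # Embed two augmented views of the same image and pull their embeddings together.
    z_a = F.normalize(projector(encoder(view_a)), dim=-1)   # (B, D)
    z_b = F.normalize(projector(encoder(view_b)), dim=-1)   # (B, D)
    # Negative cosine similarity, minimized when the two embeddings align.
    return -(z_a * z_b).sum(dim=-1).mean()
```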