Mengye Ren
mengyer.bsky.social
@agentic-ai-lab.bsky.social
mengyeren.com
8/ We also study how data augmentation choices, such as crop scale, input resolution, and the time interval between sampled frames, can have a large impact on video pretraining.
April 20, 2025 at 8:31 PM
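A minimal sketch of two of the knobs named in post 8/, assuming a simple recipe: pick two frames at most `max_gap` apart, then draw a square crop whose area is a random fraction (`scale`) of the frame. `sample_frame_pair`, `sample_crop`, `max_gap`, and the square-crop simplification are hypothetical, not taken from the PooDLe code. The crop would then be resized to the chosen input resolution.

```python
import random
from typing import Tuple


def sample_frame_pair(num_frames: int, max_gap: int,
                      rng: random.Random) -> Tuple[int, int]:
    """Hypothetical helper: two frame indices at most `max_gap` apart (needs num_frames >= 2)."""
    gap = rng.randint(1, min(max_gap, num_frames - 1))
    start = rng.randint(0, num_frames - 1 - gap)
    return start, start + gap


def sample_crop(height: int, width: int, scale: Tuple[float, float],
                rng: random.Random, max_tries: int = 10) -> Tuple[int, int, int, int]:
    """Hypothetical helper: square (top, left, h, w) box covering a `scale` fraction of the frame.

    Falls back to the full frame if no sampled box fits after `max_tries` attempts.
    """
    area = height * width
    for _ in range(max_tries):
        side = int(round((rng.uniform(*scale) * area) ** 0.5))
        if 0 < side <= min(height, width):
            top = rng.randint(0, height - side)
            left = rng.randint(0, width - side)
            return top, left, side, side
    return 0, 0, height, width


# Example: a 300-frame clip, frames up to 30 apart, crops covering 20-100% of a 720x1280 frame.
rng = random.Random(0)
t0, t1 = sample_frame_pair(num_frames=300, max_gap=30, rng=rng)
box = sample_crop(720, 1280, scale=(0.2, 1.0), rng=rng)
```

A larger `max_gap` yields more motion between the two views, which acts as a stronger natural augmentation; the values above are placeholders, not settings from the paper.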
7/ These performance differences show up visually too! The IN1K-pretrained model produces noisy segmentations and FlowE misses small objects, while PooDLe avoids both problems.
April 20, 2025 at 8:31 PM
6/ Interestingly, we find that dense SSL performance is driven by large classes, whereas ImageNet pretraining does well on small, foreground classes.
PooDLe performs well on both small and large classes!
April 20, 2025 at 8:31 PM
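One way to make a small-vs-large comparison like the one in post 6/ concrete is to bucket classes by their average pixel footprint and average the per-class IoU within each bucket. This is a sketch under assumptions: the function name, the pixel-fraction definition, and the threshold are hypothetical and not the paper's evaluation protocol.

```python
import math
from statistics import mean
from typing import Dict, Tuple


def split_miou_by_size(class_iou: Dict[str, float],
                       class_pixel_frac: Dict[str, float],
                       threshold: float) -> Tuple[float, float]:
    """Return (mIoU over small classes, mIoU over large classes).

    A class counts as "small" if its average fraction of image pixels is below
    `threshold` (a hypothetical cutoff). Empty buckets return NaN.
    """
    small = [iou for c, iou in class_iou.items() if class_pixel_frac[c] < threshold]
    large = [iou for c, iou in class_iou.items() if class_pixel_frac[c] >= threshold]
    return (mean(small) if small else math.nan,
            mean(large) if large else math.nan)
```

Reporting the two buckets separately keeps a handful of large classes (sky, road) from masking weak results on small foreground classes in a single mIoU number.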
5/ PooDLe, pretrained on BDD100K and Walking Tours, outperforms prior iconic and dense SSL methods on semantic segmentation and object detection!
We also release WT-Sem, an in-distribution semantic segmentation task for Walking Tours.
April 20, 2025 at 8:31 PM
4/ We also propose a spatial decoder module that upsamples the top-level features to a higher resolution for the dense loss. The top-level features act as an information bottleneck: they are shaped by the high-level invariance loss while remaining compatible with upsampling for the dense loss.
April 20, 2025 at 8:31 PM
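A toy sketch of one decoder step, assuming nearest-neighbor upsampling of the top-level feature map followed by an FPN-style addition of a higher-resolution lateral map. The thread only says the decoder upsamples the top-level features; the lateral fusion, the `decoder_step` name, and nested-list feature maps are illustrative assumptions, not PooDLe's architecture.

```python
from typing import List, Optional

Grid = List[List[List[float]]]  # feature map indexed [y][x][channel]


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Nearest-neighbor upsampling: each cell is repeated factor x factor times."""
    out: Grid = []
    for row in feat:
        wide = [list(vec) for vec in row for _ in range(factor)]
        for _ in range(factor):
            out.append([list(v) for v in wide])
    return out


def decoder_step(top: Grid, lateral: Optional[Grid], factor: int = 2) -> Grid:
    """Upsample `top` and, if given, add a same-shaped higher-resolution lateral map
    (hypothetical FPN-style fusion)."""
    up = upsample_nearest(top, factor)
    if lateral is None:
        return up
    if len(lateral) != len(up) or len(lateral[0]) != len(up[0]):
        raise ValueError("lateral map must match the upsampled shape")
    return [[[a + b for a, b in zip(u, v)] for u, v in zip(urow, vrow)]
            for urow, vrow in zip(up, lateral)]
```

In a real model these would be learned layers operating on tensors; the point here is only that the dense loss sees a higher-resolution map derived from the same bottlenecked top-level features that the invariance loss uses.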
3/ PooDLe addresses these challenges by unifying a dense, flow equivariance objective over global crops and a view invariance objective over smaller subcrops that serve as pseudo-iconic views. Crops are sampled from pairs of video frames, with motion as a natural augmentation.
April 20, 2025 at 8:31 PM
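A minimal sketch of the two objectives described in post 3/, under several assumptions: integer optical flow with nearest-location lookup, cosine distance as the similarity loss, and an equal-weight sum. For brevity, subcrop embeddings are approximated by average-pooling boxes of the same feature maps, whereas in the described method the subcrops are separate, smaller views. All names here are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]       # feature map indexed [y][x][channel]
Flow = List[List[Tuple[int, int]]]   # integer (dy, dx) from frame 1 to frame 2, per location
Box = Tuple[int, int, int, int]      # (top, left, height, width) in feature-map cells


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def dense_flow_loss(f1: Grid, f2: Grid, flow: Flow) -> float:
    """Flow equivariance: features at (y, x) in frame 1 should match frame 2
    at the flow-displaced location. Locations that move out of frame are skipped."""
    h, w = len(f1), len(f1[0])
    losses = []
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                losses.append(1.0 - cosine(f1[y][x], f2[ty][tx]))
    return sum(losses) / len(losses) if losses else 0.0


def avg_pool(f: Grid, box: Box) -> List[float]:
    """Average the feature vectors inside a box to get one embedding."""
    top, left, bh, bw = box
    cells = [f[y][x] for y in range(top, top + bh) for x in range(left, left + bw)]
    return [sum(c[k] for c in cells) / len(cells) for k in range(len(cells[0]))]


def invariance_loss(f1: Grid, f2: Grid, box1: Box, box2: Box) -> float:
    """View invariance: pooled embeddings of two corresponding subcrops should agree."""
    return 1.0 - cosine(avg_pool(f1, box1), avg_pool(f2, box2))


def combined_loss(f1: Grid, f2: Grid, flow: Flow,
                  box1: Box, box2: Box, weight: float = 1.0) -> float:
    """Dense term plus a weighted invariance term (the weight is an assumption)."""
    return dense_flow_loss(f1, f2, flow) + weight * invariance_loss(f1, f2, box1, box2)
```

The dense term treats every location alike, so it inherits whatever area imbalance the scene has; the pooled subcrop term gives a small region one embedding of its own, which is the intuition behind using subcrops as pseudo-iconic views.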
2/ Dense SSL methods account for multiple subjects by computing losses over corresponding spatial regions. However, we identify a new problem – spatial imbalance! Because they cover more spatial locations, larger background regions like the sky are prioritized over smaller foreground objects like pedestrians.
April 20, 2025 at 8:31 PM
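A small illustration of the spatial imbalance in post 2/, assuming the dense loss is a plain average over spatial locations: with equal per-pixel loss, each class contributes in proportion to its area. The function name and the toy pixel counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def loss_share_by_class(labels: List[str], per_pixel_loss: List[float]) -> Dict[str, float]:
    """Fraction of a pixel-averaged loss contributed by each class label."""
    total = sum(per_pixel_loss)
    share: Dict[str, float] = defaultdict(float)
    for cls, loss in zip(labels, per_pixel_loss):
        share[cls] += loss / total
    return dict(share)


# Toy scene: 95 sky pixels and 5 pedestrian pixels, identical loss everywhere.
labels = ["sky"] * 95 + ["pedestrian"] * 5
shares = loss_share_by_class(labels, [1.0] * 100)
```

Here the sky accounts for about 95% of the gradient signal and the pedestrian for about 5%, even though the model is equally wrong on both.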
1/ Many SSL methods revolve around ImageNet, a dataset of iconic images with single subjects and balanced classes, and rely on invariance losses between augmented views. These methods can struggle on naturalistic videos, which contain multiple subjects of varying size and imbalanced classes.
April 20, 2025 at 8:31 PM
How can we leverage naturalistic videos for visual SSL? Naturalistic (i.e., uncurated) videos are abundant and can emulate the egocentric perspective.

Our paper at ICLR 2025, PooDLe🐩, proposes a new SSL method to address the challenges of learning from naturalistic videos. 🧵
April 20, 2025 at 8:31 PM
A comparison of a joint QE + retaliatory tariff approach, in approximate numbers.
February 1, 2025 at 3:00 PM
In addition to a retaliatory tariff against the US, the Bank of Canada should also introduce QE (quantitative easing), coordinated with the government, to reimburse exporters. Here is a comparison of QE reimbursement vs. a retaliatory tariff, suggested by an AI:
February 1, 2025 at 2:55 PM