prev at meta, snap research and georgia tech //
web: https://yashkant.github.io/
Please come chat with me and Ethan Weber during our Pippo poster session on Sat, 5-7pm (Hall D)! 😊 👋
Web: yashkant.github.io/pippo
CC: @ethanjohnweber.bsky.social, @igilitschenski.bsky.social
@yashkant.bsky.social, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta M., Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov 3/🧵
arxiv.org/abs/2502.07785
TL;DR: 1K-resolution multiview diffusion transformer, pre-trained on 3B human images without captions; post-trained on 2.5K studio captures with pixel-aligned control via ControlMLP; generates >5x as many views at inference as seen during training