Chiao Cheng 🇹🇼 🇸🇬 🇺🇸
@chiaolun.bsky.social
Jumps to conclusions
Reposted by Chiao Cheng 🇹🇼 🇸🇬 🇺🇸
I still don’t understand why distillation works, given the same data.

Is it a way to smuggle more computation into a smaller model without looking at the data many more times?
December 21, 2024 at 3:23 PM
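One common framing, sketched below assuming a standard PyTorch-style distillation loss (the function name and temperature value are illustrative, not from the post): the teacher's soft output distribution carries more information per example than a one-hot label, so the student gets richer supervision from the same data.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then push the student's predicted
    # distribution toward the teacher's full distribution, rather than
    # toward just the hard label the raw data provides.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2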
Reposted by Chiao Cheng 🇹🇼 🇸🇬 🇺🇸
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?

We have been pondering this over the summer and developed a new model: JetFormer 🌊🤖

arxiv.org/abs/2411.19722

A thread 👇

1/
December 2, 2024 at 4:41 PM