To truly do this in early fusion models you'd have to capture synergy (arxiv.org/abs/2306.04539)
cambrian-mllm.github.io
- I set the first n tokens to be generated by the target model, with n=3 here.
- I'm using Qwen2-VL family here
- Prompt is "Describe this image" so the first three tokens are always the same
This was just to establish a baseline; next is experimenting with various tasks and configs (rough sketch of the setup below).
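Roughly what this looks like, assuming a standard draft/target split within the Qwen2-VL family (the 7B/2B pairing, the image path, and the hand-off to the smaller model are my illustrative assumptions, not the exact code):

```python
# Sketch: force the first n tokens to come from the larger "target" model,
# then let the smaller model continue from that prefix.
# The 7B/2B pairing and the image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
target = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
draft = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")

image = Image.open("example.jpg")  # hypothetical image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image"},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(target.device)

n = 3  # first n tokens generated by the target model
prefix = target.generate(**inputs, max_new_tokens=n, do_sample=False)

# Hand the target-generated prefix to the smaller model and keep decoding
# (both models share the Qwen2-VL tokenizer, so the token ids are compatible).
cont = {k: v.to(draft.device) for k, v in inputs.items()}
cont["input_ids"] = prefix.to(draft.device)
cont["attention_mask"] = torch.ones_like(cont["input_ids"])
out = draft.generate(**cont, max_new_tokens=128, do_sample=False)
print(processor.batch_decode(out[:, prefix.shape[1]:], skip_special_tokens=True)[0])
```

Since the prompt is fixed, those first three target tokens are identical across runs, which is what makes this a clean baseline.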
For researchers/engineers/anons: Excited about multimodal data? Have innovative ideas? Join us!
(also recruiting cracked interns)
jobs.ashbyhq.com/DatologyAI
14/n
And it’s also not the only thing we’ve been working on. The rest of the team has been cooking on text curation since the beginning, so stay tuned for our LLM pretraining results coming soon!
13/n
While not our primary target, working on data curation resulted in competitive/superior CLIP models. We’ve extensively benchmarked them against external models, achieving >10x data efficiency and better performance. See the blog post for more!
12/n
Even with a ~2.4x FLOPs reduction, we attain an absolute 13% improvement for retrieval and 9.2% for classification
11/n
Specialized pretraining is the future, powered by curation at scale
Cost of training the smaller model also quickly amortizes over time (think millions of API calls/day; quick back-of-envelope below)
10/n
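Quick back-of-envelope on that amortization. Every number here is an illustrative assumption (costs, traffic), not a DatologyAI figure:

```python
# Illustrative amortization math: one-time training cost vs. per-call inference savings.
train_cost_usd = 50_000            # one-time pretraining cost of the smaller specialized model
cost_per_call_generalist = 0.0020  # inference cost per API call, big generalist model
cost_per_call_specialist = 0.0005  # inference cost per API call, smaller specialized model
calls_per_day = 5_000_000

daily_savings = calls_per_day * (cost_per_call_generalist - cost_per_call_specialist)
print(f"daily inference savings: ${daily_savings:,.0f}")                        # $7,500
print(f"training cost amortized in ~{train_cost_usd / daily_savings:.1f} days")  # ~6.7 days
```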
Productionizing models requires inference optimizations that trade quality for speed, and that tradeoff doesn’t work for overtrained generalist models.
9/n
Improve model quality by up to ~13% absolute (22% relative) for the same training cost
Curation means training with in-domain data for the end tasks you care about!
8/n
Retrieval: 28.8x-43x training speedup vs baseline
Classification: 2.88x-13x vs baseline
We filter out redundant & harmful data to achieve equal performance much faster
7/n
Data: image-text data from DataComp’s CommonPool up to 1B samples
Curate: separate strategies for retrieval and classification tasks
Train: CLIP ViT-B/16 & B/32, 8k batch size (rough training sketch below)
Evals: DataComp + SugarCrepe
6/n
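A rough single-process sketch of that training recipe using open_clip. The contrastive-loss wiring is generic open_clip usage; the optimizer settings are assumed defaults, not our exact hyperparameters:

```python
# Minimal CLIP training step on a curated batch (sketch, not our production code).
import torch
import open_clip
from open_clip.loss import ClipLoss

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16")  # also ViT-B-32
tokenizer = open_clip.get_tokenizer("ViT-B-16")
loss_fn = ClipLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.2)  # assumed hparams

def train_step(images, texts):
    """One step on a curated image-text batch: images [B, 3, 224, 224], texts list[str], B = 8192 global."""
    image_features, text_features, logit_scale = model(images, tokenizer(texts))
    loss = loss_fn(image_features, text_features, logit_scale)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```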
It’s a data pipeline composed of cutting-edge research put into production and battle-tested at scale. Shown below are a few themes the individual algorithms fall under; more details can be found in the blog post!
5/n
It’s a pivotal time in AI to unlock tremendous societal value & train domain-specific foundation models outside of large labs
With scalable data curation, we can:
1) train models **faster**
2) train models **better**
3) train models **smaller**
4/n