Aritra Roy Gosthipaty
arig23498.bsky.social
MLE @ Hugging Face
Some pointers on parallel computing:

A small thread 🧵👇
March 3, 2025 at 6:05 PM
Reposted by Aritra Roy Gosthipaty
HF model collection for transformers:
huggingface.co/collections/...

HF model collection for OpenCLIP and timm:
huggingface.co/collections/...

And of course big_vision checkpoints:
github.com/google-resea...
SigLIP2 - a google Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
February 22, 2025 at 3:34 PM
Reposted by Aritra Roy Gosthipaty
Paper:
arxiv.org/abs/2502.14786

HF blog post from @arig23498.bsky.social et al. with a gentle intro to the training recipe and a demo:
huggingface.co/blog/siglip2

Thread with results overview from Xiaohua (only on X, sorry - these are all in the paper):
x.com/XiaohuaZhai/...
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training obje...
arxiv.org
February 22, 2025 at 3:34 PM
Reposted by Aritra Roy Gosthipaty
📢2⃣ Yesterday we released SigLIP 2!

TL;DR: Improved high-level semantics, localization, dense features, and multilingual capabilities via a drop-in replacement for v1.

Bonus: Variants supporting native aspect ratio and variable sequence length.

A thread with interesting resources👇
February 22, 2025 at 3:34 PM
Reposted by Aritra Roy Gosthipaty
Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! by @arig23498.bsky.social

Build a proof-of-concept API, hosting Qwen2.5-VL-7B-Instruct on Hugging Face Spaces using Docker.

huggingface.co/blog/ariG234...
🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker!
A Blog post by Aritra Roy Gosthipaty on Hugging Face
huggingface.co
January 29, 2025 at 2:00 PM
The Qwen team is doing so much for the community by keeping research open and constructive.

They listen to the community and put effort into building competitive models.

I was intrigued by their latest `Qwen/QwQ-32B-Preview` model and wanted to play with it.

[1/N]
December 3, 2024 at 6:41 AM
Reposted by Aritra Roy Gosthipaty
I've been exploring the latest Llama 3.2 releases and working on a couple of projects you may find interesting:

1️⃣ Understanding tool calling with Llama 3.2 🔧
2️⃣ Using Text Generation Inference (TGI) with Llama models 🦙

(links in the next post)
November 29, 2024 at 10:10 AM
What is THE pain point in training Vision Language Models, in your view?

I will go first: the data pipeline.
November 26, 2024 at 10:52 AM
Re-caption your webdataset with Qwen2-VL

github.com/sayakpaul/si...
Adding support for Qwen model by ariG23498 · Pull Request #3 · sayakpaul/simple-image-recaptioning
A working colab notebook
github.com
November 23, 2024 at 12:48 PM
To the video generation enthusiasts: Mochi 1 Preview is now supported in `diffusers`
November 15, 2024 at 10:19 AM
Reposted by Aritra Roy Gosthipaty
awesome, thanks a lot for sharing 🙌
November 13, 2024 at 4:37 PM
`bitsandbytes` makes it really easy to quantize models

Note: MB should be GB in the diagram.
November 13, 2024 at 12:03 PM
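As a rough illustration of why the units in the post above matter (GB, not MB), here is a minimal back-of-the-envelope sketch of weight memory at different precisions. It does not call `bitsandbytes`; the helper name `weight_memory_gb` and the 7-billion-parameter count are assumptions chosen only for illustration, and the estimate ignores activations and the small overhead of quantization constants.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 7-billion-parameter model, used only as an example size.
NUM_PARAMS = 7_000_000_000

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, halving the bits per parameter halves the weight footprint, which is why 8-bit and 4-bit loading can bring a multi-billion-parameter model within reach of a single consumer GPU.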
Read about the Qwen2.5-Coder Series

huggingface.co/blog/ariG234...
November 12, 2024 at 7:09 AM
I am diving head first into Vision Language Models. Comment below with the papers I should definitely read.
November 7, 2024 at 5:52 AM
Welcome the @huggingface.bsky.social integration in PyCharm. From instant model cards to navigating the local cache, working with Hugging Face models becomes a lot easier with PyCharm.

Bonus: Claim a 3-month PyCharm subscription using PyCharm4HF

Blog Post: huggingface.co/blog/pycharm...
Hugging Face + PyCharm
huggingface.co
November 6, 2024 at 11:25 AM
github.com/ml-gde/jflux

Try out the FLUX model in JAX. It also works on TPUs if that is your thing.

For people who want to work on it, there are open issues as well. Happy coding!
GitHub - ml-gde/jflux: JAX Implementation of Black Forest Labs' Flux.1 family of models
github.com
November 6, 2024 at 7:50 AM