Lightnews — Scholar-powered news

Aritra Roy Gosthipaty

@arig23498.bsky.social

Some pointers on parallel computing:

A small thread 🧵👇

March 3, 2025 at 6:05 PM

Reposted by Aritra Roy Gosthipaty

Michael Tschannen

@mtschannen.bsky.social

HF model collection for transformers:
huggingface.co/collections/...

HF model collection for OpenCLIP and timm:
huggingface.co/collections/...

And of course big_vision checkpoints:
github.com/google-resea...

SigLIP2 - a google Collection

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

February 22, 2025 at 3:34 PM

Reposted by Aritra Roy Gosthipaty

Michael Tschannen

@mtschannen.bsky.social

Paper:
arxiv.org/abs/2502.14786

HF blog post from @arig23498.bsky.social et al. with a gentle intro to the training recipe and a demo:
huggingface.co/blog/siglip2

Thread with results overview from Xiaohua (only on X, sorry - these are all in the paper):
x.com/XiaohuaZhai/...

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training obje...

arxiv.org

February 22, 2025 at 3:34 PM

Reposted by Aritra Roy Gosthipaty

Michael Tschannen

@mtschannen.bsky.social

📢2⃣ Yesterday we released SigLIP 2!

TL;DR: Improved high-level semantics, localization, dense features, and multilingual capabilities via drop-in replacement for v1.

Bonus: Variants supporting native aspect ratio and variable sequence length.

A thread with interesting resources👇

February 22, 2025 at 3:34 PM

Reposted by Aritra Roy Gosthipaty

Sung Kim

@sungkim.bsky.social

Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! by @arig23498.bsky.social

Build a proof-of-concept API, hosting Qwen2.5-VL-7B-Instruct on Hugging Face Spaces using Docker.

huggingface.co/blog/ariG234...

🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker!

A Blog post by Aritra Roy Gosthipaty on Hugging Face

huggingface.co

January 29, 2025 at 2:00 PM

Aritra Roy Gosthipaty

@arig23498.bsky.social

huggingface.co/blog/logits-...

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

huggingface.co

December 23, 2024 at 10:53 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

The Qwen team is doing so much for the community by keeping research open and constructive.

They listen to the community and put effort into building competitive models.

I was intrigued by their latest `Qwen/QwQ-32B-Preview` model and wanted to play with it.

[1/N]

December 3, 2024 at 6:41 AM

Reposted by Aritra Roy Gosthipaty

Sergio Paniego

@sergiopaniego.bsky.social

I've been exploring the latest Llama 3.2 releases and working on a couple of projects you may find interesting:

1️⃣ Understanding tool calling with Llama 3.2 🔧
2️⃣ Using Text Generation Inference (TGI) with Llama models 🦙

(links in the next post)

November 29, 2024 at 10:10 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

What is THE pain point in training Vision Language Models according to you?

I will go first: the data pipeline.

November 26, 2024 at 10:52 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

Re-caption your webdataset with Qwen2-VL

github.com/sayakpaul/si...

Adding support for Qwen model by ariG23498 · Pull Request #3 · sayakpaul/simple-image-recaptioning

A working colab notebook

github.com

November 23, 2024 at 12:48 PM

Aritra Roy Gosthipaty

@arig23498.bsky.social

huggingface.co/blog/layerskip

Faster Text Generation with Self-Speculative Decoding

huggingface.co

November 20, 2024 at 8:22 PM

Aritra Roy Gosthipaty

@arig23498.bsky.social

To the video generation enthusiasts: Mochi 1 Preview is now supported in `diffusers`.

November 15, 2024 at 10:19 AM

Reposted by Aritra Roy Gosthipaty

Lain

@not-so-lain.bsky.social

awesome, thanks a lot for sharing 🙌

November 13, 2024 at 4:37 PM

Aritra Roy Gosthipaty

@arig23498.bsky.social

`bitsandbytes` makes it really easy to quantize models

Note: MB should be GB in the diagram.
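
The diagram itself is not reproduced here. As a rough sketch of the memory arithmetic that makes quantization attractive, the snippet below estimates weight storage at a few precisions. It is plain Python and does not call `bitsandbytes`; the function names and the 7-billion-parameter model size are hypothetical, chosen only for illustration, and the estimate covers weights only (no activations, KV cache, or quantization metadata).

```python
# Rough weight-memory estimate per precision.
# Assumption: weights only; ignores activations, KV cache, and the small
# overhead of quantization metadata such as scales and zero-points.

BITS_PER_PARAM = {"float32": 32, "float16": 16, "int8": 8, "4-bit": 4}


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def footprint_table(num_params: int) -> dict:
    """Map each precision name to its estimated weight footprint in GB."""
    return {
        name: weight_memory_gb(num_params, bits)
        for name, bits in BITS_PER_PARAM.items()
    }


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model, used only as an example size.
    for name, gb in footprint_table(7_000_000_000).items():
        print(f"{name:>8}: {gb:.1f} GB")
```

At this scale the results land in gigabytes rather than megabytes, which is consistent with the unit correction in the note above.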

November 13, 2024 at 12:03 PM

Aritra Roy Gosthipaty

@arig23498.bsky.social

Read about the Qwen2.5-Coder Series

huggingface.co/blog/ariG234...

November 12, 2024 at 7:09 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

I am diving head first into Vision Language Models. Comment below with the papers I should definitely read.

November 7, 2024 at 5:52 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

Welcome the @huggingface.bsky.social integration in PyCharm. From instant model cards to navigating the local cache, working with Hugging Face models becomes a lot easier with PyCharm.

Bonus: Claim a 3-month PyCharm subscription using the code PyCharm4HF

Blog Post: huggingface.co/blog/pycharm...

Hugging Face + PyCharm

huggingface.co

November 6, 2024 at 11:25 AM

Aritra Roy Gosthipaty

@arig23498.bsky.social

github.com/ml-gde/jflux

Try out the FLUX model in JAX. It also works on TPUs if that is your thing.

For people who want to work on it, there are open issues as well. Happy coding!

GitHub - ml-gde/jflux: JAX Implementation of Black Forest Labs' Flux.1 family of models

JAX Implementation of Black Forest Labs' Flux.1 family of models - ml-gde/jflux

github.com

November 6, 2024 at 7:50 AM
