Yash Bhalgat
@ysbhalgat.bsky.social
PhD at VGG, Oxford w/ Andrew Zisserman, Andrea Vedaldi, Joao Henriques, Iro Laina. Past: Senior RS Qualcomm #AI #Research, UMich, IIT Bombay.

I occasionally post AI memes.

yashbhalgat.github.io
I think a few things will happen soon:
🚀 Scale beyond 8B
🎯 Multi-modal capabilities
⚡️ Faster inference
🔄 Reinforcement learning integration

Exciting to see alternatives to autoregressive models succeeding at scale!

Paper: ml-gsai.github.io/LLaDA-demo/

(8/8)
February 18, 2025 at 3:08 PM
Results vs. LLaMA 3 8B:

- Matches/exceeds on most tasks
- Better at math & Chinese tasks
- Strong in-context learning
- Improved dialogue capabilities

(7/8) 🧵
February 18, 2025 at 3:07 PM
A major result: LLaDA breaks the "reversal curse" that plagues autoregressive models. 🔄

On tasks requiring bidirectional reasoning, it outperforms GPT-4 and maintains consistent performance in both forward/reverse directions.

(6/8) 🧵
February 18, 2025 at 3:07 PM
For generation, they introduce clever remasking strategies:

- Low-confidence remasking: Remask tokens the model is least sure about

- Semi-autoregressive: Generate in blocks left-to-right while maintaining bidirectional context
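Low-confidence remasking can be sketched in a few lines: after each denoising step, keep the predictions the model is sure about and remask the rest. A minimal numpy illustration (the mask-token id and the per-step remask count are my assumptions, not the paper's exact settings):

```python
import numpy as np

def low_confidence_remask(probs, pred_tokens, n_remask, mask_id=0):
    """Remask the n least-confident predictions for the next denoising step.

    probs: (seq_len,) confidence of each predicted token (e.g. max softmax prob).
    """
    order = np.argsort(probs)        # ascending: least confident first
    remask_idx = order[:n_remask]
    out = pred_tokens.copy()
    out[remask_idx] = mask_id
    return out

probs = np.array([0.9, 0.2, 0.6, 0.4])
preds = np.array([11, 22, 33, 44])
# Indices 1 and 3 have the lowest confidence, so they get remasked.
remasked = low_confidence_remask(probs, preds, n_remask=2)
```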

(5/8) 🧵
February 18, 2025 at 3:07 PM
Training uses random masking ratio t ∈ [0,1] for each sequence.

The model learns to predict original tokens given partially masked sequences. No causal masking used.

The same technique also supports instruction-conditioned generation, with no architectural modifications.
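The forward (masking) process above can be sketched in numpy; the [MASK] id and the 1/t loss reweighting mentioned in the comment are assumptions based on standard masked-diffusion objectives, not the paper's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK = 100, 0  # hypothetical vocab size and [MASK] token id

def masking_step(tokens):
    """One forward-process sample: mask each token independently with prob t."""
    t = rng.uniform(0.0, 1.0)            # random masking ratio t ~ U[0, 1]
    mask = rng.random(len(tokens)) < t   # positions to mask
    corrupted = np.where(mask, MASK, tokens)
    return corrupted, mask, t

tokens = rng.integers(1, VOCAB, size=16)
corrupted, mask, t = masking_step(tokens)
# The model is trained to predict tokens[mask] from `corrupted`, with the
# cross-entropy on masked positions typically reweighted by 1/t.
```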

(4/8) 🧵
February 18, 2025 at 3:06 PM
💡Core insight: Generative modeling principles, not autoregression, give LLMs their power.

LLaDA's forward process gradually masks tokens while reverse process predicts them simultaneously. This enables bidirectional modeling.

(3/8) 🧵
February 18, 2025 at 3:06 PM
Key highlights:
- Successful scaling of masked diffusion to LLM scale (8B params)
- Masking with variable ratios for forward/reverse process
- Smart remasking strategies for generation, incl. semi-autoregressive
- SOTA on reversal tasks, matching Llama 3 on others

(2/8) 🧵
February 18, 2025 at 3:05 PM
Project page: bujiazi.github.io/light-a-vide...
Code: github.com/bcmi/Light-A...

Could be a game-changer for quick video mood/lighting adjustments without complicated VFX pipelines! 🎬
Light-A-Video
February 16, 2025 at 4:28 PM
The results are pretty good ✨
They can transform regular videos into moody noir scenes, add sunlight streaming through windows, or create cyberpunk neon vibes -- works on everything from portrait videos to car commercials! 🚗
February 16, 2025 at 4:28 PM
Technical highlights 🔍:
- Consistent Light Attention (CLA) module for stable lighting across frames
- Progressive Light Fusion for smooth temporal transitions
- Works with ANY video diffusion model (AnimateDiff, CogVideoX)
- Zero-shot - no fine-tuning needed!
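One way to read "Progressive Light Fusion" is as a per-step blend between the original and relit latents whose weight ramps toward the relit target over the denoising trajectory. This is a hedged sketch of that idea, not the paper's actual schedule:

```python
import numpy as np

def progressive_light_fusion(orig, relit, step, total_steps):
    """Blend original and relit latents at one denoising step, shifting
    weight toward the relit target as denoising progresses.
    Illustrative linear ramp; the paper's schedule may differ."""
    w = step / max(total_steps - 1, 1)   # ramps 0 -> 1 across the trajectory
    return (1.0 - w) * orig + w * relit
```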
February 16, 2025 at 4:27 PM
Project page: liuisabella.com/RigAnything/
Code: not available yet

Really excited to try this out once the code is available!
RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets
February 15, 2025 at 1:06 PM
Authors claim that the model generalizes well across diverse shapes - from humanoids to marine creatures! And works with real-world images & arbitrary poses. 🤩
February 15, 2025 at 1:06 PM
Technical highlights:
- BFS-ordered skeleton sequence representation
- Autoregressive joint prediction with diffusion sampling
- Hybrid attention masking: full self-attention for shape tokens, causal attention for skeleton
- End-to-end trainable pipeline without clustering/MST ops
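The hybrid attention mask can be made concrete. One plausible construction (whether shape tokens also attend back to skeleton tokens is my assumption; here they don't, since the skeleton is generated conditioned on the shape):

```python
import numpy as np

def hybrid_attention_mask(n_shape, n_skel):
    """Boolean mask (True = may attend). Shape tokens use full self-attention
    among themselves; skeleton tokens attend to all shape tokens and
    causally to earlier skeleton tokens."""
    n = n_shape + n_skel
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_shape] = True                          # all rows see shape tokens
    mask[n_shape:, n_shape:] = np.tril(               # causal among skeleton
        np.ones((n_skel, n_skel), dtype=bool))
    return mask
```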
February 15, 2025 at 1:05 PM
@sebastianraschka.com this is such an interesting discussion! I haven't tried this myself, but I think this can be analyzed theoretically by looking at the rank of the attention matrix in both cases.

I have posted my thoughts on the discussion here: github.com/rasbt/LLMs-f...
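The rank argument can be made concrete: the pre-softmax logits depend on W_Q and W_K only through the product W_Q W_Kᵀ, a d_model × d_model matrix of rank at most d_head, so merging the two into one matrix is exact, while a full-rank replacement would enlarge the model class. A quick numpy check (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n = 8, 4, 5
X = rng.normal(size=(n, d_model))        # token embeddings
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))

# Attention logits: (X W_Q)(X W_K)^T = X (W_Q W_K^T) X^T
logits_two = (X @ W_Q) @ (X @ W_K).T
W = W_Q @ W_K.T                          # single merged matrix, rank <= d_head
logits_one = X @ W @ X.T
```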
Self attention: Merge Query matrix and Key matrix into a single covariance matrix? · rasbt LLMs-from-scratch · Discussion #517
When computing the context vector in the attention algorithm, three weight matrices are introduced. It was discussed in #454 that the value matrix W_V is not necessary. For the remaining two, the query matri...
February 14, 2025 at 2:42 PM
Interesting how they handle the domain gap between 2D latent space and 3D representations through their three-stage pipeline. The correspondence-aware encoding significantly reduces high-frequency noise while preserving geometry.

Project: latent-radiance-field.github.io/LRF/
Latent Radiance Fields with 3D-aware 2D Representations
February 14, 2025 at 10:29 AM
Technical approach:
- Correspondence-aware autoencoding to enhance 3D consistency in VAE latent space
- Builds 3D representations from 3D-aware 2D features
- VAE-Radiance Field alignment to bridge domain gap between latent and image space

#nerf #ai #research
February 14, 2025 at 10:28 AM
Project: research.nvidia.com/labs/dir/edg...
Training and inference code available here: github.com/NVlabs/EdgeR...
February 13, 2025 at 10:36 PM
The architecture uses a lightweight encoder and auto-regressive decoder to compress variable-length meshes into fixed-length codes, enabling point cloud and single-image conditioning.

Their ArAE model controls face count for varying detail while preserving mesh topology.
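The variable-length-to-fixed-length compression can be illustrated with cross-attention pooling over face tokens; the sizes and the pooling mechanism here are assumptions for illustration, not EdgeRunner's actual encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_LEN, DIM = 8, 16                        # illustrative sizes
queries = rng.normal(size=(CODE_LEN, DIM))   # learned latent queries in practice

def encode_mesh(face_feats):
    """Cross-attention pooling: F variable face tokens -> CODE_LEN fixed codes.
    Stand-in for a learned encoder; only the shape contract matters here."""
    scores = queries @ face_feats.T / np.sqrt(DIM)           # (CODE_LEN, F)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                  # row-wise softmax
    return attn @ face_feats                                 # (CODE_LEN, DIM)

small = rng.normal(size=(10, DIM))    # mesh with 10 faces
large = rng.normal(size=(500, DIM))   # mesh with 500 faces
```

Regardless of how many faces the input mesh has, the latent code has the same fixed shape, which is what lets the downstream auto-regressive decoder condition on it uniformly.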
February 13, 2025 at 10:36 PM