I occasionally post AI memes.
yashbhalgat.github.io
Topics: 3D-VLA models, LLM agents for 3D scene understanding, Robotic control with language.
📢 Call for papers: Deadline – April 20, 2025
🌐 Details: 3d-llm-vla.github.io
#llm #3d #Robotics #ai
- Matches/exceeds on most tasks
- Better at math & Chinese tasks
- Strong in-context learning
- Improved dialogue capabilities
(7/8) 🧵
On tasks requiring bidirectional reasoning, it outperforms GPT-4 and maintains consistent performance in both forward/reverse directions.
(6/8) 🧵
- Low-confidence remasking: Remask tokens the model is least sure about
- Semi-autoregressive: Generate in blocks left-to-right while maintaining bidirectional context
(5/8) 🧵
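The low-confidence remasking step can be sketched like this (a minimal NumPy illustration, not LLaDA's actual implementation; `mask_id`, the function name, and the per-batch loop are my assumptions):

```python
import numpy as np

def remask_low_confidence(logits, tokens, mask_id, n_remask):
    """Remask the n_remask positions whose predicted tokens the
    model is least confident about, keeping the rest fixed."""
    # softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # confidence = probability assigned to the chosen token at each position
    conf = np.take_along_axis(probs, tokens[..., None], axis=-1).squeeze(-1)
    out = tokens.copy()
    for b in range(tokens.shape[0]):
        worst = np.argsort(conf[b])[:n_remask]  # least-confident positions
        out[b, worst] = mask_id
    return out
```

Each sampling step would then re-predict only the remasked positions, so confident tokens persist across steps.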
The model learns to predict the original tokens given a partially masked sequence; no causal masking is used.
The same technique also enables instruction-conditioned generation, with no modifications.
(4/8) 🧵
LLaDA's forward process gradually masks tokens, while the reverse process predicts all of them simultaneously. This enables bidirectional modeling.
(3/8) 🧵
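The forward process amounts to independent per-token masking at ratio t (a toy sketch; `MASK` is a placeholder id, not LLaDA's actual mask token):

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1  # placeholder mask-token id

def forward_mask(x0, t):
    """Mask each token independently with probability t in [0, 1]."""
    keep = rng.random(x0.shape) >= t
    return np.where(keep, x0, MASK)

x0 = np.arange(10)
fully_clean = forward_mask(x0, 0.0)    # t=0 leaves the sequence intact
fully_masked = forward_mask(x0, 1.0)   # t=1 masks everything
```

At t=1 the sequence is fully masked and at t=0 fully recovered, so the reverse model trains across the whole masking spectrum.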
- Successful scaling of masked diffusion to LLM scale (8B params)
- Masking with variable ratios for forward/reverse process
- Smart remasking strategies for generation, incl. semi-autoregressive
- SOTA on reversal tasks, matching Llama 3 on others
(2/8) 🧵
- Consistent Light Attention (CLA) module for stable lighting across frames
- Progressive Light Fusion for smooth temporal transitions
- Works with ANY video diffusion model (AnimateDiff, CogVideoX)
- Zero-shot - no fine-tuning needed!
- BFS-ordered skeleton sequence representation
- Autoregressive joint prediction with diffusion sampling
- Hybrid attention masking: full self-attention for shape tokens, causal attention for skeleton
- e2e trainable pipeline without clustering/MST ops
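The hybrid attention masking could be realized roughly as below (a sketch under my own token layout and naming; the paper's actual arrangement may differ):

```python
import numpy as np

def hybrid_attention_mask(n_shape, n_skel):
    """Boolean attention mask (True = may attend): shape tokens get full
    bidirectional self-attention; skeleton tokens see all shape tokens
    plus earlier skeleton tokens (causal)."""
    n = n_shape + n_skel
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_shape, :n_shape] = True   # shape <-> shape: full attention
    mask[n_shape:, :n_shape] = True   # skeleton -> shape: always allowed
    # skeleton -> skeleton: causal (lower-triangular)
    mask[n_shape:, n_shape:] = np.tril(np.ones((n_skel, n_skel), dtype=bool))
    return mask
```

In this sketch, shape tokens never attend to skeleton tokens, so the shape encoding stays independent of the autoregressive joint-prediction order.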
New work from UCSD and Adobe:
"RigAnything: Template-Free Autoregressive Rigging
for Diverse 3D Assets" Liu et al.
tl;dr: reduces rigging time from 2 mins to 2 secs, works on any shape category & doesn't need predefined templates! 🚀
tl;dr: Novel framework that integrates 3D awareness into the VAE latent space using correspondence-aware encoding, enabling high-quality rendered images with ~50% memory savings.
(1/n) 🧵
Their ArAE model controls face count for varying detail while preserving mesh topology.
Their mesh tokenization algorithm (adapted from EdgeBreaker) achieves ~50% compression (4-5 tokens per face vs 9), making training efficient.
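The ~50% figure is consistent with a naive per-face coordinate tokenization, assuming the 9 tokens come from 3 vertices × 3 coordinates per triangle (my reading; the 4-5 range is as reported):

```python
naive = 3 * 3                 # 3 vertices x (x, y, z) tokens per face
compressed = (4 + 5) / 2      # reported 4-5 tokens per face, midpoint
ratio = compressed / naive
print(f"{ratio:.0%} of the naive token count")  # ~50%
```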
Code available (pretrained models too) 🤩: github.com/wyysf-98/Cra...
(1/n) 🧵