🚀 Scale beyond 8B
🎯 Multi-modal capabilities
⚡️ Faster inference
🔄 Reinforcement learning integration
Exciting to see alternatives to autoregressive models succeeding at scale!
Paper: ml-gsai.github.io/LLaDA-demo/
(8/8)
- Matches/exceeds on most tasks
- Better at math & Chinese tasks
- Strong in-context learning
- Improved dialogue capabilities
(7/8) 🧵
On tasks requiring bidirectional reasoning, it outperforms GPT-4 and maintains consistent performance in both forward/reverse directions.
(6/8) 🧵
- Low-confidence remasking: Remask the tokens the model is least sure about (sketch after this post)
- Semi-autoregressive: Generate in blocks left-to-right while maintaining bidirectional context
(5/8) 🧵
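Roughly what low-confidence remasking could look like at sampling time. This is a minimal sketch under my own assumptions about the interface (a `model` returning per-token logits, a linear unmasking schedule), not the paper's released sampler:

```python
# Hypothetical sketch of low-confidence remasking, not the official LLaDA sampler.
# Assumes `model` maps token ids [1, seq_len] -> logits [1, seq_len, vocab_size].
import torch

def sample_with_remasking(model, prompt_ids, gen_len=128, steps=64, mask_id=0):
    device = prompt_ids.device
    x = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, device=device)])
    gen = torch.arange(len(prompt_ids), len(x), device=device)  # generated region
    for step in range(steps):
        still_masked = x[gen] == mask_id
        logits = model(x.unsqueeze(0)).squeeze(0)               # [seq_len, vocab_size]
        conf, pred = logits.softmax(-1).max(-1)                 # per-position confidence
        cand = gen[still_masked]                                # positions predicted this step
        x[cand] = pred[cand]                                    # fill every masked slot
        n_remask = int(gen_len * (1 - (step + 1) / steps))      # fewer masks each step
        if n_remask > 0 and len(cand) > 0:
            least_confident = conf[cand].argsort()[:n_remask]   # lowest confidence first
            x[cand[least_confident]] = mask_id                  # revise these next step
    return x[len(prompt_ids):]
```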
The model learns to predict the original tokens from partially masked sequences; no causal masking is used.
The same recipe also supports instruction-conditioned generation, with no architectural modifications.
(4/8) 🧵
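A minimal sketch of what that training step might look like, assuming a uniform random mask ratio per sequence and a loss reweighted by that ratio. Names and the exact weighting are my assumptions, not the released code:

```python
# Hypothetical masked-prediction training step, not the official LLaDA code.
# Assumes `model` is a bidirectional Transformer mapping ids [batch, seq_len]
# to logits [batch, seq_len, vocab_size], with no causal mask anywhere.
import torch
import torch.nn.functional as F

def masked_prediction_loss(model, ids, mask_id):
    b, seq_len = ids.shape
    t = torch.rand(b, 1, device=ids.device).clamp_min(1e-3)    # mask ratio per sequence
    is_masked = torch.rand(b, seq_len, device=ids.device) < t
    noisy = torch.where(is_masked, torch.full_like(ids, mask_id), ids)
    logits = model(noisy)                                       # full bidirectional attention
    ce = F.cross_entropy(logits[is_masked], ids[is_masked], reduction="none")
    return (ce / t.expand(b, seq_len)[is_masked]).mean()        # reweight by mask ratio
```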
LLaDA's forward process gradually masks tokens, while the reverse process predicts all masked tokens simultaneously. This enables bidirectional modeling.
(3/8) 🧵
- Successful scaling of masked diffusion to LLM scale (8B params)
- Masking with variable ratios for forward/reverse process
- Smart remasking strategies for generation, incl. semi-autoregressive
- SOTA on reversal tasks, matching Llama 3 on others
(2/8) 🧵
Code: github.com/bcmi/Light-A...
Could be a game-changer for quick video mood/lighting adjustments without complicated VFX pipelines! 🎬
They can transform regular videos into moody noir scenes, add sunlight streaming through windows, or create cyberpunk neon vibes -- works on everything from portrait videos to car commercials! 🚗
- Consistent Light Attention (CLA) module for stable lighting across frames
- Progressive Light Fusion for smooth temporal transitions (toy sketch after this list)
- Works with ANY video diffusion model (AnimateDiff, CogVideoX)
- Zero-shot - no fine-tuning needed!
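I have not seen the implementation, but a toy way to picture the "progressive" part is a blend whose weight on the relit frames ramps up over the schedule. This is only my reading of the bullets above, not the paper's CLA or Progressive Light Fusion modules:

```python
# Toy illustration only; not the paper's Progressive Light Fusion module
# or its real schedule. Inputs are latent tensors of matching shape.
def progressive_blend(orig_latent, relit_latent, step, total_steps):
    """Weight on the relit latent ramps from ~0 to 1 across the steps."""
    w = (step + 1) / total_steps
    return (1.0 - w) * orig_latent + w * relit_latent
```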
Code: not available yet
Really excited to try this out once the code is available!
- BFS-ordered skeleton sequence representation
- Autoregressive joint prediction with diffusion sampling
- Hybrid attention masking: full self-attention for shape tokens, causal attention for skeleton tokens (mask sketch after this list)
- End-to-end trainable pipeline with no clustering or minimum-spanning-tree (MST) operations
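A small sketch of what such a hybrid mask might look like, assuming shape tokens sit before skeleton tokens in the sequence and that shape tokens do not attend back to skeleton tokens (both are my assumptions, not the paper's layout):

```python
# Hypothetical hybrid attention mask; the token layout (shape first, then
# skeleton) and the blocked shape->skeleton direction are assumptions.
import torch

def hybrid_attention_mask(n_shape, n_skel):
    n = n_shape + n_skel
    allowed = torch.zeros(n, n, dtype=torch.bool)     # True = may attend
    allowed[:n_shape, :n_shape] = True                # shape <-> shape: full attention
    allowed[n_shape:, :n_shape] = True                # skeleton -> shape: full attention
    allowed[n_shape:, n_shape:] = torch.tril(         # skeleton -> skeleton: causal
        torch.ones(n_skel, n_skel, dtype=torch.bool))
    return allowed
```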
I have posted my thoughts on the discussion here: github.com/rasbt/LLMs-f...
Project: latent-radiance-field.github.io/LRF/
Training and inference code available here: github.com/NVlabs/EdgeR...
Their ArAE model controls face count for varying detail while preserving mesh topology.