Samira
@samiraabnar.bsky.social
For downstream tasks that are presumably reasoning-heavy, sparsity negatively affects transfer. Inference compute plays a crucial role here. The good news: mechanisms like Chain-of-Thought (CoT) can adaptively increase inference compute.
January 28, 2025 at 6:26 AM
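A minimal sketch of the point about adaptive inference compute in the post above: CoT raises inference FLOPs simply by emitting more tokens. It assumes the common ~2 FLOPs per active parameter per generated token approximation; the parameter count and token counts are illustrative, not from the thread.

```python
# Rough inference-FLOP accounting for a sparse (MoE) model.
# Assumes ~2 FLOPs per active parameter per generated token; all numbers
# below are illustrative assumptions.

def inference_flops(active_params: float, generated_tokens: int) -> float:
    """Approximate decoding FLOPs: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params * generated_tokens

active_params = 3e9          # hypothetical MoE with 3B active parameters per token
direct_answer_tokens = 20    # short direct answer
cot_answer_tokens = 400      # chain-of-thought trace plus answer

print(f"direct: {inference_flops(active_params, direct_answer_tokens):.2e} FLOPs")
print(f"CoT:    {inference_flops(active_params, cot_answer_tokens):.2e} FLOPs")
# CoT spends ~20x more inference compute here purely by generating more tokens,
# which is the "adaptive inference compute" the post refers to.
```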
For many downstream tasks, sparsity doesn't affect the relationship between upstream and downstream performance in few-shot in-context learning.
January 28, 2025 at 6:26 AM
In practical settings, where total parameters are bounded, the optimal sparsity level depends on model size and training budget, eventually approaching 1.0 as model size grows.
January 28, 2025 at 6:26 AM
With a fixed training budget, compute-optimal models with higher sparsity not only have more total parameters but also fewer active parameters (i.e., fewer FLOPs per token).
January 28, 2025 at 6:26 AM
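A hedged sketch of the bookkeeping behind the post above, using the common ~6 · N_active · D approximation for training FLOPs. The two configurations are illustrative assumptions, not compute-optimal fits from the paper; they only show how a sparser model can have more total parameters yet fewer active parameters (and thus afford more training tokens) under a fixed budget.

```python
# Relates sparsity, parameters, and a fixed training budget.
# Uses the ~6 * N_active * D training-FLOP approximation; all concrete
# numbers are illustrative assumptions.

def moe_budget_breakdown(total_params: float, sparsity: float, train_flops: float):
    """Given total parameters and a sparsity level, return active parameters,
    FLOPs per token, and how many training tokens a fixed budget affords."""
    active_params = (1.0 - sparsity) * total_params   # parameters used per token
    flops_per_token = 6.0 * active_params             # fwd + bwd training approximation
    tokens = train_flops / flops_per_token
    return active_params, flops_per_token, tokens

budget = 1e21  # fixed training budget in FLOPs (illustrative)
for total, sparsity in [(7e9, 0.75), (30e9, 0.95)]:
    active, fpt, tokens = moe_budget_breakdown(total, sparsity, budget)
    print(f"total={total:.0e} sparsity={sparsity:.2f} -> "
          f"active={active:.1e}, tokens={tokens:.2e}")
# The second (sparser) configuration has more total parameters but fewer active
# parameters, so the same budget buys more training tokens -- the shape of the
# trade-off the post describes.
```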
We find that during pretraining, if memory and communication costs are ignored, higher sparsity is always better, and increasing model capacity via total parameters is the optimal strategy.
January 28, 2025 at 6:26 AM
In MoE models, sparsity can be adjusted by varying total parameters and FLOPs per token (via active parameters). Scaling laws for optimal sparsity levels reveal key insights into the trade-off between parameters and compute per token in sparse models at different scales.
January 28, 2025 at 6:26 AM
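A small sketch of the two knobs the post above mentions, assuming sparsity is measured as the fraction of experts (equivalently, expert parameters) not used for each token. The expert counts and layer sizes are illustrative assumptions.

```python
# Sparsity and the total-vs-active parameter split in an MoE, under the
# assumption that sparsity = fraction of experts left inactive per token.
# All parameter counts are illustrative.

def moe_sparsity(total_experts: int, active_experts: int) -> float:
    """Fraction of experts left inactive for each token."""
    return 1.0 - active_experts / total_experts

def active_params(dense_params: float, expert_params: float,
                  active_experts: int) -> float:
    """Parameters touched per token: shared (dense) parts plus the routed experts."""
    return dense_params + active_experts * expert_params

total_experts, n_active = 64, 2     # e.g. top-2 routing over 64 experts
dense, per_expert = 1e9, 0.25e9     # illustrative parameter counts

print("sparsity:     ", moe_sparsity(total_experts, n_active))          # 0.96875
print("total params: ", dense + total_experts * per_expert)             # 1.7e10
print("active params:", active_params(dense, per_expert, n_active))     # 1.5e9
# Adding experts raises total parameters (capacity) without changing FLOPs per
# token; routing to more experts per token raises FLOPs per token instead.
```

Varying these two knobs independently is what lets the scaling laws separate the effect of parameters from the effect of compute per token.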
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
January 28, 2025 at 6:26 AM