Edd
@erlandpg.bsky.social
Very cool find from the Unsloth team, as always 🫡
Introducing Unsloth Dynamic 4-bit Quantization!

Naive quantization often harms accuracy, but we avoid quantizing certain parameters. This achieves higher accuracy while using only <10% more VRAM than BnB 4-bit

Read our Blog: unsloth.ai/blog/dynamic...
Quants on Hugging Face: huggingface.co/collections/...
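For context, a minimal sketch of the selective-quantization idea using transformers + bitsandbytes. Unsloth's dynamic quants pick which layers to leave unquantized automatically; the model id and skipped module names below are illustrative assumptions, not Unsloth's actual selection:

```python
# Sketch: quantize to 4-bit but keep named modules in full precision.
# The skipped modules and model id are placeholders, not Unsloth's choices.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["lm_head"],  # modules left unquantized
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    quantization_config=bnb_config,
)
```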
December 4, 2024 at 7:53 PM
Reposted by Edd
I'm looking for an intern!

If you are:
* Driven
* Love OSS
* Interested in distributed PyTorch training/FSDPv2/DeepSpeed

Come work with me!

Fully remote; details on how to apply in the comments
November 26, 2024 at 4:01 PM
Reposted by Edd
🚨 GPUs wasting 75% of training time on communication 🤯 Not anymore!

DeepSpeed Domino, with a new tensor parallelism engine, minimizes communication overhead for faster LLM training. 🚀

✅ Near-complete communication hiding
✅ Multi-node scalable solution

Blog: github.com/microsoft/De...
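The general pattern behind "communication hiding", as a rough sketch: launch collectives asynchronously and run independent compute while they are in flight. This illustrates the idea only, not Domino's actual engine:

```python
# Rough sketch of communication/compute overlap, NOT DeepSpeed Domino itself.
# Assumes torch.distributed is already initialized (e.g. via torchrun).
import torch
import torch.distributed as dist

def overlapped_step(x: torch.Tensor, w1: torch.Tensor, w2: torch.Tensor):
    y = x @ w1                                  # partial result needing a reduction
    handle = dist.all_reduce(y, async_op=True)  # launch the collective asynchronously
    z = x @ w2                                  # independent compute overlaps the comm
    handle.wait()                               # block only when the reduced tensor is needed
    return y + z
```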
November 26, 2024 at 2:35 PM
Reposted by Edd
Releasing SmolVLM, a small 2-billion-parameter Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models with similar GPU RAM usage and token throughput

Blog post: huggingface.co/blog/smolvlm
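A minimal inference sketch with transformers; the model id matches the release, but the chat-template details below are my assumptions, so check the blog for the canonical usage:

```python
# Sketch of SmolVLM inference via transformers; details may differ from the blog.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct", torch_dtype=torch.bfloat16
)

image = Image.open("photo.jpg")  # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```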
November 26, 2024 at 4:58 PM
I use the Bookmarks feature a lot on other platforms. Can we get one here too? .-.
November 25, 2024 at 6:40 PM
Reposted by Edd
Very good read from Aidan McLau on the other site about o1-style "reasoning models," what their limits are, and why they aren't as good at general language tasks: aidanmclaughlin.notion.site/reasoners-pr...
November 25, 2024 at 6:14 PM
Reposted by Edd
the BigVision repo is my current reference impl for gemma and ViT. such an underrated repo @giffmana.bsky.social and team are doing the lord's work

github.com/google-resea...

github.com/google-resea...
November 24, 2024 at 5:25 PM
Reposted by Edd
distributed training for LLMs?

recently, @primeintellect.bsky.social announced finishing their 10B distributed training run, trained across the world.

what is it exactly?

🧵
November 25, 2024 at 12:02 PM
Reposted by Edd
very interesting work, and it reminds me a bit of this paper. Tokenizers and RoPE must die. after samplers, i am on to those next ...
arxiv.org/abs/2407.036...
November 25, 2024 at 2:20 AM
Reposted by Edd
Adaptive Decoding via Latent Preference Optimization: arxiv.org/abs/2411.09661

* Add a small MLP + classifier which predicts a temperature per token
* They train the MLP with a variant of DPO (arxiv.org/abs/2305.18290) with the temperatures as latent
* low temp for math, high for creative tasks
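A rough sketch of the per-token temperature idea. This is a hypothetical head, not the paper's architecture, and it uses an expected temperature over discrete options instead of sampling a latent one:

```python
# Hypothetical sketch of per-token temperature prediction (not the paper's code).
import torch
import torch.nn as nn

class TemperatureHead(nn.Module):
    def __init__(self, hidden_size: int, temps=(0.2, 0.6, 1.0, 1.4)):
        super().__init__()
        self.register_buffer("temps", torch.tensor(temps))
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 128), nn.ReLU(),
            nn.Linear(128, len(temps)),  # classifier over discrete temperatures
        )

    def forward(self, hidden: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        probs = self.mlp(hidden).softmax(-1)            # distribution over temps
        t = (probs * self.temps).sum(-1, keepdim=True)  # expected temperature per token
        return logits / t                               # rescale next-token logits
```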
November 25, 2024 at 11:06 AM
Reposted by Edd
I wrote this code to follow the same people someone else is following. I figured that would fix my feed: if someone is having a good experience, I can just "have what they are having"

gist.github.com/hamelsmu/fb9...
"I'll have what they are having" for bluesky. The motiviation is to mimic who someone else is following who reports they are having a good experience on bluesky!
"I'll have what they are having" for bluesky. The motiviation is to mimic who someone else is following who reports they are having a good experience on bluesky! - follow_theirs.py
gist.github.com
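A rough sketch of the idea, assuming the atproto Python SDK; the method names here (get_follows, follow) are from that SDK and the actual gist may differ:

```python
# Sketch of "follow who they follow" via the atproto SDK; may differ from the gist.
from atproto import Client

def follow_theirs(handle_to_mimic: str, my_handle: str, app_password: str):
    client = Client()
    client.login(my_handle, app_password)
    cursor = None
    while True:
        resp = client.get_follows(actor=handle_to_mimic, cursor=cursor)
        for profile in resp.follows:
            client.follow(profile.did)  # follow each account they follow
        cursor = resp.cursor
        if not cursor:                  # no more pages
            break
```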
November 24, 2024 at 3:37 PM
Reposted by Edd
now that people are paying attention again, here is your periodic reminder: always run in bf16, and always apply RoPE and attention softmax in float32 (as shown here)

github.com/xjdr-alt/ent...
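The upcast pattern in question, as a minimal sketch; the linked repo does this inside its attention, and this is just the general shape:

```python
# Minimal sketch of the fp32-softmax upcast; not the linked repo's exact code.
import torch

def softmax_fp32(scores: torch.Tensor) -> torch.Tensor:
    # Upcast attention scores to float32 for the softmax, then cast back:
    # softmax (and RoPE's sin/cos) are numerically sensitive in bf16.
    return torch.softmax(scores.float(), dim=-1).to(scores.dtype)
```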
November 24, 2024 at 5:23 PM
Reposted by Edd
Adam's been tuned, but SOAP and PSGD are just using default params. You love to see it.
November 24, 2024 at 11:36 PM