Lightnews — Scholar-powered news

Sayan

@shockroborty.bsky.social

i see sink tokens in pre-trained tts and vision-language models all the time, not sure why they are overlooked in multimodal settings

September 2, 2025 at 6:40 PM

Reposted by Sayan

Jason Coupet

@professajay.bsky.social

You can totally see how this mf bankrupted a casino.

April 3, 2025 at 1:45 AM

Reposted by Sayan

Tim Kellogg

@timkellogg.me

Moonshot + Muon

A new 16B model

The Muon optimizer is 2x more data efficient than AdamW, but only for matrix parameters

note: this is a big deal

huggingface.co/moonshotai

moonshotai (Moonshot AI)

Org profile for Moonshot AI on Hugging Face, the AI community building the future.

huggingface.co

February 22, 2025 at 8:42 PM

Sayan

@shockroborty.bsky.social

super valuable stuff

Thomas Wolf @thomwolf.bsky.social · Feb 19

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot...

A book to learn all about 5D parallelism, ZeRO, CUDA kernels, how/why overlap compute & coms with theory, motivation, interactive plots and 4000+ experiments!

The Ultra-Scale Playbook - a Hugging Face Space by nanotron

The ultimate guide to training LLM on large GPU Clusters

hf.co

February 19, 2025 at 6:42 PM

Sayan

@shockroborty.bsky.social

👀🙏

Anton @anton-l.bsky.social · Feb 12

LLM Reasoning labs will be eating good today🍔

We commandeered the HF cluster for a few days and generated 1.2M reasoning-filled solutions to 500k NuminaMath problems with DeepSeek-R1 🐳
Have fun!

February 12, 2025 at 3:45 PM

Reposted by Sayan

Alexander Doria

@dorialexander.bsky.social

In case it interests anyone, I managed to set up a demo of GRPO RL training in Colab. It’s an adaptation of Will Brown instant classic for math reasoning. Replace llama 1B with qwen 0.5b and inference with vllm. Full training in about 2 hours.

colab.research.google.com/drive/1bfhs1...

February 2, 2025 at 1:49 PM

Sayan

@shockroborty.bsky.social

I don’t understand this eval. why compare their deep research model with gemini thinking, when gemini deep research exists

February 3, 2025 at 1:39 AM

Sayan

@shockroborty.bsky.social

at this point, gpt3 and claude sonnet/haiku could easily be open sourced

January 30, 2025 at 12:55 PM

Sayan

@shockroborty.bsky.social

little disappointed seeing reactions of researchers from frontier labs on deepseek. science is not a zero sum game. we should really applaud the open weights, reproducibility, MIT license and detailed report which we hardly see in this decade. gracefulness besides the bias would’ve been nice

January 26, 2025 at 12:31 PM

Sayan

@shockroborty.bsky.social

The inference speed is amazing!

Jeff Dean @jeffdean.bsky.social · Jan 22

We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.

Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks 📈:
• AIME: 73.3%
• GPQA: 74.2%
• MMMU: 75.4%

January 22, 2025 at 12:50 AM

Sayan

@shockroborty.bsky.social

Just saw ScaleAI's front page ad on "America must win the AI war".

I'm afraid in the AI war only Palantir wins.

January 21, 2025 at 9:54 PM

Sayan

@shockroborty.bsky.social

internal search is very interesting, i hope the implementation is easy to read through

Marc Lanctot @sharky6000.bsky.social · Dec 5

Super happy to reveal our new paper! 🎉🙌♟️

We trained a model to play four games, and the performance in each increases by "external search" (MCTS using a learned world model) and "internal search" where the model outputs the whole plan on its own!

December 5, 2024 at 11:32 PM

Reposted by Sayan

Laura

@lauraruis.bsky.social

How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️

November 20, 2024 at 4:35 PM

Reposted by Sayan

Nathan Lambert

@natolambert.bsky.social

The most realistic reason to be pro open source AI is to reduce concentration of power.

Alondra Nelson @alondra.bsky.social · Nov 29

"money has flowed to tech giants and others in their orbit... [and] raises an uncomfortable prospect: that this supposedly revolutionary technology might never deliver on its promise of broad economic transformation, but instead just concentrate more wealth" www.bloomberg.com/opinion/arti...

ChatGPT’s $8 Trillion Birthday Gift to Big Tech

Two years in, generative AI’s value to the world is still unclear. But these charts show that it’s been a bonanza for the largest tech firms.

www.bloomberg.com

November 29, 2024 at 6:55 PM

Sayan

@shockroborty.bsky.social

Most elaborate game of Chinese whispers

November 28, 2024 at 2:44 AM

Sayan

@shockroborty.bsky.social

I believe o1 will be replicated soon. First by meta and then a truly open source release with datasets and training recipe by @ai2.bsky.social team

November 27, 2024 at 4:30 AM

Sayan

@shockroborty.bsky.social

Outside tech, I see a lot of AI fear and hatred. Usually the argument is on AI taking jobs and creative tasks. I don't remember seeing this kind of general consensus of hatred and fear about a new technology before

November 27, 2024 at 2:21 AM

Reposted by Sayan

Ai2

@ai2.bsky.social

Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.

November 26, 2024 at 8:51 PM

Reposted by Sayan

xjdr

@xjdr.bsky.social

i keep forgetting to include this cause i always assume people do this by default. Any time there is an exponent or a norm, you should be working in the highest practical precision

Lucas Beyer (bl16) @giffmana.ai · Nov 24

All softmaxes, also the output/vocab one. And the normalizations in f32 too.

November 24, 2024 at 8:05 PM
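
A minimal sketch of why the posts above recommend higher precision for exponents, sums, and normalizations. It is not taken from either post: `to_f16` and the `rnd` parameter are hypothetical helpers that simulate half-precision rounding with the standard library, and Python's native float (float64) stands in for the "high precision" path that would be f32 in practice. With 4096 equal logits, a running sum kept in half precision stalls at 2048, so the resulting "probabilities" add up to 2 instead of 1.

```python
import math
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value.

    Assumption: x is within the finite half-precision range, otherwise
    struct raises OverflowError.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def softmax(logits, rnd=lambda v: v):
    """Softmax where every intermediate result is passed through `rnd`.

    The default `rnd` is the identity, so the math runs in float64.
    Passing `rnd=to_f16` simulates doing the exponent, the running sum,
    and the division in half precision.
    """
    m = max(logits)
    exps = [rnd(math.exp(rnd(z - m))) for z in logits]
    total = 0.0
    for e in exps:
        total = rnd(total + e)  # accumulate in the working precision
    return [rnd(e / total) for e in exps]


logits = [0.0] * 4096               # e.g. a flat distribution over a vocab
probs_hi = softmax(logits)          # high-precision path
probs_lo = softmax(logits, rnd=to_f16)  # simulated half-precision path

print(sum(probs_hi))  # 1.0
print(sum(probs_lo))  # 2.0
```

The failure comes from the reduction, not the exponent: half precision cannot represent 2049, so adding 1.0 to 2048.0 rounds back to 2048.0 and the denominator stops growing.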

Reposted by Sayan

Ian Goodfellow

@ian-goodfellow.bsky.social

Posting a call for help: does anyone know of a good way to simultaneously treat both POTS and Ménière’s disease? Please contact me if you’re either a clinician with experience doing this or a patient who has found a good solution. Context in thread

November 24, 2024 at 4:34 PM

Reposted by Sayan

Anna Rogers

@annarogers.bsky.social

📢 Ultimate test of #NLP bluesky:

I need emergency reviewers for NAACL submissions on encoders (one multilingual, one for sentence embeddings). Help a desperate editor abandoned by the ACs! Author response starts tomorrow, so that's a true emergency.

If you're my hero, lmk your openreview profile.

November 21, 2024 at 7:47 PM
