Lightnews — Scholar-powered news

Yacine

@yacinemahdid.bsky.social

That’s how I start all my YouTube tutorial about the latest deep learning architecture.

Karly Kingsley @karlykingsley.bsky.social · May 20

Yesss

Sarah Hollowell
@sarahhollowell
I love when you ask someone about a thing they know a lot about and they start with "okay, so" because you KNOW you're about to get a novel's worth of nerd sh t and it's going to be so delightful

May 20, 2025 at 8:08 PM

Yacine

@yacinemahdid.bsky.social

Started coding a while game with the kids, whew this is fun!

May 4, 2025 at 5:50 PM

Yacine

@yacinemahdid.bsky.social

I just asked ChatGPT to help me set up the boilerplate for a Python script that make use of their API.

1. Secret is pasted straight into file, no environment management.
2. The code is for a deprecated API.

What a vibe.

April 1, 2025 at 3:27 PM

Yacine

@yacinemahdid.bsky.social

2025 will be the year of linear attention, I feel it.

April 1, 2025 at 2:12 AM

Yacine

@yacinemahdid.bsky.social

There is an exhilarating feeling in finally understand a whole line of research after a few weeks of study.

It’s like a flash of every paper, formula and code seen that just come flooding all at once in its correct form.

April 1, 2025 at 1:51 AM

Yacine

@yacinemahdid.bsky.social

Most foundational models use softmax attention, which scales quadratically with input length—a major bottleneck.

Linear attention has existed since 2020, yet large-scale models rarely use it. Why?

minimax-01 finally makes linear attention work at scale. Deep dive here: 📌 youtu.be/iRuvGU-Sk3c

March 31, 2025 at 2:16 PM

Yacine

@yacinemahdid.bsky.social

I'm back for the weekly deep-learning study session! ✨

Sorry for the month break, was a bit overwhelmed with lots of things at work.

I'll try to move around the schedule a bit so that more people in different time zones can attend.

📸 PS: I gave a talk at a conference in February!

March 17, 2025 at 3:31 PM

Yacine

@yacinemahdid.bsky.social

Lots of confusion out there about what AI Engineering is about.

What's an agent, what's a workflow, what's an agentic system, etc.

I made this tutorial on the topic packed with information from the latest research from HuggingFace.

Check it out over here:
youtu.be/UMYKjT9exb4

Enjoy! 🌹

Introduction to AI Agents - Theory and Code

YouTube video by Deep Learning with Yacine

youtu.be

March 10, 2025 at 2:51 PM

Yacine

@yacinemahdid.bsky.social

This is the kind of research we need more of:

Naomi Saphra @nsaphra.bsky.social · Feb 25

Ever looked at LLM skill emergence and thought 70B parameters was a magic number? Our new paper shows sudden breakthroughs are samples from bimodal performance distributions across seeds. Observed accuracy jumps abruptly while the underlying accuracy DISTRIBUTION changes slowly!

Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra
In this paper, we explore the nature of sudden breakthroughs in language model performance at scale, which stands in contrast to smooth improvements governed by scaling laws. While advocates of "emergence" view abrupt performance gains as capabilities unlocking at specific scales, others have suggested that they are produced by thresholding effects and alleviated by continuous metrics. We propose that breakthroughs are instead driven by continuous changes in the probability distribution of training outcomes, particularly when performance is bimodally distributed across random seeds. In synthetic length generalization tasks, we show that different random seeds can produce either highly linear or emergent scaling trends. We reveal that sharp breakthroughs in metrics are produced by underlying continuous changes in their distribution across seeds. Furthermore, we provide a case study of inverse scaling and show that even as the probability of a successful run declines, the average performance of a successful run continues to increase monotonically. We validate our distributional scaling framework on realistic settings by measuring MMLU performance in LLM populations. These insights emphasize the role of random variation in the effect of scale on LLM capabilities.

February 25, 2025 at 11:07 PM

Yacine

@yacinemahdid.bsky.social

The state of AI/consciousness discourse:

February 25, 2025 at 11:05 PM

Reposted by Yacine

Joseph Howley

@illdottore.bsky.social

you fucked up a perfectly good computer is what you did. look at it. it's got innumeracy

Screenshot of a twitter post showing that the latest openAI commercial model is better than previous models at doing arithmetic but still cannot reliably produce the correct answer of multiplication problems with values greater than 11 x 11. It's supposed to be impressive I think

February 12, 2025 at 7:36 PM

Yacine

@yacinemahdid.bsky.social

Wouldn’t it be funny that we never reach AGI because of short term incentive to keep going with Transformers.

Then we patch the whole thing left and right to keep the illusion of general intelligence with massive injection of capital.

Literally yeeting the AI field in a local minima and digging.

February 10, 2025 at 12:35 AM

Yacine

@yacinemahdid.bsky.social

This is so well put, must read!

Tom Lawton @lawtontri.bsky.social · Feb 9

I keep hearing from healthcare #AI companies that Large Language Models ( #LLMs ) can be made to be "deterministic" as part of arguments around safety. I thought I'd do a little ranty 🧵 to explain why (a) it's not true in the real world, and (b) it's not even the right question.

1/

Abstract image of a Large Language Model

February 9, 2025 at 3:30 PM

Yacine

@yacinemahdid.bsky.social

Open AI is getting absolutely cooker right now.
Crazy how we went from the darling of AI to a company researchers loathe.

not a good vibe.

January 29, 2025 at 12:08 AM

Yacine

@yacinemahdid.bsky.social

The one thing I dislike about current v. of OpenAI is how surface level they are in their research coms.

They are hinting big breakthrough, but man look at the landscape.

Every competitors around is stacked with billions and PhD.

Whatever they are trying to win, won’t be achieved by secrecy.

January 6, 2025 at 2:02 AM

Reposted by Yacine

Vicki

@vickiboykis.com

how i'd learn machine learning in 2025 if i had to start from scratch:

1. find a log that i initially planned to turn into a table leg
2. make it into a puppet that can walk and talk
3. have the puppet, through a series of adventures, turn into a real boy and realize the true value of friendship

January 5, 2025 at 10:25 PM

Reposted by Yacine

Shahab Bakhtiari

@shahabbakht.bsky.social

Just came across this interesting blog post on the job market for new PhD grads in AI: kyunghyuncho.me/i-sensed-anxiety-and-frustration-at-neurips24

The argument feels pretty reasonable. Here is my take: (1/6)

#MLSky #NeuroAI 🧠📈

January 3, 2025 at 4:02 PM

Reposted by Yacine

purr.in.ink

@purrinink.bsky.social

🐈‍⬛🤍.

December 31, 2024 at 3:27 PM

Reposted by Yacine

Surya Ganguli

@suryaganguli.bsky.social

Our new paper! "Analytic theory of creativity in convolutional diffusion models" lead expertly by @masonkamb.bsky.social
arxiv.org/abs/2412.20292
Our closed-form theory needs no training, is mechanistically interpretable & accurately predicts diffusion model outputs with high median r^2~0.9

December 31, 2024 at 4:54 PM

Reposted by Yacine

Nithur

@nithur.bsky.social

Do this in 2025:

December 30, 2024 at 8:32 AM

Reposted by Yacine

Simon Willison

@simonwillison.net

Here's my end-of-year review of things we learned out about LLMs in 2024 - we learned a LOT of things simonwillison.net/2024/Dec/31/...

Table of contents:

The GPT-4 barrier was comprehensively broken
Some of those GPT-4 models run on my laptop
LLM prices crashed, thanks to competition and increased efficiency
Multimodal vision is common, audio and video are starting to emerge
Voice and live camera mode are science fiction come to life
Prompt driven app generation is a commodity already
Universal access to the best models lasted for just a few short months
“Agents” still haven’t really happened yet
Evals really matter
Apple Intelligence is bad, Apple’s MLX library is excellent
The rise of inference-scaling “reasoning” models
Was the best currently available LLM trained in China for less than $6m?
The environmental impact got better
The environmental impact got much, much worse
The year of slop
Synthetic training data works great
LLMs somehow got even harder to use
Knowledge is incredibly unevenly distributed
LLMs need better criticism
Everything tagged “llms” on my blog in 2024

December 31, 2024 at 6:10 PM

Yacine

@yacinemahdid.bsky.social

100%, this is messed up and will have massive consequences.

We’re going to see more and more heavily gated web communities.

joao @joao.omg.lol · Dec 30

Seriously, it seems everything around LLMs works by messing up the social contract, it's outright predatory towards things like the small web and people who just wan to share the neat things they learn or do
Source: news.ycombinator.com/item?id=4254...

Hacker news comment with the following text:
One of my websites was absolutely destroyed by Meta's AI bot: Meta-ExternalAgent https://developers.facebook.com/docs/sharing/webmasters/web-...
It seems a bit naive for some reason and doesn't do performance back-off the way I would expect from Google Bot. It just kept repeatedly requesting more and more until my server crashed, then it would back off for a minute and then request more again.

My solution was to add a Cloudflare rule to block requests from their User-Agent. I also added more nofollow rules to links and a robots.txt but those are just suggestions and some bots seem to ignore them.

Cloudflare also has a feature to block known AI bots and even suspected AI bots: https://blog.cloudflare.com/declaring-your-aindependence-blo... As much as I dislike Cloudflare centralization, this was a super convenient feature.

December 31, 2024 at 12:43 AM

Reposted by Yacine

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

It is roughly 10 lines of code to go from 1 GPU to N GPUs with pytorch DDP. Pointing this out so that everyone is aware and doesn't shy away from scaling their code

December 30, 2024 at 7:36 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news