Also blogging about AI research at magazine.sebastianraschka.com.
I wrote up a new article on
(1) multiple-choice benchmarks,
(2) verifiers,
(3) leaderboards, and
(4) LLM judges
All with from-scratch code examples, of course!
sebastianraschka.com/blog/2025/ll...
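To give a flavor of the first category, here is a minimal sketch (not the article's exact code; the extraction regex and toy data are made up for illustration) of how multiple-choice benchmark accuracy can be scored:

```python
import re

def extract_choice(model_output: str, choices: str = "ABCD") -> str:
    """Return the first standalone choice letter found in the model output."""
    match = re.search(rf"\b([{choices}])\b", model_output.upper())
    return match.group(1) if match else ""  # no letter found counts as incorrect

def mc_accuracy(model_outputs, gold_answers) -> float:
    """Fraction of questions where the extracted letter matches the gold answer."""
    correct = sum(
        extract_choice(out) == gold for out, gold in zip(model_outputs, gold_answers)
    )
    return correct / len(gold_answers)

# Toy example
outputs = ["The answer is B.", "C", "I think (A) is correct", "D) because ..."]
golds = ["B", "C", "A", "B"]
print(mc_accuracy(outputs, golds))  # 0.75
```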
The silver lining of my late arrival and rescheduling: there was no talk after mine, so it's followed by a 30-min Q&A instead of just the usual 5 :)
Kimi K2 is based on the DeepSeek V3/R1 architecture, and here's a side-by-side comparison.
In short, Kimi K2 is a slightly scaled-up DeepSeek V3/R1; the gains come from the data and training recipes. Hopefully, we will see some details on those soon, too.
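For a rough side-by-side in code, here is a small comparison sketch; the numbers are from memory of the published configs and may be slightly off, so treat them as illustrative rather than authoritative:

```python
# Approximate architecture settings (illustrative, not official values)
deepseek_v3 = {
    "total_params": "671B",
    "active_params_per_token": "~37B",
    "n_layers": 61,
    "attention": "MLA",        # multi-head latent attention
    "n_attention_heads": 128,
    "n_routed_experts": 256,
    "n_active_experts": 8,
}

kimi_k2 = {
    "total_params": "~1T",
    "active_params_per_token": "~32B",
    "n_layers": 61,
    "attention": "MLA",        # same attention mechanism
    "n_attention_heads": 64,   # fewer attention heads
    "n_routed_experts": 384,   # more (and therefore smaller-share) experts
    "n_active_experts": 8,
}

# Print only the settings that differ
for key in deepseek_v3:
    if deepseek_v3[key] != kimi_k2[key]:
        print(f"{key}: DeepSeek V3/R1 = {deepseek_v3[key]}, Kimi K2 = {kimi_k2[key]}")
```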
Gated DeltaNet hybrids (Qwen3-Next, Kimi Linear), text diffusion, code world models, and small reasoning transformers.
🔗 magazine.sebastianraschka.com/p/beyond-sta...
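As a taste of the first topic, here is a minimal, unoptimized sketch of the gated delta rule recurrence that these hybrids build on (a decayed linear-attention state updated with a delta-rule correction); shapes and gating inputs are simplified for illustration, and in real models alpha and beta come from learned projections of the input:

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """
    Naive sequential sketch of the gated delta rule.
    q, k, v: (seq_len, d); alpha, beta: (seq_len,) gates in (0, 1).
    """
    seq_len, d = q.shape
    S = torch.zeros(d, d)  # recurrent "fast weight" state
    outputs = []
    for t in range(seq_len):
        k_t = torch.nn.functional.normalize(k[t], dim=-1)
        # decay the state and remove the old association for k_t (delta rule)
        S = alpha[t] * (S - beta[t] * torch.outer(S @ k_t, k_t))
        # write the new value associated with k_t
        S = S + beta[t] * torch.outer(v[t], k_t)
        outputs.append(S @ q[t])
    return torch.stack(outputs)

T, d = 16, 32
out = gated_delta_rule(torch.randn(T, d), torch.randn(T, d), torch.randn(T, d),
                       alpha=torch.rand(T), beta=torch.rand(T))
print(out.shape)  # torch.Size([16, 32])
```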
Link to the full article: magazine.sebastianraschka.com/p/the-big-ll...
🔗 github.com/rasbt/LLMs-f...
Will add this for multi-head latent, sliding, and sparse attention as well.
🔗 github.com/rasbt/LLMs-f...
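For context, the sliding-window variant mainly comes down to restricting the causal mask; here is a minimal sketch (not the repo's actual code) of such a mask:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window_size: int) -> torch.Tensor:
    """
    Boolean mask where True marks positions a query may attend to:
    causal (no future tokens) and limited to the last `window_size` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window_size)

print(sliding_window_causal_mask(seq_len=5, window_size=2).int())
```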
The 11 LLM architectures covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6
www.youtube.com/watch?v=rNlU...
A few months ago, the Hierarchical Reasoning Model (HRM) made big waves in the AI research community because it showed really good performance on the ARC challenge despite having only 27M parameters. (That's about 22x smaller than the smallest Qwen3 model, Qwen3 0.6B.)
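The core idea is a nested recurrence: a fast low-level module takes several steps for every single update of a slow high-level module. A rough sketch of that loop, with GRU cells standing in for the paper's actual modules and all sizes chosen arbitrarily:

```python
import torch
import torch.nn as nn

class TinyHRMSketch(nn.Module):
    """Sketch of the hierarchical recurrence idea, not the paper's exact architecture."""
    def __init__(self, dim=128, low_steps=4, high_cycles=2):
        super().__init__()
        self.low = nn.GRUCell(dim, dim)    # stand-in for the low-level module
        self.high = nn.GRUCell(dim, dim)   # stand-in for the high-level module
        self.low_steps = low_steps
        self.high_cycles = high_cycles

    def forward(self, x):
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        for _ in range(self.high_cycles):
            for _ in range(self.low_steps):
                # the low-level state is refined conditioned on input + high-level state
                z_low = self.low(x + z_high, z_low)
            # the high-level state is updated once per cycle from the low-level result
            z_high = self.high(z_low, z_high)
        return z_high

out = TinyHRMSketch()(torch.randn(8, 128))  # batch of 8 input states
print(out.shape)  # torch.Size([8, 128])
```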
sebastianraschka.com/blog/2021/dl...
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.).
Also, I cover 15 recent articles focused on RL & reasoning.
🔗 magazine.sebastianraschka.com/p/the-state-...
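As a small preview, the core of GRPO is the group-relative advantage: each sampled response is scored against its own group's mean and standard deviation, so no critic network is needed. A minimal sketch (toy rewards, not tied to any specific implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """
    Group-relative advantages as used in GRPO.
    rewards: (num_groups, group_size), one row of rewards per prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: one prompt, 4 sampled answers, only the last two are correct
print(grpo_advantages(torch.tensor([[0.0, 0.0, 1.0, 1.0]])))
```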
Why? Because I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning and research: huggingface.co/rasbt/llama-...
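The repo itself isn't reproduced here, but as an example of the kind of building block a clean Llama-style implementation contains, here is a minimal RMSNorm sketch (the eps value is a common choice, not necessarily the exact one used):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm as used in Llama-style models: rescale by the reciprocal RMS, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

print(RMSNorm(64)(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```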
In this 1 h 45 min hands-on coding session, I go over implementing the GPT architecture, the foundation of modern LLMs (and I also have bonus material on converting it to Llama 3.2): www.youtube.com/watch?v=YSAk...
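For reference, the building block the session revolves around looks roughly like this; a sketch of a generic pre-LayerNorm GPT block, not the video's exact code:

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    """Minimal GPT-style block: pre-LayerNorm, causal multi-head attention, MLP, residuals."""
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        seq_len = x.size(1)
        # True above the diagonal = future positions are masked out
        causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.ln2(x))

print(GPTBlock()(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```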
self-attention → parameterized self-attention → causal self-attention → multi-head self-attention
www.youtube.com/watch?v=-Ll8...
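As a quick illustration of the "parameterized + causal" steps in that progression (single head, no batching, toy dimensions):

```python
import torch

def causal_self_attention(x, W_q, W_k, W_v):
    """Learned q/k/v projections plus a mask so each token only attends to itself and earlier tokens."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / k.shape[-1] ** 0.5
    seq_len = x.shape[0]
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

d_in, d_out = 8, 4
x = torch.randn(6, d_in)                        # 6 tokens
params = [torch.randn(d_in, d_out) for _ in range(3)]
print(causal_self_attention(x, *params).shape)  # torch.Size([6, 4])
```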
Happy reading!
- Python & PyTorch still dominate
- 80%+ use NVIDIA GPUs, but no multi-node setups 🤔
- LoRA still popular for training efficiency, but full finetuning gains traction.
Surprisingly, CNNs still lead in CV comps
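For the LoRA point above, a minimal sketch of why it's cheap: the pretrained weight stays frozen and only two small low-rank matrices receive gradients (layer sizes and rank here are arbitrary):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable LoRA params vs ~262k frozen base weights
```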
- Python & PyTorch still dominate
- 80%+ use NVIDIA GPUs, but no multi-node setups 🤔
- LoRA still popular for training efficiency, but full finetuning gains traction.
Surprisingly, CNNs still lead in CV comps
A learning exercise, and I am so jealous: working through the book _Build a Large Language Model (From Scratch)_ by Sebastian Raschka.
First post in the series here:
www.gilesthomas.com/2024/12/llm-...