Sebastian Raschka (rasbt)
sebastianraschka.com
@sebastianraschka.com
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model (From Scratch)" (https://amzn.to/4fqvn0D) & a follow-up book on reasoning models (https://mng.bz/Nwr7).

Also blogging about AI research at magazine.sebastianraschka.com.
Pinned
How do we evaluate LLMs?
I wrote up a new article on
(1) multiple-choice benchmarks,
(2) verifiers,
(3) leaderboards, and
(4) LLM judges

All with from-scratch code examples, of course!

sebastianraschka.com/blog/2025/ll...
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
sebastianraschka.com
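To give a flavor of approach (1), here is a toy sketch of multiple-choice scoring: compare the next-token logits the model assigns to each answer letter and count an item as correct when the gold letter scores highest. It assumes a Hugging Face-style causal LM; the article's from-scratch version handles tokenization more carefully:

```python
import torch

def pick_answer(model, tokenizer, prompt, choices=("A", "B", "C", "D")):
    # Score each answer letter by the next-token logit it receives
    # after the formatted question prompt.
    ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # logits for the next token
    # Caveat: real code must check how each letter tokenizes
    # (e.g., "A" vs " A" can be different token IDs).
    choice_ids = [tokenizer.encode(c, add_special_tokens=False)[0] for c in choices]
    scores = [logits[i].item() for i in choice_ids]
    return choices[scores.index(max(scores))]

def accuracy(model, tokenizer, dataset):
    # dataset: list of (prompt, gold_letter) pairs
    hits = sum(pick_answer(model, tokenizer, p) == g for p, g in dataset)
    return hits / len(dataset)
```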
My "The Building Blocks of Today’s and Tomorrow’s Language Models" talk at the PyTorch Conference is now up on YouTube! youtube.com/watch?v=nDl6...

The silver lining of my late arrival and rescheduling: there was no talk after mine, so it's followed by a 30-minute Q&A instead of just the usual 5 :)
The Building Blocks of Today’s and Tomorrow’s Language Models - Sebastian Raschka, RAIR Lab
YouTube video by PyTorch
youtube.com
November 8, 2025 at 2:01 PM
I just saw the Kimi K2 Thinking release!

Kimi K2 is based on the DeepSeek V3/R1 architecture; a side-by-side comparison is sketched below.

In short, Kimi K2 is a slightly scaled-up DeepSeek V3/R1, and the gains come from the data and training recipes. Hopefully, we will see some details on those soon, too.
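For reference, here is the gist of that side-by-side as a config sketch. The numbers are quoted from memory from the two technical reports, so treat them as approximate rather than authoritative:

```python
# Rough side-by-side of the published configs (quoted from memory
# from the DeepSeek-V3 and Kimi K2 reports; treat as approximate).
configs = {
    "DeepSeek-V3/R1": dict(
        total_params="671B", active_params="37B",
        layers=61, dense_layers=3,
        routed_experts=256, active_experts=8, shared_experts=1,
        attention="MLA", attention_heads=128,
    ),
    "Kimi K2": dict(
        total_params="~1T", active_params="32B",
        layers=61, dense_layers=1,
        routed_experts=384, active_experts=8, shared_experts=1,
        attention="MLA", attention_heads=64,
    ),
}

for name, cfg in configs.items():
    print(f"{name}: {cfg}")
```

Same depth, same MLA + MoE layout; the main deltas are more routed experts, fewer dense layers, and half the attention heads.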
November 6, 2025 at 7:35 PM
My new field guide to alternatives to standard LLMs:

Gated DeltaNet hybrids (Qwen3-Next, Kimi Linear), text diffusion, code world models, and small reasoning transformers.

🔗 magazine.sebastianraschka.com/p/beyond-sta...
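As a taste of the first of those categories: the heart of Gated DeltaNet is the gated delta rule, which maintains a fixed-size fast-weight state instead of a growing KV cache. Below is a toy sequential sketch; real implementations use chunked parallel kernels, and all shapes and names here are illustrative:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Sequential (recurrent) form of the gated delta rule.

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) gates in (0, 1).
    The state S is a (d_v, d_k) fast-weight matrix updated per token:
        S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
    alpha_t decays old memories; the delta term overwrites the value
    currently associated with k_t instead of just accumulating.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    I = np.eye(d_k)
    out = np.zeros((T, d_v))
    for t in range(T):
        S = alpha[t] * S @ (I - beta[t] * np.outer(k[t], k[t])) + beta[t] * np.outer(v[t], k[t])
        out[t] = S @ q[t]  # read out with the query
    return out
```

The appeal: the state has fixed size, so memory stays constant in sequence length.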
November 4, 2025 at 2:49 PM
Just saw the benchmarks of the new open-weight MiniMax-M2 LLM, and the performance is too good to ignore :). So, I just amended my "The Big LLM Architecture Comparison" with entry number 13!

Link to the full article: magazine.sebastianraschka.com/p/the-big-ll...
October 28, 2025 at 4:48 PM
A short talk on the main architecture components of LLMs this year + a look beyond the transformer architecture: www.youtube.com/watch?v=lONy...
October 27, 2025 at 3:45 PM
🔗 Mixture of Experts (MoE): github.com/rasbt/LLMs-f...
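The core routing idea fits in a few lines: send each token through only its top-k experts and mix their outputs with the renormalized router weights. A toy sketch with illustrative names (see the linked material for the actual from-scratch implementation):

```python
import torch
import torch.nn.functional as F

def moe_forward(x, router, experts, top_k=2):
    # x: (n_tokens, d_model); router: nn.Linear(d_model, n_experts);
    # experts: list of small feed-forward networks.
    scores = router(x)                                # (n_tokens, n_experts)
    weights, idx = torch.topk(scores, top_k, dim=-1)  # top-k experts per token
    weights = F.softmax(weights, dim=-1)              # renormalize over the chosen k
    out = torch.zeros_like(x)
    for i, expert in enumerate(experts):              # loop for clarity, not speed
        mask = idx == i                               # (n_tokens, top_k) hits for expert i
        if mask.any():
            token_ids, slot = mask.nonzero(as_tuple=True)
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
    return out
```

Only the router and top_k experts run per token, which is how MoE models keep large total parameter counts but modest active ones.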
October 20, 2025 at 1:48 PM
Chapter 3, and with it the first 176 pages, is now live! (mng.bz/lZ5B)
October 16, 2025 at 1:35 PM
Sliding Window Attention
🔗 github.com/rasbt/LLMs-f...
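The core idea in a few lines: sliding-window attention is plain causal attention whose mask only admits the most recent window tokens. A toy sketch of the mask (the linked notebook has the full implementation):

```python
import torch

def sliding_window_causal_mask(seq_len, window):
    # Token i may attend to tokens j with i - window < j <= i.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)  # (seq_len, seq_len), True = attend

print(sliding_window_causal_mask(seq_len=6, window=3).int())
```

This caps the KV cache at window entries per layer instead of growing with the full context.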
October 13, 2025 at 1:51 PM
Multi-Head Latent Attention
🔗 github.com/rasbt/LLMs-f...
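The gist: rather than caching full keys and values, MLA caches one small latent vector per token and up-projects it to K and V at attention time. A simplified sketch that omits MLA's decoupled RoPE path and query compression (see the linked notebook for the complete version):

```python
import torch
import torch.nn as nn

class MLASketch(nn.Module):
    # Illustrative dimensions; not DeepSeek's actual config.
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_down_kv = nn.Linear(d_model, d_latent)  # only this output is cached
        self.W_up_k = nn.Linear(d_latent, d_model)
        self.W_up_v = nn.Linear(d_latent, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, _ = x.shape
        c_kv = self.W_down_kv(x)  # (b, t, d_latent): the compressed KV cache entry
        q = self.W_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.W_up_k(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.W_up_v(c_kv).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        y = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.W_o(y.transpose(1, 2).reshape(b, t, -1))
```

Caching d_latent values per token instead of n_heads * d_head keys plus values is where the memory savings come from.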
October 12, 2025 at 1:57 PM
Just a bit of weekend coding fun: A memory estimator to calculate the savings when using grouped-query attention vs multi-head attention (+ code implementations of course).

🔗 github.com/rasbt/LLMs-f...

Will add this for multi-head latent, sliding, and sparse attention as well.
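The estimate itself boils down to a one-liner: KV-cache size scales with the number of KV heads, so GQA saves a factor of n_heads / n_kv_heads over MHA. A toy version with made-up model dimensions (the repo's estimator is more complete):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_val=2):
    # Factor of 2 covers keys and values; bytes_per_val=2 assumes fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val

# Hypothetical 32-layer model with 32 query heads at a 4096-token context:
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)  # 4 groups
print(f"MHA: {mha/1e9:.2f} GB, GQA: {gqa/1e9:.2f} GB, savings: {mha/gqa:.0f}x")
# MHA: 2.15 GB, GQA: 0.54 GB, savings: 4x
```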
October 11, 2025 at 1:46 PM
Updated & turned my Big LLM Architecture Comparison article into a video lecture.

The 11 LLM architectures covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6

www.youtube.com/watch?v=rNlU...
The Big LLM Architecture Comparison
YouTube video by Sebastian Raschka
www.youtube.com
October 10, 2025 at 5:05 PM
From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM).

A few months ago, the HRM made big waves in the AI research community by showing really good performance on the ARC challenge despite its small size of only 27M parameters. (That's about 22x smaller than Qwen3 0.6B, the smallest Qwen3 model.)
October 9, 2025 at 4:23 PM
It only took 13 years, but dark mode is finally here
sebastianraschka.com/blog/2021/dl...
October 8, 2025 at 1:50 AM
How do we evaluate LLMs?
I wrote up a new article on
(1) multiple-choice benchmarks,
(2) verifiers,
(3) leaderboards, and
(4) LLM judges

All with from-scratch code examples, of course!

sebastianraschka.com/blog/2025/ll...
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
sebastianraschka.com
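To give a flavor of approach (2): a verifier grades an answer programmatically instead of asking another model. A toy sketch for math-style answers with a numeric ground truth (the article's from-scratch code is more thorough):

```python
import re

def extract_final_number(text):
    # Take the last number in the response as the model's final answer.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def verify(response, ground_truth, tol=1e-6):
    answer = extract_final_number(response)
    return answer is not None and abs(answer - float(ground_truth)) < tol

print(verify("Adding it all up, the total is 42.", "42"))  # True
print(verify("I am not sure.", "42"))                      # False
```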
October 5, 2025 at 3:51 PM
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.).
Also, I cover 15 recent articles focused on RL & reasoning.

🔗 magazine.sebastianraschka.com/p/the-state-...
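The part of GRPO that replaces PPO's value network fits in one line: sample a group of responses per prompt and standardize each reward within its group. A toy sketch (the article covers the full objective with the clipped ratio and KL term):

```python
import torch

def grpo_advantages(rewards):
    # rewards: (group_size,) scalar rewards for several sampled
    # responses to the same prompt. Each response's advantage is its
    # reward standardized within the group; no value network needed.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

print(grpo_advantages(torch.tensor([1.0, 0.0, 1.0, 0.0])))
# tensor([ 0.8660, -0.8660,  0.8660, -0.8660])
```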
April 19, 2025 at 1:48 PM
Coded the Llama 3.2 model from scratch and shared it on the HF Hub.
Why? Because I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning and research: huggingface.co/rasbt/llama-...
March 31, 2025 at 5:13 PM
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling: www.youtube.com/watch?v=Zar2...
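The two sampling tweaks in a nutshell, as a toy sketch (the video builds these from scratch and explains the trade-offs):

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    # logits: (vocab_size,) next-token logits from the model.
    logits = logits / temperature  # >1 flattens, <1 sharpens the distribution
    top_logits, top_idx = torch.topk(logits, top_k)  # keep the k most likely tokens
    probs = torch.softmax(top_logits, dim=-1)
    return top_idx[torch.multinomial(probs, num_samples=1)]
```

With temperature near 0 and top_k = 1, this reduces to greedy decoding.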
March 23, 2025 at 1:38 PM
Reposted by Sebastian Raschka (rasbt)
I'm right now on the last chapter of "Build a Large Language Model (From Scratch)" by @sebastianraschka.com and it's an absolutely amazing way to get started. Now I can understand why people lose their shit over DeepSeek, for example
March 17, 2025 at 11:26 PM
I just shared a new tutorial: Implementing GPT From Scratch!

In this 1h 45m hands-on coding session, I go over implementing the GPT architecture, the foundation of modern LLMs (plus bonus material on converting it to Llama 3.2): www.youtube.com/watch?v=YSAk...
Build an LLM from Scratch 4: Implementing a GPT model from Scratch To Generate Text
YouTube video by Sebastian Raschka
www.youtube.com
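For orientation, here is the overall shape of the model from the session, compressed into a skeleton. Note this leans on torch's built-in encoder layer rather than the from-scratch modules built in the video, and the hyperparameters are illustrative, not GPT-2's:

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    def __init__(self, vocab_size=50257, d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True, norm_first=True,  # pre-LN, as in GPT
        )
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):  # idx: (batch, seq_len) token IDs
        b, t = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(t, device=idx.device))
        causal = nn.Transformer.generate_square_subsequent_mask(t, device=idx.device)
        x = self.blocks(x, mask=causal, is_causal=True)
        return self.lm_head(self.ln_f(x))  # (batch, seq_len, vocab_size) logits
```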
March 17, 2025 at 3:27 PM
Yesterday, Google released Gemma 3, their latest open-weight LLM. Finally, a new addition to the "Big 5" of open-weight models (Gemma, Llama, DeepSeek, Qwen, and Mistral). I just went through the Gemma 3 report and experimented a bit with the models, and there are plenty of interesting tidbits:
March 13, 2025 at 4:03 PM
Just read that Gemma 3 is out. Gemma models are super underestimated imho. Will be taking this for a spin in the next few days. In the meantime, they have a technical report here: storage.googleapis.com/deepmind-med...
storage.googleapis.com
March 12, 2025 at 4:06 PM
Just uploaded my "Coding Attention Mechanisms" tutorial. A 2h15m session on coding attention mechanisms to understand how the engine of LLMs works:
self-attention → parameterized self-attention → causal self-attention → multi-head self-attention
www.youtube.com/watch?v=-Ll8...
Build an LLM from Scratch 3: Coding attention mechanisms
YouTube video by Sebastian Raschka
www.youtube.com
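The pivotal step in that progression is causal self-attention. A compact single-head sketch using the book's W_q/W_k/W_v naming (the session then extends this to multiple heads):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_q = nn.Linear(d_in, d_out, bias=False)
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):  # x: (batch, seq_len, d_in)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5  # scaled dot products
        t = x.shape[1]
        future = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))  # hide future tokens
        return torch.softmax(scores, dim=-1) @ v
```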
March 11, 2025 at 4:10 PM
I just shared a new article, "The State of Reasoning Models", where I explore 12 new research articles on improving the reasoning capabilities of LLMs (all published after the release of DeepSeek R1): magazine.sebastianraschka.com/p/state-of-l...

Happy reading!
The State of LLM Reasoning Models
Part 1: Inference-Time Compute Scaling Methods
magazine.sebastianraschka.com
March 8, 2025 at 2:37 PM
Takeaways from the latest State of ML Competitions report mlcontests.com/state-of-mac...:
- Python & PyTorch still dominate
- 80%+ use NVIDIA GPUs, but no multi-node setups 🤔
- LoRA still popular for training efficiency, but full finetuning gains traction
- Surprisingly, CNNs still lead in CV comps
March 5, 2025 at 4:24 PM
Reposted by Sebastian Raschka (rasbt)
@gilesthomas.com is building an LLM from scratch as a learning exercise and I am so jealous. Working off the book _Build a Large Language Model (from Scratch)_ by Sebastian Raschka

First post in series here:

www.gilesthomas.com/2024/12/llm-...
Writing an LLM from scratch, part 1
Learning how to build a large language model from scratch, following Sebastian Raschka's book 'Build a Large Language Model (from Scratch)'. Part 1/??
www.gilesthomas.com
March 5, 2025 at 1:40 PM