Sebastian Raschka (rasbt)
@sebastianraschka.com
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D) & a follow-up book on reasoning models (https://mng.bz/Nwr7).

Also blogging about AI research at magazine.sebastianraschka.com.
Uploaded my State of LLMs 2025 report for this year:
magazine.sebastianraschka.com/p/state-of-l...

I planned to just write a brief overview, but yeah, it was an eventful year, so it was impossible to keep it below 7,000 words :D.
The State Of LLMs 2025: Progress, Progress, and Predictions
A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.
magazine.sebastianraschka.com
December 30, 2025 at 4:22 PM
One of the underrated papers this year:
"Small Batch Size Training for Language Models:
When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful" (arxiv.org/abs/2507.07101)

(I can confirm this holds for RLVR, too! I have some experiments to share soon.)
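To make the contrast concrete, here is a minimal PyTorch sketch (my own illustration, not code from the paper); `model` and `loader` are hypothetical placeholders for any model and micro-batch data loader:

```python
import torch
import torch.nn.functional as F

def train_with_accumulation(model, loader, accum_steps=8, lr=1e-4):
    # Emulates a large batch: gradients from `accum_steps` micro-batches
    # are accumulated before a single optimizer update.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = F.cross_entropy(model(x), y)
        (loss / accum_steps).backward()  # scale so the summed grads match the large-batch average
        if (step + 1) % accum_steps == 0:
            opt.step()
            opt.zero_grad()

def train_small_batch(model, loader, lr=1e-4):
    # The paper's argument (as I read it): just update after every micro-batch.
    # Same tokens seen overall, more frequent updates, no extra memory spent
    # holding accumulated gradients.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
```

Both loops consume the same data; the only difference is how often the optimizer steps.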
December 29, 2025 at 3:52 PM
I think of it as this: LLMs lower the barrier of entry, and they make coders (beginners and experts) more productive.
It's still worth investing in becoming an expert, because then you will get even more out of LLMs and will be able to deliver even better results.
December 28, 2025 at 4:03 PM
The LLM eras:

202x Pre-training (foundation)
2022 RLHF + PPO
2023 LoRA SFT
2024 Mid-Training
2025 RLVR + GRPO
2026 Inference-time scaling?
2027 Continual learning?
December 22, 2025 at 3:40 PM
Just updated the Big LLM Architecture Comparison article...
...it grew quite a bit since the initial version in July 2025, more than doubled!
magazine.sebastianraschka.com/p/the-big-ll...
December 13, 2025 at 2:22 PM
Hold on a sec, Mistral 3 Large uses the DeepSeek V3 architecture, including MLA?

Just went through the config files; the only difference I could see is that Mistral 3 Large uses 2x fewer experts but makes each expert 2x larger.
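A quick back-of-the-envelope check with illustrative numbers (not the actual config values) shows why that roughly cancels out in total MoE parameter count:

```python
def total_moe_params(n_experts, d_model, d_expert):
    # 2 weight matrices (up- and down-projection) per expert; SwiGLU experts
    # have 3, but the proportionality argument is the same
    return n_experts * 2 * d_model * d_expert

a = total_moe_params(n_experts=256, d_model=7168, d_expert=2048)  # DeepSeek-V3-style ballpark
b = total_moe_params(n_experts=128, d_model=7168, d_expert=4096)  # half the experts, each 2x larger
print(a == b)  # True: halving the count while doubling the size cancels out
```

The per-token (active) parameter count then depends mainly on how many experts the router selects per token.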
December 12, 2025 at 7:14 PM
Excited for my first conference in Europe in April. I’ll be talking about LLMs, Python, coding, and all the fun stuff, and I’m looking forward to meeting fellow AI builders there!
We’re thrilled to welcome Sebastian Raschka to PyCon DE & PyData 2026.

From Scratch to Scale: How Far Python Takes You in Building LLMs
A deep dive into how Python powers experimentation, training, and scaling of LLMs.

📍 Darmstadt, April 14–16, 2026
December 5, 2025 at 4:21 AM
This interesting week started with DeepSeek V3.2!

I just wrote up a technical tour of the predecessors and components that led up to this:

🔗 magazine.sebastianraschka.com/p/technical-...

- Multi-Head Latent Attention
- RLVR
- Sparse Attention
- Self-Verification
- GRPO Updates
A Technical Tour of the DeepSeek Models from V3 to V3.2
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
magazine.sebastianraschka.com
December 3, 2025 at 2:51 PM
Looks like we got a new DeepSeek model over the holidays (again): github.com/deepseek-ai/...

Basically pushes RLVR & self-refinement to gold-level scores on IMO 2025.

Coincidentally, I am currently working on a chapter on self-refinement, and this comes in handy as a nice, scaled-up case study.
November 29, 2025 at 3:11 PM
Lots of interesting LLM releases last week. My fav was actually Olmo 3 (I love the Olmo series for being fully open source and transparent).
If you are interested in reading through the architecture details, I coded it from scratch here: github.com/rasbt/LLMs-f...
November 23, 2025 at 2:31 PM
Inference-scaling lets us trade extra compute for better modeling accuracy. Next to RL, it has become one of the most important concepts in today's LLMs, so the book will cover it in two chapters instead of just one.

If you are looking for something to read this weekend, Chapter 4 is available now: mng.bz/Dwra
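As one concrete illustration of the idea (my own toy example, not an excerpt from the chapter), here is a self-consistency-style majority-voting sketch, where `generate` is a hypothetical stand-in for any sampling-based LLM call:

```python
from collections import Counter

def answer_with_voting(prompt, generate, n_samples=8, temperature=0.8):
    # More samples = more inference compute = (often) a more reliable answer.
    answers = [generate(prompt, temperature=temperature) for _ in range(n_samples)]
    # Return the most frequent final answer across the sampled completions.
    return Counter(answers).most_common(1)[0][0]
```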
November 20, 2025 at 2:44 PM
What should we focus on, (more) LLM training or inference scaling? (A question I got asked multiple times now, so here are some thoughts.)

Training is usually very, very expensive, but it is a one-time cost. Inference-scaling is comparatively cheap, but it's a cost we pay with every query.
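A toy break-even calculation (all numbers made up) makes the trade-off concrete:

```python
extra_training_cost = 2_000_000   # hypothetical one-time cost (USD) of additional training
extra_cost_per_query = 0.002      # hypothetical extra cost (USD) per query from inference-time scaling

break_even_queries = extra_training_cost / extra_cost_per_query
print(f"{break_even_queries:,.0f} queries")  # 1,000,000,000 queries
```

Below that query volume, paying a bit more per query is the cheaper option; above it, the one-time training investment starts to pay off.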
November 18, 2025 at 4:29 PM
My "The Building Blocks of Today’s and Tomorrow’s Language Models" talk at the PyTorch Conference is now up on YouTube! youtube.com/watch?v=nDl6...

The silver lining of my late arrival and rescheduling: there was no talk after mine, so it's followed by a 30-min Q&A instead of just the usual 5 :)
The Building Blocks of Today’s and Tomorrow’s Language Models - Sebastian Raschka, RAIR Lab
YouTube video by PyTorch
youtube.com
November 8, 2025 at 2:01 PM
I just saw the Kimi K2 Thinking release!

Kimi K2 is based on the DeepSeek V3/R1 architecture, and here's a side-by-side comparison.

In short, Kimi K2 is a slightly scaled-up DeepSeek V3/R1, and the gains are in the data and training recipes. Hopefully, we will see some details on those soon, too.
November 6, 2025 at 7:35 PM
My new field guide to alternatives to standard LLMs:

Gated DeltaNet hybrids (Qwen3-Next, Kimi Linear), text diffusion, code world models, and small reasoning transformers.

🔗 magazine.sebastianraschka.com/p/beyond-sta...
November 4, 2025 at 2:49 PM
Just saw the benchmarks of the new open-weight MiniMax-M2 LLM, and the performance is too good to ignore :). So, I just amended my "The Big LLM Architecture Comparison" with entry number 13!

Link to the full article: magazine.sebastianraschka.com/p/the-big-ll...
October 28, 2025 at 4:48 PM
A short talk on the main architecture components of LLMs this year + a look beyond the transformer architecture: www.youtube.com/watch?v=lONy...
October 27, 2025 at 3:45 PM
🔗 Mixture of Experts (MoE): github.com/rasbt/LLMs-f...
October 20, 2025 at 1:48 PM
Chapter 3, and with it the first 176 pages, is now live! (mng.bz/lZ5B)
October 16, 2025 at 1:35 PM
Sliding Window Attention
🔗 github.com/rasbt/LLMs-f...
October 13, 2025 at 1:51 PM
Multi-Head Latent Attention
🔗 github.com/rasbt/LLMs-f...
October 12, 2025 at 1:57 PM
Just a bit of weekend coding fun: A memory estimator to calculate the savings when using grouped-query attention vs multi-head attention (+ code implementations of course).

🔗 github.com/rasbt/LLMs-f...

Will add this for multi-head latent, sliding, and sparse attention as well.
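For a rough idea of the kind of estimate involved (a simplified sketch, not the code from the repo): the KV cache scales with the number of key/value heads, which is exactly where GQA saves memory over MHA:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, batch_size=1, bytes_per_elem=2):
    # 2x for keys and values; bytes_per_elem=2 assumes bf16/fp16
    n_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem
    return n_bytes / 1024**3

# Illustrative (made-up) config: 32 layers, head_dim 128, 32k context
mha = kv_cache_gib(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768)  # MHA: one KV head per query head
gqa = kv_cache_gib(n_layers=32, n_kv_heads=8,  head_dim=128, seq_len=32_768)  # GQA: 8 shared KV head groups
print(f"MHA: {mha:.1f} GiB, GQA: {gqa:.1f} GiB")  # MHA: 16.0 GiB, GQA: 4.0 GiB
```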
October 11, 2025 at 1:46 PM
Updated & turned my Big LLM Architecture Comparison article into a video lecture.

The 11 LLM archs covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6

www.youtube.com/watch?v=rNlU...
The Big LLM Architecture Comparison
YouTube video by Sebastian Raschka
www.youtube.com
October 10, 2025 at 5:05 PM
From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM).

A few months ago, the HRM made big waves in the AI research community as it showed really good performance on the ARC challenge despite its small size of only 27M parameters. (That's about 22x smaller than the smallest Qwen3 model, Qwen3 0.6B.)
October 9, 2025 at 4:23 PM
It only took 13 years, but dark mode is finally here
sebastianraschka.com/blog/2021/dl...
October 8, 2025 at 1:50 AM