Sebastian Raschka (rasbt)
sebastianraschka.com
@sebastianraschka.com
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://amzn.to/4fqvn0D) & reasoning (https://mng.bz/Nwr7).

Also blogging about AI research at magazine.sebastianraschka.com.
Awesome, I am glad to hear this!
November 11, 2025 at 10:14 PM
Happy reading and coding! Regarding the calculus part, I do have something here :)

sebastianraschka.com/pdf/books/dl...
November 11, 2025 at 1:10 PM
currently halfway done with chapter 4 😁
November 5, 2025 at 11:32 PM
I like trying new and different things. I don’t have as many convos here, but those I had were quite insightful, so I am hoping for more of that!
November 5, 2025 at 3:25 PM
haha, you never know on the internet these days. But joking aside, I did update the article with MiniMax M2 and Kimi Linear last week :)
magazine.sebastianraschka.com/p/the-big-ll...
The Big LLM Architecture Comparison
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
November 2, 2025 at 2:09 AM
Had that on my list for this wknd!
October 31, 2025 at 2:19 PM
This is from a PyTorch developer’s perspective. For that, it’s great so far!
October 31, 2025 at 2:18 PM
Kimi came 2 days later 😆
October 31, 2025 at 2:17 PM
ha, very timely! Just got back from the conference and haven't had a chance to read the M2 report. But based on the Model Hub, it seems that SWA is not the default (similar to the recent Mistral models) 🤔
(Source: huggingface.co/MiniMaxAI/Mi...)
October 27, 2025 at 6:11 PM
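For context, sliding-window attention (SWA) restricts each token to attending only to the last few positions rather than the full causal prefix. A minimal illustrative sketch of the mask it implies (the function name and boolean-list representation are my own, not taken from any model's actual code):

```python
def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention mask.

    mask[i][j] is True when token i may attend to token j:
    only earlier (or current) tokens within the last `window`
    positions, i.e. j in the half-open range (i - window, i].
    """
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With `window` equal to the sequence length this reduces to an ordinary causal mask, which is why SWA is a drop-in restriction of full attention rather than a different mechanism.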
there are trade-offs, but I find it refreshing that people are working on this :)
October 27, 2025 at 3:53 PM
You could, but I would not recommend it 😅
bsky.app/profile/seba...
October 11, 2025 at 5:16 PM
Yes, if you set n_kv_groups = n_heads then it's multi-query attention. But it's not recommended, as it is too extreme a case and results in poor modeling performance. I don't know of any LLM using it.
October 11, 2025 at 2:04 PM
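A toy sketch of what that setting controls, assuming the convention above where n_kv_groups is the number of query heads sharing one K/V head (so n_kv_groups = 1 is regular multi-head attention and n_kv_groups = n_heads is multi-query attention). The function name and shapes are illustrative, not code from the book:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_groups):
    """Toy grouped-query attention (no masking, no projections).

    q: (n_heads, seq, d) query heads.
    k, v: (n_heads // n_kv_groups, seq, d) shared key/value heads.
    n_kv_groups == 1        -> multi-head attention (no sharing)
    n_kv_groups == n_heads  -> multi-query attention (one K/V head for all)
    """
    n_heads, seq, d = q.shape
    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // n_kv_groups  # index of the K/V head shared by this group
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # numerically stable softmax over the key dimension
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

The memory savings come from the smaller K/V tensors (and KV cache); the extreme n_kv_groups = n_heads case shrinks them the most, which is exactly why it hurts modeling quality.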
Keep in mind that they are special-purpose models, not general text models. But that's also the whole selling point: small special-purpose models instead of LLMs.
October 9, 2025 at 6:09 PM
Interesting! I am not sure it will be a concatenation of models (I can't see how that would work), but I can see modules like this being used as tools that a generalist model calls for specific problems.
October 9, 2025 at 5:23 PM