Lightnews — Scholar-powered news

Lon

@ryukn.bsky.social

Software Engineer specializing in Machine Learning in Tokyo, Japan.
Working in the field of AI for science.

Opinions are my own.

Lon

@ryukn.bsky.social

I’m getting out-of-memory (OOM) errors when trying to run the gpt-oss 20b model on Google Colab’s free T4 GPU.
Are there any good workarounds for this?

August 16, 2025 at 5:14 AM

Lon

@ryukn.bsky.social

Just read this article: magazine.sebastianraschka.com/p/from-gpt-2...
Since I recently tried implementing the GPT-2 architecture from scratch, the article's approach of highlighting how gpt-oss differs from GPT-2 was easy to follow. As the article mentions, GPT-2 is a great starting point.

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

And How They Stack Up Against Qwen3

magazine.sebastianraschka.com

August 11, 2025 at 1:22 PM

Lon

@ryukn.bsky.social

Recently re-implemented GPT-2 from scratch! Before diving in, I only had a vague understanding of how LLMs work, and I'm still learning this field. But after the hands-on implementation, my grasp of embeddings, multi-head attention, and GPU parallelization became much clearer. Really grateful for the tutorial!

August 2, 2025 at 11:19 AM
