Giles Thomas
@gilesthomas.com
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com
Reposted by Giles Thomas
So, what's left to do in my series on building an LLM from scratch? And what follow-up series should I work on? Some musings: www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Having finished the main body of 'Build an LLM (from scratch)', it's time to think about what I need to do to treat this project as fully done
www.gilesthomas.com
November 4, 2025 at 12:52 AM
Reposted by Giles Thomas
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
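A minimal sketch of the kind of LLM-as-judge evaluation the post above describes, assuming a local Ollama server at localhost:11434 with a Llama 3 model pulled; the prompt wording, scoring scale, and function name are illustrative, not the book's exact setup.

    # Ask a locally served Llama 3 (via Ollama's REST API) to score a model
    # response against the expected output. Assumes `ollama serve` is running
    # and the "llama3" model has been pulled.
    import json
    import urllib.request

    def score_with_llama3(instruction: str, expected: str, actual: str) -> str:
        prompt = (
            f"Given the instruction:\n{instruction}\n\n"
            f"and the correct output:\n{expected}\n\n"
            f"score the following model response from 0 to 100 "
            f"(respond with the number only):\n{actual}"
        )
        payload = json.dumps({
            "model": "llama3",
            "prompt": prompt,
            "stream": False,
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"].strip()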
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
Reposted by Giles Thomas
Back on track with chapter 7 of "Build an LLM (from scratch)": notes on instruction fine-tuning of our GPT-2 model:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Some notes on the first part of chapter 7 of 'Build an LLM (from scratch)': instruction fine-tuning
www.gilesthomas.com
October 29, 2025 at 9:07 PM
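A minimal sketch of the Alpaca-style prompt template that instruction fine-tuning typically builds on (and that the chapter uses); the dictionary fields and example entry here are illustrative.

    # Format one instruction-tuning example into an Alpaca-style prompt.
    def format_example(entry: dict) -> str:
        prompt = (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request."
            f"\n\n### Instruction:\n{entry['instruction']}"
        )
        if entry.get("input"):
            prompt += f"\n\n### Input:\n{entry['input']}"
        return prompt + "\n\n### Response:\n"

    example = {"instruction": "Rewrite the sentence in passive voice.",
               "input": "The dog chased the cat."}
    print(format_example(example))

During training, the model's target text (the desired response) is appended after the "### Response:" marker, so it learns to continue prompts of exactly this shape.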
Reposted by Giles Thomas
Back when I started messing with LLMs, it looked to me like you could get reasonably OK results for chat applications without instruction fine-tuning. So before getting into Chapter 7 of "Build an LLM (from scratch)", I decided to see if that was really true:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 24 -- the transcript hack
Back when I started playing with LLMs, I found that you could build a (very basic) chatbot with a base model -- no instruction fine-tuning at all! Does that work with GPT-2?
www.gilesthomas.com
October 28, 2025 at 8:20 PM
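A minimal sketch of the "transcript hack" the post describes: hand a base (not instruction-tuned) model a fake conversation transcript and let it continue as the assistant. This uses Hugging Face's pretrained GPT-2 for convenience rather than the blog's from-scratch model, and the transcript text is made up.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    transcript = (
        "The following is a conversation between a helpful assistant and a user.\n"
        "User: What is the capital of France?\n"
        "Assistant: The capital of France is Paris.\n"
        "User: And what is the capital of Italy?\n"
        "Assistant:"
    )

    out = generator(transcript, max_new_tokens=20, do_sample=True, temperature=0.8)
    print(out[0]["generated_text"][len(transcript):])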
Reposted by Giles Thomas
And the next step -- a code walkthrough of my PyTorch version of Karpathy's 2015-vintage RNNs.

www.gilesthomas.com/2025/10/retr...
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Revisiting Karpathy’s text-generating RNNs with PyTorch’s built-in LSTM class — a practical look at why training sequence models is so different from Transformers.
www.gilesthomas.com
October 24, 2025 at 6:57 PM
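A minimal sketch of a char-level language model built around PyTorch's nn.LSTM, in the spirit of Karpathy's char-rnn that the post walks through; the class name and hyperparameters are illustrative, and the returned hidden state is what gets carried between batches (the part that makes RNN training feel so different from Transformers).

    import torch
    import torch.nn as nn

    class CharLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids, hidden=None):
            x = self.embed(token_ids)            # (batch, seq, embed_dim)
            out, hidden = self.lstm(x, hidden)   # (batch, seq, hidden_dim)
            return self.head(out), hidden        # next-char logits, carried state

    model = CharLSTM(vocab_size=128)
    logits, state = model(torch.randint(0, 128, (1, 32)))
    print(logits.shape)   # torch.Size([1, 32, 128])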
Reposted by Giles Thomas
Chapter 6 was easy and fun! Fine-tuning an LLM for classification tasks, with some initially disappointing results -- but it all came out in the wash: www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 23 -- fine-tuning for classification
After all the hard work, chapter 6 in 'Build an LLM (from scratch)' is a nice easy one -- how do we take a next-token predictor and turn it into a classifier?
www.gilesthomas.com
October 22, 2025 at 11:06 PM
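A minimal sketch of the chapter's trick for turning a next-token predictor into a classifier: keep the transformer body, swap the vocab-sized output head for a small linear head, and classify from the last token's position. The nn.Embedding below is only a stand-in for the pretrained GPT-2 body; sizes and the spam/not-spam framing follow the chapter, everything else is illustrative.

    import torch
    import torch.nn as nn

    emb_dim, vocab_size, num_classes = 768, 50257, 2   # GPT-2 124M-ish sizes

    body = nn.Embedding(vocab_size, emb_dim)    # stand-in for the transformer body
    # the original nn.Linear(emb_dim, vocab_size) LM head gets swapped out for:
    cls_head = nn.Linear(emb_dim, num_classes)  # e.g. spam vs not-spam

    token_ids = torch.randint(0, vocab_size, (1, 16))   # a batch of one sequence
    hidden = body(token_ids)                            # (1, 16, emb_dim)

    logits = cls_head(hidden[:, -1, :])   # classify from the *last* token only
    print(torch.argmax(logits, dim=-1))   # predicted class index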
Reposted by Giles Thomas
Part 22 is live: we finally train the LLM :-) Following @sebastianraschka.com's book, we train on Edith Wharton, then swap in GPT-2 (124M) weights for comparison. Notes on seeding, AdamW, temperature and top-k.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 22 -- finally training our LLM!
Finally, we train an LLM! The final part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI’s GPT-2 weights for comparison.
www.gilesthomas.com
October 15, 2025 at 11:45 PM
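A minimal sketch of the temperature and top-k sampling mentioned in the post: scale the logits, keep only the k most likely tokens, then sample. The function name and default values are illustrative.

    import torch

    def sample_next_token(logits: torch.Tensor, temperature=1.0, top_k=50) -> int:
        # logits: (vocab_size,) scores for the next token
        logits = logits / temperature                     # <1 sharpens, >1 flattens
        top_values, top_indices = torch.topk(logits, top_k)
        probs = torch.softmax(top_values, dim=-1)
        choice = torch.multinomial(probs, num_samples=1)  # sample among the top k
        return top_indices[choice].item()

    vocab_size = 50257
    print(sample_next_token(torch.randn(vocab_size), temperature=0.7, top_k=40))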
Reposted by Giles Thomas
Next up in my LLM from scratch series, some serious yak shaving on something @sebastianraschka.com covers in a sidebar: perplexity.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Raschka calls out perplexity in a sidebar, but I wanted to understand it in a little more depth
www.gilesthomas.com
October 7, 2025 at 7:06 PM
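A minimal sketch of the relationship the post digs into: perplexity is the exponential of the average per-token cross-entropy loss. The random tensors below are placeholders for real model outputs and targets.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(1, 8, 50257)            # (batch, seq_len, vocab_size)
    targets = torch.randint(0, 50257, (1, 8))    # the actual next tokens

    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    perplexity = torch.exp(loss)
    print(loss.item(), perplexity.item())   # random logits give a huge perplexity;
                                            # a trained model scores far lower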
Reposted by Giles Thomas
Back to the main track of my LLM from scratch posts: cross entropy -- what it is and why we use it. www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
Starting training our LLM requires a loss function, which is called cross entropy loss. What is this and why does it work?
www.gilesthomas.com
October 2, 2025 at 9:15 PM
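A minimal sketch of cross entropy loss as used for next-token prediction: it's the negative log of the probability the model assigned to the token that actually came next. The toy 4-token vocabulary here is illustrative.

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0, 0.0]])   # scores over a toy vocab
    target = torch.tensor([0])                       # the correct next token is id 0

    # Built-in version:
    print(F.cross_entropy(logits, target))

    # The same thing by hand: softmax, then -log(prob of the correct token).
    probs = torch.softmax(logits, dim=-1)
    print(-torch.log(probs[0, target[0]]))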
Reposted by Giles Thomas
Part 3, and this was a fun one to write: How do LLMs work? From token IDs through to logits -- projections, matrix multiplications, and attention step-by-step.

www.gilesthomas.com/2025/09/how-...
How do LLMs work?
What actually goes on inside an LLM to make it calculate probabilities for the next token?
www.gilesthomas.com
September 15, 2025 at 10:49 PM
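A minimal sketch of the path the post walks through, token IDs to logits: an embedding lookup, one attention step, and a final projection over the vocabulary. This is a single head with no causal mask, no feed-forward blocks, and toy sizes; it shows only the shape of the computation, not a full model.

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, seq_len = 1000, 32, 5
    token_ids = torch.randint(0, vocab_size, (seq_len,))

    embed = nn.Embedding(vocab_size, emb_dim)
    W_q, W_k, W_v = (nn.Linear(emb_dim, emb_dim, bias=False) for _ in range(3))
    out_proj = nn.Linear(emb_dim, vocab_size)

    x = embed(token_ids)                          # (seq_len, emb_dim)
    q, k, v = W_q(x), W_k(x), W_v(x)              # project into queries, keys, values
    scores = q @ k.T / emb_dim ** 0.5             # scaled dot-product attention
    weights = torch.softmax(scores, dim=-1)       # (seq_len, seq_len)
    context = weights @ v                         # attention-weighted mix of values
    logits = out_proj(context)                    # (seq_len, vocab_size)
    print(torch.softmax(logits[-1], dim=-1).topk(3))  # top next-token candidates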
Got some useful feedback on that post over the weekend; addendum here: www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs (addendum)
Clarifications and a new section on the dot product, updating my refresher on the maths behind LLMs.
www.gilesthomas.com
September 8, 2025 at 5:20 PM
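A tiny illustration of the dot product the addendum covers, with made-up vectors: multiply matching components and sum, which also measures how aligned two vectors are.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])
    print(np.dot(a, b))                                 # 1*4 + 2*5 + 3*6 = 32.0
    print(a @ b == sum(x * y for x, y in zip(a, b)))    # same thing, spelled out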
Thanks!
September 6, 2025 at 1:13 PM
Reposted by Giles Thomas
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM
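A minimal sketch of two ingredients from the post, with made-up numbers: a matrix acting as a projection on a vector, and softmax turning logits into probabilities.

    import numpy as np

    vector = np.array([1.0, 2.0, 3.0])
    projection = np.array([[1.0, 0.0, 0.0],     # keep x
                           [0.0, 1.0, 0.0]])    # keep y; the z component is dropped
    print(projection @ vector)                  # [1. 2.]

    logits = np.array([2.0, 1.0, 0.1])
    softmax = np.exp(logits) / np.exp(logits).sum()
    print(softmax, softmax.sum())               # probabilities summing to 1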
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM