Giles Thomas
@gilesthomas.com
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com
Reposted by Giles Thomas
So, what's left to do in my series on building an LLM from scratch? And what follow-up series should I work on? Some musings: www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Having finished the main body of 'Build an LLM (from scratch)', it's time to think about what I need to do to treat this project as fully done
www.gilesthomas.com
November 4, 2025 at 12:52 AM
Reposted by Giles Thomas
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
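A minimal sketch of the kind of LLM-as-judge evaluation the post above describes, assuming a local Ollama server at localhost:11434 with a Llama 3 model pulled; the prompt wording, scoring scale, and function name are illustrative, not the book's exact setup.

    # Ask a locally served Llama 3 (via Ollama's REST API) to score a model
    # response against the expected output. Assumes `ollama serve` is running
    # and the "llama3" model has been pulled.
    import json
    import urllib.request

    def score_with_llama3(instruction: str, expected: str, actual: str) -> str:
        prompt = (
            f"Given the instruction:\n{instruction}\n\n"
            f"and the correct output:\n{expected}\n\n"
            f"score the following model response from 0 to 100 "
            f"(respond with the number only):\n{actual}"
        )
        payload = json.dumps({
            "model": "llama3",
            "prompt": prompt,
            "stream": False,
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"].strip()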
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
Reposted by Giles Thomas
Back on track with chapter 7 of "Build an LLM (from scratch)": notes on instruction fine-tuning of our GPT-2 model:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Some notes on the first part of chapter 7 of 'Build an LLM (from scratch)': instruction fine-tuning
www.gilesthomas.com
October 29, 2025 at 9:07 PM
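A minimal sketch of the Alpaca-style prompt template that instruction fine-tuning typically builds on (and that the chapter uses); the dictionary fields and example entry here are illustrative.

    # Format one instruction-tuning example into an Alpaca-style prompt.
    def format_example(entry: dict) -> str:
        prompt = (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request."
            f"\n\n### Instruction:\n{entry['instruction']}"
        )
        if entry.get("input"):
            prompt += f"\n\n### Input:\n{entry['input']}"
        return prompt + "\n\n### Response:\n"

    example = {"instruction": "Rewrite the sentence in passive voice.",
               "input": "The dog chased the cat."}
    print(format_example(example))

During training, the model's target text (the desired response) is appended after the "### Response:" marker, so it learns to continue prompts of exactly this shape.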
Reposted by Giles Thomas
Back when I started messing with LLMs, it looked to me like you could get reasonably OK results for chat applications without instruction fine-tuning. So before getting into Chapter 7 of "Build an LLM (from scratch)", I decided to see if that was really true:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 24 -- the transcript hack
Back when I started playing with LLMs, I found that you could build a (very basic) chatbot with a base model -- no instruction fine-tuning at all! Does that work with GPT-2?
www.gilesthomas.com
October 28, 2025 at 8:20 PM
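A minimal sketch of the "transcript hack" the post describes: hand a base (not instruction-tuned) model a fake conversation transcript and let it continue as the assistant. This uses Hugging Face's pretrained GPT-2 for convenience rather than the blog's from-scratch model, and the transcript text is made up.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    transcript = (
        "The following is a conversation between a helpful assistant and a user.\n"
        "User: What is the capital of France?\n"
        "Assistant: The capital of France is Paris.\n"
        "User: And what is the capital of Italy?\n"
        "Assistant:"
    )

    out = generator(transcript, max_new_tokens=20, do_sample=True, temperature=0.8)
    print(out[0]["generated_text"][len(transcript):])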
Reposted by Giles Thomas
And the next step -- a code walkthrough of my PyTorch version of Karpathy's 2015-vintage RNNs.

www.gilesthomas.com/2025/10/retr...
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Revisiting Karpathy’s text-generating RNNs with PyTorch’s built-in LSTM class — a practical look at why training sequence models is so different from Transformers.
www.gilesthomas.com
October 24, 2025 at 6:57 PM
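A minimal sketch of a char-level language model built around PyTorch's nn.LSTM, in the spirit of Karpathy's char-rnn that the post walks through; the class name and hyperparameters are illustrative, and the returned hidden state is what gets carried between batches (the part that makes RNN training feel so different from Transformers).

    import torch
    import torch.nn as nn

    class CharLSTM(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids, hidden=None):
            x = self.embed(token_ids)            # (batch, seq, embed_dim)
            out, hidden = self.lstm(x, hidden)   # (batch, seq, hidden_dim)
            return self.head(out), hidden        # next-char logits, carried state

    model = CharLSTM(vocab_size=128)
    logits, state = model(torch.randint(0, 128, (1, 32)))
    print(logits.shape)   # torch.Size([1, 32, 128])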
Reposted by Giles Thomas
Chapter 6 was easy and fun! Fine-tuning an LLM for classification tasks, with some initially disappointing results -- but it all came out in the wash: www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 23 -- fine-tuning for classification
After all the hard work, chapter 6 in 'Build an LLM (from scratch)' is a nice easy one -- how do we take a next-token predictor and turn it into a classifier?
www.gilesthomas.com
October 22, 2025 at 11:06 PM
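A minimal sketch of the chapter's trick for turning a next-token predictor into a classifier: keep the transformer body, swap the vocab-sized output head for a small linear head, and classify from the last token's position. The nn.Embedding below is only a stand-in for the pretrained GPT-2 body; sizes and the spam/not-spam framing follow the chapter, everything else is illustrative.

    import torch
    import torch.nn as nn

    emb_dim, vocab_size, num_classes = 768, 50257, 2   # GPT-2 124M-ish sizes

    body = nn.Embedding(vocab_size, emb_dim)    # stand-in for the transformer body
    # the original nn.Linear(emb_dim, vocab_size) LM head gets swapped out for:
    cls_head = nn.Linear(emb_dim, num_classes)  # e.g. spam vs not-spam

    token_ids = torch.randint(0, vocab_size, (1, 16))   # a batch of one sequence
    hidden = body(token_ids)                            # (1, 16, emb_dim)

    logits = cls_head(hidden[:, -1, :])   # classify from the *last* token only
    print(torch.argmax(logits, dim=-1))   # predicted class index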
Reposted by Giles Thomas
Part 22 is live: we finally train the LLM :-) Following @sebastianraschka.com's book, we train on Edith Wharton, then swap in GPT-2 (124M) weights for comparison. Notes on seeding, AdamW, temperature and top-k.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 22 -- finally training our LLM!
Finally, we train an LLM! The final part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI’s GPT-2 weights for comparison.
www.gilesthomas.com
October 15, 2025 at 11:45 PM
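A minimal sketch of the temperature and top-k sampling mentioned in the post: scale the logits, keep only the k most likely tokens, then sample. The function name and default values are illustrative.

    import torch

    def sample_next_token(logits: torch.Tensor, temperature=1.0, top_k=50) -> int:
        # logits: (vocab_size,) scores for the next token
        logits = logits / temperature                     # <1 sharpens, >1 flattens
        top_values, top_indices = torch.topk(logits, top_k)
        probs = torch.softmax(top_values, dim=-1)
        choice = torch.multinomial(probs, num_samples=1)  # sample among the top k
        return top_indices[choice].item()

    vocab_size = 50257
    print(sample_next_token(torch.randn(vocab_size), temperature=0.7, top_k=40))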
Reposted by Giles Thomas
Next up in my LLM from scratch series, some serious yak shaving on something @sebastianraschka.com covers in a sidebar: perplexity.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Raschka calls out perplexity in a sidebar, but I wanted to understand it in a little more depth
www.gilesthomas.com
October 7, 2025 at 7:06 PM
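A minimal sketch of the relationship the post digs into: perplexity is the exponential of the average per-token cross-entropy loss. The random tensors below are placeholders for real model outputs and targets.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(1, 8, 50257)            # (batch, seq_len, vocab_size)
    targets = torch.randint(0, 50257, (1, 8))    # the actual next tokens

    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    perplexity = torch.exp(loss)
    print(loss.item(), perplexity.item())   # random logits give a huge perplexity;
                                            # a trained model scores far lower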
Reposted by Giles Thomas
Back to the main track of my LLM from scratch posts: cross entropy -- what it is and why we use it. www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
Starting training our LLM requires a loss function, which is called cross entropy loss. What is this and why does it work?
www.gilesthomas.com
October 2, 2025 at 9:15 PM
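A minimal sketch of cross entropy loss as used for next-token prediction: it's the negative log of the probability the model assigned to the token that actually came next. The toy 4-token vocabulary here is illustrative.

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0, 0.0]])   # scores over a toy vocab
    target = torch.tensor([0])                       # the correct next token is id 0

    # Built-in version:
    print(F.cross_entropy(logits, target))

    # The same thing by hand: softmax, then -log(prob of the correct token).
    probs = torch.softmax(logits, dim=-1)
    print(-torch.log(probs[0, target[0]]))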
Reposted by Giles Thomas
Part 3, and this was a fun one to write: How do LLMs work? From token IDs through to logits -- projections, matrix multiplications, and attention step-by-step.

www.gilesthomas.com/2025/09/how-...
How do LLMs work?
What actually goes on inside an LLM to make it calculate probabilities for the next token?
www.gilesthomas.com
September 15, 2025 at 10:49 PM
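A minimal sketch of the path the post walks through, token IDs to logits: an embedding lookup, one attention step, and a final projection over the vocabulary. This is a single head with no causal mask, no feed-forward blocks, and toy sizes; it shows only the shape of the computation, not a full model.

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, seq_len = 1000, 32, 5
    token_ids = torch.randint(0, vocab_size, (seq_len,))

    embed = nn.Embedding(vocab_size, emb_dim)
    W_q, W_k, W_v = (nn.Linear(emb_dim, emb_dim, bias=False) for _ in range(3))
    out_proj = nn.Linear(emb_dim, vocab_size)

    x = embed(token_ids)                          # (seq_len, emb_dim)
    q, k, v = W_q(x), W_k(x), W_v(x)              # project into queries, keys, values
    scores = q @ k.T / emb_dim ** 0.5             # scaled dot-product attention
    weights = torch.softmax(scores, dim=-1)       # (seq_len, seq_len)
    context = weights @ v                         # attention-weighted mix of values
    logits = out_proj(context)                    # (seq_len, vocab_size)
    print(torch.softmax(logits[-1], dim=-1).topk(3))  # top next-token candidates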
Got some useful feedback on that post over the weekend; addendum here: www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs (addendum)
Clarifications and a new section on the dot product, updating my refresher on the maths behind LLMs.
www.gilesthomas.com
September 8, 2025 at 5:20 PM
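A tiny illustration of the dot product the addendum covers, with made-up vectors: multiply matching components and sum, which also measures how aligned two vectors are.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])
    print(np.dot(a, b))                                 # 1*4 + 2*5 + 3*6 = 32.0
    print(a @ b == sum(x * y for x, y in zip(a, b)))    # same thing, spelled out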
Thanks!
September 6, 2025 at 1:13 PM
Reposted by Giles Thomas
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM
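A minimal sketch of two ingredients from the post, with made-up numbers: a matrix acting as a projection on a vector, and softmax turning logits into probabilities.

    import numpy as np

    vector = np.array([1.0, 2.0, 3.0])
    projection = np.array([[1.0, 0.0, 0.0],     # keep x
                           [0.0, 1.0, 0.0]])    # keep y; the z component is dropped
    print(projection @ vector)                  # [1. 2.]

    logits = np.array([2.0, 1.0, 0.1])
    softmax = np.exp(logits) / np.exp(logits).sum()
    print(softmax, softmax.sum())               # probabilities summing to 1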
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM