Giles Thomas
gilesthomas.com
Giles Thomas
@gilesthomas.com
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com
Pinned
So, what's left to do in my series on building an LLM from scratch? And what follow-up series should I work on? Some musings: www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Having finished the main body of 'Build an LLM (from scratch)', it's time to think about what I need to do to treat this project as fully done
www.gilesthomas.com
Reposted by Giles Thomas
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
Reposted by Giles Thomas
Back on track with chapter 7 of "Build an LLM (from scratch)": notes on instruction fine-tuning of our GPT-2 model:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Some notes on the first part of chapter 7 of 'Build an LLM (from scratch)': instruction fine-tuning
www.gilesthomas.com
October 29, 2025 at 9:07 PM
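A minimal sketch of the kind of Alpaca-style prompt formatting used for instruction fine-tuning; the field names and exact wording here follow the common Alpaca template rather than anything quoted from the post:

```python
def format_example(entry):
    # Alpaca-style prompt: instruction, optional input, then the expected response.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    if entry.get("input"):
        prompt += f"\n\n### Input:\n{entry['input']}"
    return prompt + f"\n\n### Response:\n{entry['output']}"

example = {"instruction": "Rewrite this sentence in the passive voice.",
           "input": "The cat chased the mouse.",
           "output": "The mouse was chased by the cat."}
print(format_example(example))
```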
Reposted by Giles Thomas
Back when I started messing with LLMs, it looked to me like you could get reasonably OK results for chat applications without instruction fine-tuning. So before getting into Chapter 7 of "Build an LLM (from scratch)", I decided to see if that was really true:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 24 -- the transcript hack
Back when I started playing with LLMs, I found that you could build a (very basic) chatbot with a base model -- no instruction fine-tuning at all! Does that work with GPT-2?
www.gilesthomas.com
October 28, 2025 at 8:20 PM
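A minimal sketch of the transcript idea: frame the conversation as a plain-text transcript and let the base model's next-token prediction continue it. The wording and the `generate` call below are placeholders, not code from the post:

```python
# A base model has no chat template, but it will happily continue a transcript.
# `generate` is a placeholder for whatever sampling loop you already have.
transcript = (
    "The following is a conversation between a user and a helpful assistant.\n\n"
    "User: What is the capital of France?\n"
    "Assistant: The capital of France is Paris.\n"
    "User: And of Germany?\n"
    "Assistant:"
)
# completion = generate(model, tokenizer, transcript, max_new_tokens=30)
# Cut the completion off at the next "User:" line to get just the reply.
print(transcript)
```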
Reposted by Giles Thomas
And the next step -- a code walkthrough of my PyTorch version of Karpathy's 2015-vintage RNNs.

www.gilesthomas.com/2025/10/retr...
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Revisiting Karpathy’s text-generating RNNs with PyTorch’s built-in LSTM class — a practical look at why training sequence models is so different from Transformers.
www.gilesthomas.com
October 24, 2025 at 6:57 PM
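A minimal sketch of a character-level language model built around PyTorch's built-in `nn.LSTM`; the sizes here are arbitrary, not the ones from the walkthrough:

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Minimal character-level language model using PyTorch's built-in LSTM."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) of character IDs
        emb = self.embed(x)
        out, hidden = self.lstm(emb, hidden)   # hidden carries state between chunks
        return self.head(out), hidden          # logits over the next character

model = CharLSTM(vocab_size=65)
logits, state = model(torch.randint(0, 65, (1, 32)))
print(logits.shape)  # torch.Size([1, 32, 65])
```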
Reposted by Giles Thomas
Chapter 6 was easy and fun! Fine-tuning an LLM for classification tasks, with some initially disappointing results -- but it all came out in the wash: www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 23 -- fine-tuning for classification
After all the hard work, chapter 6 in 'Build an LLM (from scratch)' is a nice easy one -- how do we take a next-token predictor and turn it into a classifier?
www.gilesthomas.com
October 22, 2025 at 11:06 PM
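A minimal sketch of the chapter-6 idea, with a placeholder standing in for the transformer body: keep the pretrained model, swap the vocabulary-sized output head for a tiny classification head, and classify from the last token position:

```python
import torch
import torch.nn as nn

emb_dim, num_classes = 768, 2                 # GPT-2 (124M) width; spam / not-spam
backbone = nn.Identity()                      # stand-in for the pretrained transformer blocks
classifier_head = nn.Linear(emb_dim, num_classes)

hidden = backbone(torch.randn(1, 20, emb_dim))   # (batch, seq_len, emb_dim)
logits = classifier_head(hidden[:, -1, :])       # only the last token's vector
print(logits.softmax(-1))                        # class probabilities
```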
Reposted by Giles Thomas
Part 22 is live: we finally train the LLM :-) Following @sebastianraschka.com's book, we train on Edith Wharton, then swap in GPT-2 (124M) weights for comparison. Notes on seeding, AdamW, temperature and top-k.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 22 -- finally training our LLM!
Finally, we train an LLM! The final part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI’s GPT-2 weights for comparison.
www.gilesthomas.com
October 15, 2025 at 11:45 PM
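A minimal sketch of two of those generation-time knobs, temperature scaling and top-k filtering; the vocabulary size and seed are arbitrary:

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Temperature scaling plus top-k filtering for next-token sampling."""
    logits = logits / temperature                      # <1 sharpens, >1 flattens
    if top_k is not None:
        top_vals, _ = torch.topk(logits, top_k)
        logits = torch.where(logits < top_vals[..., -1:],
                             torch.full_like(logits, float("-inf")),
                             logits)                   # mask everything below the k-th logit
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

torch.manual_seed(123)                                 # seeding for reproducibility
print(sample_next_token(torch.randn(50257), temperature=0.8, top_k=40))
```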
Decided to learn about some "retro" language models in parallel with LLMs; here's my first post on that -- Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'.

www.gilesthomas.com/2025/10/revi...
Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'
Andrej Karpathy's 2015 blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks' went viral in its day, for good reason. How does it read ten years later?
www.gilesthomas.com
October 11, 2025 at 1:01 AM
Reposted by Giles Thomas
Next up in my LLM from scratch series, some serious yak shaving on something @sebastianraschka.com covers in a sidebar: perplexity.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Raschka calls out perplexity in a sidebar, but I wanted to understand it in a little more depth
www.gilesthomas.com
October 7, 2025 at 7:06 PM
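Perplexity itself is just the exponential of the average cross-entropy loss, the model's "effective number of choices" per token; a toy illustration with dummy logits:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 50257)          # 8 token positions, GPT-2-sized vocab
targets = torch.randint(0, 50257, (8,))
loss = F.cross_entropy(logits, targets)
print(torch.exp(loss))                  # perplexity = exp(average cross entropy)
```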
Reposted by Giles Thomas
Back to the main track of my LLM from scratch posts: cross entropy -- what it is and why we use it. www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
Starting training our LLM requires a loss function, which is called cross entropy loss. What is this and why does it work?
www.gilesthomas.com
October 2, 2025 at 9:15 PM
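A toy illustration of cross-entropy loss for next-token prediction: the negative log of the probability assigned to the correct token, averaged over positions, computed by hand and with PyTorch's built-in:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])     # 2 positions, toy 3-token vocab
targets = torch.tensor([0, 1])                # the "correct" next tokens

log_probs = torch.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(2), targets].mean()
print(manual, F.cross_entropy(logits, targets))  # the two values match
```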
Reposted by Giles Thomas
Part 3, and this was a fun one to write: How do LLMs work? From token IDs through to logits -- projections, matrix multiplications, and attention step-by-step.

www.gilesthomas.com/2025/09/how-...
How do LLMs work?
What actually goes on inside an LLM to make it calculate probabilities for the next token?
www.gilesthomas.com
September 15, 2025 at 10:49 PM
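A toy version of the attention step with made-up weights, just to show the matrix multiplications involved (no causal masking or multiple heads here):

```python
import torch

torch.manual_seed(0)
d = 4
x = torch.randn(3, d)                       # one embedding per token
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

q, k, v = x @ W_q, x @ W_k, x @ W_v         # project into queries, keys, values
scores = q @ k.T / d ** 0.5                 # scaled dot-product attention scores
weights = torch.softmax(scores, dim=-1)     # each row sums to 1
context = weights @ v                       # weighted mix of value vectors
print(context.shape)                        # torch.Size([3, 4])
```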
Reposted by Giles Thomas
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM
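A toy softmax, the last of those building blocks, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials from overflowing.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # sums to 1, largest logit wins
```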
Reposted by Giles Thomas
Part 1: "What AI chatbots are actually doing under the hood".

www.gilesthomas.com/2025/08/what...
What AI chatbots are actually doing under the hood
How AI chatbots like ChatGPT work under the hood -- the post I wish I’d found before starting 'Build a Large Language Model (from Scratch)'.
www.gilesthomas.com
August 29, 2025 at 7:05 PM
Reposted by Giles Thomas
I wanted to do a "what I've learned so far" post to wrap up my notes on Chapter 4 of @sebastianraschka.com's "Build an LLM (from scratch)" but when it got to 6,000 words I suspected it was getting a bit long. So here's a review of upcoming attractions: www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 19 -- wrapping up Chapter 4
A state-of-play update after finishing Chapter 4 of 'Build a Large Language Model from Scratch', with a roadmap of what’s coming next
www.gilesthomas.com
August 29, 2025 at 7:04 PM
Reposted by Giles Thomas
Now it's time to look into shortcut connections in @sebastianraschka.com's "Build an LLM (from scratch)" -- where the Talmud becomes a surprisingly useful metaphor!

www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 18 -- residuals, shortcut connections, and the Talmud
The book's description of shortcut connections is overly simplified, I think -- residuals are more than just shortcuts to help with gradients
www.gilesthomas.com
August 18, 2025 at 7:24 PM
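A minimal sketch of a shortcut connection: the sub-layer's output is added back onto its input, so the sub-layer only has to learn an adjustment and gradients get a direct path through the addition:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))    # shortcut: input added back to the output

block = ResidualBlock(8)
print(block(torch.randn(2, 8)).shape)       # torch.Size([2, 8])
```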
Some thinking in public: if the FFNs in an LLM only work on one context vector at a time, why aren't they bitten by the fixed-length bottleneck? www.gilesthomas.com/2025/08/the-...
The fixed length bottleneck and the feed forward network
The feed-forward network in an LLM processes context vectors one at a time. This feels like it would cause similar issues to the old fixed-length bottleneck, even though it almost certainly does not.
www.gilesthomas.com
August 14, 2025 at 10:49 PM
Reposted by Giles Thomas
After a summer break, it's on to the feed-forward layer in @sebastianraschka.com's "Build an LLM (from scratch)" -- in which I discover that attention is not all you need.

www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 17 -- the feed-forward network
The feed-forward network is one of the easiest parts of an LLM in terms of implementation -- but when I thought about it I realised it was one of the most important.
www.gilesthomas.com
August 12, 2025 at 10:07 PM
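A minimal sketch of the transformer feed-forward network: expand each position's vector to four times the embedding width, apply GELU, then project back down. Sizes follow GPT-2 (124M):

```python
import torch
import torch.nn as nn

emb_dim = 768
ffn = nn.Sequential(
    nn.Linear(emb_dim, 4 * emb_dim),
    nn.GELU(),
    nn.Linear(4 * emb_dim, emb_dim),
)
# It acts on one position at a time: no mixing between tokens happens here.
x = torch.randn(1, 10, emb_dim)              # (batch, seq_len, emb_dim)
print(ffn(x).shape)                          # same shape out as in
```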
Reposted by Giles Thomas
The next step in my slow-but-steady progress through @sebastianraschka.com's "Build an LLM (from scratch)" -- what is layer normalisation, why do we do it, and how does it work?

www.gilesthomas.com/2025/07/llm-...
Writing an LLM from scratch, part 16 -- layer normalisation
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?
www.gilesthomas.com
July 8, 2025 at 7:15 PM
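A toy illustration of layer normalisation: rescale each token's vector to zero mean and unit variance across its features, done by hand and checked against PyTorch's `nn.LayerNorm`:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 8)                              # (batch, seq_len, emb_dim)
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, keepdim=True, unbiased=False)
manual = (x - mean) / torch.sqrt(var + 1e-5)

ln = nn.LayerNorm(8)                                  # learnable scale=1, shift=0 at init
print(torch.allclose(manual, ln(x), atol=1e-5))       # True
```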
Reposted by Giles Thomas
Connect your AI coding assistant directly to PythonAnywhere! It works with Claude, Copilot, Cursor, and any MCP-compatible tool. See github.com/pythonanywhe...
GitHub - pythonanywhere/pythonanywhere-mcp-server
Contribute to pythonanywhere/pythonanywhere-mcp-server development by creating an account on GitHub.
github.com
July 8, 2025 at 1:59 PM
New blog post, just documenting a bit of cruft-removal: porting old Fabric3 code to modern Fabric: www.gilesthomas.com/2025/06/fabr...
Moving from Fabric3 to Fabric
A couple of lessons learned in moving from Fabric3 to Fabric
www.gilesthomas.com
June 15, 2025 at 12:35 AM
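The rough shape of the change, as a sketch from memory with a made-up host and command rather than the code from the post: Fabric3 kept Fabric 1's module-level `env` and implicit connections, while modern Fabric makes the connection object explicit:

```python
# Old Fabric3 / Fabric 1 style (roughly):
#
#   from fabric.api import env, run
#   env.hosts = ["myserver.example.com"]
#   def deploy():
#       run("git pull && systemctl restart myapp")
#
# Modern Fabric (2+) makes the connection explicit instead:
from fabric import Connection

def deploy(host="myserver.example.com"):
    with Connection(host) as c:
        c.run("git pull && systemctl restart myapp")
```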
After 14 years of leading @pythonanywhere.com -- 11 as an independent company and 3 as part of @anacondainc.bsky.social -- it's time for me to take a much-needed break, and today was my last day. It's been an amazing ride and I'd like to thank everyone who made it possible.
June 5, 2025 at 6:13 PM
Reposted by Giles Thomas
Moving onwards with @sebastianraschka.com's "Build an LLM (from scratch)". The last step in an LLM, going from context vectors to next-word prediction, looks insanely simple. How can it possibly work?

www.gilesthomas.com/2025/05/llm-...
Writing an LLM from scratch, part 15 -- from context vectors to logits; or, can it really be that simple?!
The way we get from context vectors to next-word prediction turns out to be simpler than I imagined -- but understanding why it works took a bit of thought.
www.gilesthomas.com
May 31, 2025 at 11:24 PM
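A minimal sketch of that last step: a single linear projection from each context vector to one logit per vocabulary token, then softmax for next-token probabilities:

```python
import torch
import torch.nn as nn

emb_dim, vocab_size = 768, 50257
out_head = nn.Linear(emb_dim, vocab_size, bias=False)

context = torch.randn(1, 10, emb_dim)        # (batch, seq_len, emb_dim)
logits = out_head(context)                   # (batch, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[:, -1, :], dim=-1)
print(next_token_probs.shape)                # torch.Size([1, 50257])
```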
First impressions of Claude 4: vibes seem fixed. 3.5 was likeable, 3.7 less so. So far this seems like a return to form.
May 22, 2025 at 8:40 PM
Reposted by Giles Thomas
We've got Streamlit working on PythonAnywhere with our experimental website system!
help.pythonanywhere.com/pages/Stream... --
The bad news is that you need a paid account, because Streamlit uses up a *lot* of disk space :-(
Deploying Streamlit apps on PythonAnywhere (beta)
This help page explains how to set up a Streamlit app on PythonAnywhere. Disclaimer: Deployment of Streamlit apps on PythonAnywhere is an experimental feature. Some important limitations to know about...
help.pythonanywhere.com
May 7, 2025 at 3:37 PM
Reposted by Giles Thomas
Some thoughts (and wild speculation) about scaling attention in LLMs, in my new post working through @sebastianraschka.com's "Build a Large Language Model (from Scratch)"

www.gilesthomas.com/2025/05/llm-...
Writing an LLM from scratch, part 14 -- the complexity of self-attention at scale
A pause to take stock: starting to build intuition on how self-attention scales (and why the simple version doesn't)
www.gilesthomas.com
May 14, 2025 at 8:27 PM
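One way to see the scaling problem: the attention score matrix is seq_len by seq_len per head, so doubling the context length quadruples the number of scores to compute and store:

```python
for seq_len in (1_024, 2_048, 4_096):
    scores = seq_len * seq_len
    print(f"{seq_len:>5} tokens -> {scores:,} attention scores per head")
```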