Giles Thomas
gilesthomas.com
Giles Thomas
@gilesthomas.com
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com
Pinned
So, what's left to do in my series on building an LLM from scratch? And what follow-up series should I work on? Some musings: www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Having finished the main body of 'Build an LLM (from scratch)', it's time to think about what I need to do to treat this project as fully done
www.gilesthomas.com
Reposted by Giles Thomas
The end of the beginning... running evals on our model using Llama 3 is the last part of the main body of @sebastianraschka.com's "Build an LLM (from scratch)". Here's my writeup:

www.gilesthomas.com/2025/11/llm-...
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Coming to the end of 'Build an LLM (from scratch)'! We evaluate the quality of the responses our model produces.
www.gilesthomas.com
November 3, 2025 at 7:43 PM
Reposted by Giles Thomas
Back on track with chapter 7 of "Build an LLM (from scratch)": notes on instruction fine-tuning of our GPT-2 model:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Some notes on the first part of chapter 7 of 'Build an LLM (from scratch)': instruction fine-tuning
www.gilesthomas.com
October 29, 2025 at 9:07 PM
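A minimal sketch of the kind of Alpaca-style prompt formatting used for instruction fine-tuning; the field names and exact wording here follow the common Alpaca template rather than anything quoted from the post:

```python
def format_example(entry):
    # Alpaca-style prompt: instruction, optional input, then the expected response.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    if entry.get("input"):
        prompt += f"\n\n### Input:\n{entry['input']}"
    return prompt + f"\n\n### Response:\n{entry['output']}"

example = {"instruction": "Rewrite this sentence in the passive voice.",
           "input": "The cat chased the mouse.",
           "output": "The mouse was chased by the cat."}
print(format_example(example))
```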
Reposted by Giles Thomas
Back when I started messing with LLMs, it looked to me like you could get reasonably OK results for chat applications without instruction fine-tuning. So before getting into Chapter 7 of "Build an LLM (from scratch)", I decided to see if that was really true:

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 24 -- the transcript hack
Back when I started playing with LLMs, I found that you could build a (very basic) chatbot with a base model -- no instruction fine-tuning at all! Does that work with GPT-2?
www.gilesthomas.com
October 28, 2025 at 8:20 PM
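A minimal sketch of the transcript idea: frame the conversation as a plain-text transcript and let the base model's next-token prediction continue it. The wording and the `generate` call below are placeholders, not code from the post:

```python
# A base model has no chat template, but it will happily continue a transcript.
# `generate` is a placeholder for whatever sampling loop you already have.
transcript = (
    "The following is a conversation between a user and a helpful assistant.\n\n"
    "User: What is the capital of France?\n"
    "Assistant: The capital of France is Paris.\n"
    "User: And of Germany?\n"
    "Assistant:"
)
# completion = generate(model, tokenizer, transcript, max_new_tokens=30)
# Cut the completion off at the next "User:" line to get just the reply.
print(transcript)
```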
Reposted by Giles Thomas
And the next step -- a code walkthrough of my PyTorch version of Karpathy's 2015-vintage RNNs.

www.gilesthomas.com/2025/10/retr...
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Revisiting Karpathy’s text-generating RNNs with PyTorch’s built-in LSTM class — a practical look at why training sequence models is so different from Transformers.
www.gilesthomas.com
October 24, 2025 at 6:57 PM
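A minimal sketch of a character-level language model built around PyTorch's built-in `nn.LSTM`; the sizes here are arbitrary, not the ones from the walkthrough:

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Minimal character-level language model using PyTorch's built-in LSTM."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) of character IDs
        emb = self.embed(x)
        out, hidden = self.lstm(emb, hidden)   # hidden carries state between chunks
        return self.head(out), hidden          # logits over the next character

model = CharLSTM(vocab_size=65)
logits, state = model(torch.randint(0, 65, (1, 32)))
print(logits.shape)  # torch.Size([1, 32, 65])
```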
Reposted by Giles Thomas
Chapter 6 was easy and fun! Fine-tuning an LLM for classification tasks, with some initially disappointing results -- but it all came out in the wash: www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 23 -- fine-tuning for classification
After all the hard work, chapter 6 in 'Build an LLM (from scratch)' is a nice easy one -- how do we take a next-token predictor and turn it into a classifier?
www.gilesthomas.com
October 22, 2025 at 11:06 PM
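A minimal sketch of the chapter-6 idea, with a placeholder standing in for the transformer body: keep the pretrained model, swap the vocabulary-sized output head for a tiny classification head, and classify from the last token position:

```python
import torch
import torch.nn as nn

emb_dim, num_classes = 768, 2                 # GPT-2 (124M) width; spam / not-spam
backbone = nn.Identity()                      # stand-in for the pretrained transformer blocks
classifier_head = nn.Linear(emb_dim, num_classes)

hidden = backbone(torch.randn(1, 20, emb_dim))   # (batch, seq_len, emb_dim)
logits = classifier_head(hidden[:, -1, :])       # only the last token's vector
print(logits.softmax(-1))                        # class probabilities
```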
Reposted by Giles Thomas
Part 22 is live: we finally train the LLM :-) Following @sebastianraschka.com's book, we train on Edith Wharton, then swap in GPT-2 (124M) weights for comparison. Notes on seeding, AdamW, temperature and top-k.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 22 -- finally training our LLM!
Finally, we train an LLM! The final part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI’s GPT-2 weights for comparison.
www.gilesthomas.com
October 15, 2025 at 11:45 PM
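A minimal sketch of two of those generation-time knobs, temperature scaling and top-k filtering; the vocabulary size and seed are arbitrary:

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Temperature scaling plus top-k filtering for next-token sampling."""
    logits = logits / temperature                      # <1 sharpens, >1 flattens
    if top_k is not None:
        top_vals, _ = torch.topk(logits, top_k)
        logits = torch.where(logits < top_vals[..., -1:],
                             torch.full_like(logits, float("-inf")),
                             logits)                   # mask everything below the k-th logit
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

torch.manual_seed(123)                                 # seeding for reproducibility
print(sample_next_token(torch.randn(50257), temperature=0.8, top_k=40))
```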
Decided to learn about some "retro" language models in parallel with LLMs; here's my first post on that -- Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'.

www.gilesthomas.com/2025/10/revi...
Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'
Andrej Karpathy's 2015 blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks' went viral in its day, for good reason. How does it read ten years later?
www.gilesthomas.com
October 11, 2025 at 1:01 AM
Reposted by Giles Thomas
Next up in my LLM from scratch series, some serious yak shaving on something @sebastianraschka.com covers in a sidebar: perplexity.

www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Raschka calls out perplexity in a sidebar, but I wanted to understand it in a little more depth
www.gilesthomas.com
October 7, 2025 at 7:06 PM
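Perplexity itself is just the exponential of the average cross-entropy loss, the model's "effective number of choices" per token; a toy illustration with dummy logits:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 50257)          # 8 token positions, GPT-2-sized vocab
targets = torch.randint(0, 50257, (8,))
loss = F.cross_entropy(logits, targets)
print(torch.exp(loss))                  # perplexity = exp(average cross entropy)
```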
Reposted by Giles Thomas
Back to the main track of my LLM from scratch posts: cross entropy -- what it is and why we use it. www.gilesthomas.com/2025/10/llm-...
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
Starting training our LLM requires a loss function, which is called cross entropy loss. What is this and why does it work?
www.gilesthomas.com
October 2, 2025 at 9:15 PM
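A toy illustration of cross-entropy loss for next-token prediction: the negative log of the probability assigned to the correct token, averaged over positions, computed by hand and with PyTorch's built-in:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])     # 2 positions, toy 3-token vocab
targets = torch.tensor([0, 1])                # the "correct" next tokens

log_probs = torch.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(2), targets].mean()
print(manual, F.cross_entropy(logits, targets))  # the two values match
```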
Reposted by Giles Thomas
Part 3, and this was a fun one to write: How do LLMs work? From token IDs through to logits -- projections, matrix multiplications, and attention step-by-step.

www.gilesthomas.com/2025/09/how-...
How do LLMs work?
What actually goes on inside an LLM to make it calculate probabilities for the next token?
www.gilesthomas.com
September 15, 2025 at 10:49 PM
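A toy version of the attention step with made-up weights, just to show the matrix multiplications involved (no causal masking or multiple heads here):

```python
import torch

torch.manual_seed(0)
d = 4
x = torch.randn(3, d)                       # one embedding per token
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

q, k, v = x @ W_q, x @ W_k, x @ W_v         # project into queries, keys, values
scores = q @ k.T / d ** 0.5                 # scaled dot-product attention scores
weights = torch.softmax(scores, dim=-1)     # each row sums to 1
context = weights @ v                       # weighted mix of value vectors
print(context.shape)                        # torch.Size([3, 4])
```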
Reposted by Giles Thomas
Part 2: "The maths you need to start understanding LLMs":

www.gilesthomas.com/2025/09/math...
The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
www.gilesthomas.com
September 2, 2025 at 11:11 PM
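A toy softmax, the last of those building blocks, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials from overflowing.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # sums to 1, largest logit wins
```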
Reposted by Giles Thomas
Part 1: "What AI chatbots are actually doing under the hood".

www.gilesthomas.com/2025/08/what...
What AI chatbots are actually doing under the hood
How AI chatbots like ChatGPT work under the hood -- the post I wish I’d found before starting 'Build a Large Language Model (from Scratch)'.
www.gilesthomas.com
August 29, 2025 at 7:05 PM
Reposted by Giles Thomas
I wanted to do a "what I've learned so far" post to wrap up my notes on Chapter 4 of @sebastianraschka.com's "Build an LLM (from scratch)" but when it got to 6,000 words I suspected it was getting a bit long. So here's a review of upcoming attractions: www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 19 -- wrapping up Chapter 4
A state-of-play update after finishing Chapter 4 of 'Build a Large Language Model from Scratch', with a roadmap of what’s coming next
www.gilesthomas.com
August 29, 2025 at 7:04 PM
Reposted by Giles Thomas
Now it's time to look into shortcut connections in @sebastianraschka.com's "Build an LLM (from scratch)" -- where the Talmud becomes a surprisingly useful metaphor!

www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 18 -- residuals, shortcut connections, and the Talmud
The book's description of shortcut connections is overly simplified, I think -- residuals are more than just shortcuts to help with gradients
www.gilesthomas.com
August 18, 2025 at 7:24 PM
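A minimal sketch of a shortcut connection: the sub-layer's output is added back onto its input, so the sub-layer only has to learn an adjustment and gradients get a direct path through the addition:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))    # shortcut: input added back to the output

block = ResidualBlock(8)
print(block(torch.randn(2, 8)).shape)       # torch.Size([2, 8])
```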
Some thinking in public: if the FFNs in an LLM only work on one context vector at a time, why aren't they bitten by the fixed-length bottleneck? www.gilesthomas.com/2025/08/the-...
The fixed length bottleneck and the feed forward network
The feed-forward network in an LLM processes context vectors one at a time. This feels like it would cause similar issues to the old fixed-length bottleneck, even though it almost certainly does not.
www.gilesthomas.com
August 14, 2025 at 10:49 PM
Reposted by Giles Thomas
After a summer break, it's on to the feed-forward layer in @sebastianraschka.com's "Build an LLM (from scratch)" -- in which I discover that attention is not all you need.

www.gilesthomas.com/2025/08/llm-...
Writing an LLM from scratch, part 17 -- the feed-forward network
The feed-forward network is one of the easiest parts of an LLM in terms of implementation -- but when I thought about it I realised it was one of the most important.
www.gilesthomas.com
August 12, 2025 at 10:07 PM
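A minimal sketch of the transformer feed-forward network: expand each position's vector to four times the embedding width, apply GELU, then project back down. Sizes follow GPT-2 (124M):

```python
import torch
import torch.nn as nn

emb_dim = 768
ffn = nn.Sequential(
    nn.Linear(emb_dim, 4 * emb_dim),
    nn.GELU(),
    nn.Linear(4 * emb_dim, emb_dim),
)
# It acts on one position at a time: no mixing between tokens happens here.
x = torch.randn(1, 10, emb_dim)              # (batch, seq_len, emb_dim)
print(ffn(x).shape)                          # same shape out as in
```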
Reposted by Giles Thomas
The next step in my slow-but-steady progress through @sebastianraschka.com's "Build an LLM (from scratch)" -- what is layer normalisation, why do we do it, and how does it work?

www.gilesthomas.com/2025/07/llm-...
Writing an LLM from scratch, part 16 -- layer normalisation
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?
www.gilesthomas.com
July 8, 2025 at 7:15 PM
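A toy illustration of layer normalisation: rescale each token's vector to zero mean and unit variance across its features, done by hand and checked against PyTorch's `nn.LayerNorm`:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 8)                              # (batch, seq_len, emb_dim)
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, keepdim=True, unbiased=False)
manual = (x - mean) / torch.sqrt(var + 1e-5)

ln = nn.LayerNorm(8)                                  # learnable scale=1, shift=0 at init
print(torch.allclose(manual, ln(x), atol=1e-5))       # True
```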
Reposted by Giles Thomas
Connect your AI coding assistant directly to PythonAnywhere! It works with Claude, Copilot, Cursor, and any MCP-compatible tool. See github.com/pythonanywhe...
GitHub - pythonanywhere/pythonanywhere-mcp-server
Contribute to pythonanywhere/pythonanywhere-mcp-server development by creating an account on GitHub.
github.com
July 8, 2025 at 1:59 PM
New blog post, just documenting a bit of cruft-removal: porting old Fabric3 code to modern Fabric: www.gilesthomas.com/2025/06/fabr...
Moving from Fabric3 to Fabric
A couple of lessons learned in moving from Fabric3 to Fabric
www.gilesthomas.com
June 15, 2025 at 12:35 AM
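The rough shape of the change, as a sketch from memory with a made-up host and command rather than the code from the post: Fabric3 kept Fabric 1's module-level `env` and implicit connections, while modern Fabric makes the connection object explicit:

```python
# Old Fabric3 / Fabric 1 style (roughly):
#
#   from fabric.api import env, run
#   env.hosts = ["myserver.example.com"]
#   def deploy():
#       run("git pull && systemctl restart myapp")
#
# Modern Fabric (2+) makes the connection explicit instead:
from fabric import Connection

def deploy(host="myserver.example.com"):
    with Connection(host) as c:
        c.run("git pull && systemctl restart myapp")
```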
After 14 years of leading @pythonanywhere.com -- 11 as an independent company and 3 as part of @anacondainc.bsky.social -- it's time for me to take a much-needed break, and today was my last day. It's been an amazing ride and I'd like to thank everyone who made it possible.
June 5, 2025 at 6:13 PM
Reposted by Giles Thomas
Moving onwards with @sebastianraschka.com's "Build an LLM (from scratch)". The last step in an LLM, going from context vectors to next-word prediction, looks insanely simple. How can it possibly work?

www.gilesthomas.com/2025/05/llm-...
Writing an LLM from scratch, part 15 -- from context vectors to logits; or, can it really be that simple?!
The way we get from context vectors to next-word prediction turns out to be simpler than I imagined -- but understanding why it works took a bit of thought.
www.gilesthomas.com
May 31, 2025 at 11:24 PM
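A minimal sketch of that last step: a single linear projection from each context vector to one logit per vocabulary token, then softmax for next-token probabilities:

```python
import torch
import torch.nn as nn

emb_dim, vocab_size = 768, 50257
out_head = nn.Linear(emb_dim, vocab_size, bias=False)

context = torch.randn(1, 10, emb_dim)        # (batch, seq_len, emb_dim)
logits = out_head(context)                   # (batch, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[:, -1, :], dim=-1)
print(next_token_probs.shape)                # torch.Size([1, 50257])
```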
First impressions of Claude 4: vibes seem fixed. 3.5 was likeable, 3.7 less so. So far this seems like a return to form.
May 22, 2025 at 8:40 PM
Reposted by Giles Thomas
We've got Streamlit working on PythonAnywhere with our experimental website system!
help.pythonanywhere.com/pages/Stream... --
The bad news is that you need a paid account, because Streamlit uses up a *lot* of disk space :-(
Deploying Streamlit apps on PythonAnywhere (beta)
This help page explains how to set up a Streamlit app on PythonAnywhere. Disclaimer: Deployment of Streamlit apps on PythonAnywhere is an experimental feature. Some important limitations to know about...
help.pythonanywhere.com
May 7, 2025 at 3:37 PM
Reposted by Giles Thomas
Some thoughts (and wild speculation) about scaling attention in LLMs, in my new post working through @sebastianraschka.com's "Build a Large Language Model (from Scratch)"

www.gilesthomas.com/2025/05/llm-...
Writing an LLM from scratch, part 14 -- the complexity of self-attention at scale
A pause to take stock: starting to build intuition on how self-attention scales (and why the simple version doesn't)
www.gilesthomas.com
May 14, 2025 at 8:27 PM
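One way to see the scaling problem: the attention score matrix is seq_len by seq_len per head, so doubling the context length quadruples the number of scores to compute and store:

```python
for seq_len in (1_024, 2_048, 4_096):
    scores = seq_len * seq_len
    print(f"{seq_len:>5} tokens -> {scores:,} attention scores per head")
```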