Tom Hipwell
banner
tomhipwell.co
Tom Hipwell
@tomhipwell.co
VP Engineering at nory.ai. Past roles: Deel/Hofy, Bulb, JPMorgan. Learning. Shipping.
I've been meaning to write a blog post a bit like this one for ages, but this is much better than I would have done. I think the author is right, private evals are very important and this post gives a good framework of how to design your own -> thundergolfer.com/blog/private...
You should have private evals
Everybody should have a personal set of test prompts to try on LLMs.
thundergolfer.com
May 9, 2025 at 9:09 PM
Always really like Alex's post and I feel like this one nails it again: "the dynamic that matters is not a timeline of capacities but a timeline of accuracies" if you're not already maxed out with AI predictions then this one is worth a read
New blog post: I'm publishing a realistic AI timeline.
A fully speculative exercise going all the way till AGI but still grounded into current research and the coming end of pretraining as we know it. And with a radical different set of premises. vintagedata.org/blog/posts/r...
April 13, 2025 at 8:06 PM
Firebase studio looks quite fun butttt: "To block the use of your prompts and responses for model training, do not use the App Prototyping agent, and do not use Gemini in Firebase within Firebase Studio. To block the use of your code for model training, turn off code completion and code indexing..."
April 11, 2025 at 8:24 PM
Blog post to try and tie together a few themes I've been reading about this week -> tomhipwell.co/blog/cursor_...
Cursor rules, prompt injections, voice to text and Diane | Tom Hipwell
Learning in the open | Tom Hipwell
tomhipwell.co
March 23, 2025 at 9:22 PM
Not sure if this is obvious but probably the most important evaluation criteria for any AI tool at the moment is the ability to choose your own model (and bring your own API key or OpenAI compatible API)
March 12, 2025 at 9:56 PM
This is a great post, worth reading. I re-blogged it here to try and summarise the reasoning -> tomhipwell.co/blog/the_mod...
March 2, 2025 at 6:27 PM
Reposted by Tom Hipwell
Claude 3.7 Sonnet
February 24, 2025 at 6:57 PM
Product teams talking too much about ICPs is a red flag. ICPs are for sales and marketing teams. They need narrow focus to max win rate. Product teams need to understand ICP for prioritisation, but think in PMF strength across segments. Peripheral vision. This is how you expand PMF and win more.
February 21, 2025 at 1:05 PM
Another day, another AI dev flow. There’s some common patterns emerging now (using markdown files like spec.md etc.). This blog gives a step by step guide and prompts to borrow. The advice reduces to “spend a lot of time planning with reasoning models up front” -> harper.blog/2025/02/16/m...
My LLM codegen workflow atm
A detailed walkthrough of my current workflow for using LLms to build software, from brainstorming through planning and execution.
harper.blog
February 17, 2025 at 9:23 PM
Reposted by Tom Hipwell
Will GenAI mean the end of software engineering?

Really thoughtful take from @chiphuyen.bsky.social in The Pragmatic Engineer Podcast:

Perhaps software engineering will change, like writing changed hundreds of years ago thanks to printing

Full: www.youtube.com/watch?v=98o_...
February 7, 2025 at 4:27 PM
Really enjoyed Dario's blog post, I thought it had a bunch of interesting, verifiable predictions (I'm less interested in the export control stuff). It'll be interesting to see if they come good over the course of '25. Worth a read, cuts through the noise -> darioamodei.com/on-deepseek-...
Dario Amodei — On DeepSeek and Export Controls
On DeepSeek and Export Controls
darioamodei.com
January 29, 2025 at 9:43 PM
Great read, useful survey of the field.
I probably don’t need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it’s been amazing to watch; some of the things that I always dreamed about actually seem to be happening.

For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
2024 Robotics Year in Review
Robotics finally feels like it's happening
itcanthink.substack.com
January 2, 2025 at 8:55 PM
Good list, would be good to hear additions from #dataBS, I would add "what are embeddings" by @vickiboykis.com -> www.latent.space/p/2025-papers
The 2025 AI Engineering Reading List
We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.
www.latent.space
January 2, 2025 at 7:42 PM
I like @simonwillison.net definition of slop, but we're missing a word to describe an algorithm that pushes low grade content. I'd propose gruel, e.g my LinkedIn feed is all gruel now. Netflix recommends gruel all the time. Spotify's playlists are stuffed full of gruel. You get the picture.
December 31, 2024 at 8:18 PM
If you can solve this puzzle, then do not despair! You are still much smarter than o3
December 24, 2024 at 9:14 AM
Great post, a starting point for something that has been confusing for a while
The AI agent spectrum
Separating different classes of AI agents from a long history of reinforcement learning.
Why we can be optimistic for AI agents but also extremely critical of the terrible communications around them to date.
Plus, some policy guidance.
The AI Agent Spectrum
Separating different classes of AI agents from a long history of reinforcement learning.
buff.ly
December 18, 2024 at 5:18 PM
webdev arena is a _really_ fast way to get a feel for the coding abilities of the different models out there. Worth five minutes of your time to design a tricky prompt, then quickly assess each model generation as they get released -> web.lmarena.ai
WebDev Arena
WebDev Arena: AI Battle to build the best website
web.lmarena.ai
December 17, 2024 at 12:27 PM
Interesting ideas
The future of AI is models that generate graphical interfaces. Instead of the linear, low-bandwidth metaphor of conversation, models will represent themselves to us as computers: rich visuals, direct manipulation, and instant feedback.

willwhitney.com/computing-in...
Computing inside an AI | Will Whitney
willwhitney.com
December 14, 2024 at 9:42 AM
I've had this post sat in my drafts for well over 6 months now, but with the release of Sora to GA yesterday I thought I'd share it - Sora: An Idiot's Guide.

tomhipwell.co/blog/sora/
Sora: An idiot's guide | Tom Hipwell
Just what is latent space anyway?
tomhipwell.co
December 10, 2024 at 12:50 PM
Reposted by Tom Hipwell
Reinforcement Learning: An Overview

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics.

arxiv.org/abs/2412.05265
December 9, 2024 at 8:37 AM
I had a play about with genini-1206 last night, trying some prompts into both o1 and 1206 in parallel, both models are crackers. Vibes for me are better with o1, feels a bit more succinct and gets to the point quicker. Early days of course.
Wow. Can't believe it's one year after our first Gemini model release. Today, we are "crushing it" according to LMSYS leaderboard.🥳

The latest 1206 release is #1 in ALL categories. You can try it here: aistudio.google.com/app/prompts/...
December 8, 2024 at 8:52 AM
Reposted by Tom Hipwell
Alibaba Qwen team just released the base models for Qwen2-VL. You can also wait for them to release Qwen2.5-VL, which should be sooner than later.

2B: huggingface.co/Qwen/Qwen2-V...
7B: huggingface.co/Qwen/Qwen2-V...
72B: huggingface.co/Qwen/Qwen2-V...
Qwen/Qwen2-VL-2B · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
December 7, 2024 at 12:06 AM
Reposted by Tom Hipwell
A test of how seriously your firm is taking AI: when o-1 (& the new Gemini model) came out this week, were there assigned folks who immediately ran the model through your internal, validated, firm-specific benchmarks to see how useful it as? Did you update any plans or goals as a result?
December 7, 2024 at 4:34 PM
Some great ideas here, worth your time to explore
🐟 some musings on how we might use LLMs
🐠 to interact with text at multiple levels of abstraction
🐡 inspired by the fish-eye lens
December 6, 2024 at 9:29 PM
Reposted by Tom Hipwell
First new post in a couple of weeks! There's been a lot of activity around regattastorage.com this week, so I decided to write about the space. tl;dr It's pretty exciting!
The Quest for a Distributed POSIX-Compatible Filesystem
Distributed POSIX filesystems have proven elusive, but we're getting closer. Perhaps that's all we need.
materializedview.io
December 5, 2024 at 8:36 PM