LightNews — Scholar-powered news

Tom Hipwell

@tomhipwell.co

I've been meaning to write a blog post a bit like this one for ages, but this is much better than I would have done. I think the author is right: private evals are very important, and this post gives a good framework for designing your own -> thundergolfer.com/blog/private...

You should have private evals

Everybody should have a personal set of test prompts to try on LLMs.

thundergolfer.com

May 9, 2025 at 9:09 PM

Tom Hipwell

@tomhipwell.co

I always really like Alex's posts, and I feel like this one nails it again: "the dynamic that matters is not a timeline of capacities but a timeline of accuracies". If you're not already maxed out on AI predictions, this one is worth a read.

Alexander Doria @dorialexander.bsky.social · Apr 13

New blog post: I'm publishing a realistic AI timeline.
A fully speculative exercise going all the way to AGI but still grounded in current research and the coming end of pretraining as we know it. And with a radically different set of premises. vintagedata.org/blog/posts/r...

April 13, 2025 at 8:06 PM

Tom Hipwell

@tomhipwell.co

Firebase studio looks quite fun butttt: "To block the use of your prompts and responses for model training, do not use the App Prototyping agent, and do not use Gemini in Firebase within Firebase Studio. To block the use of your code for model training, turn off code completion and code indexing..."

April 11, 2025 at 8:24 PM

Tom Hipwell

@tomhipwell.co

Blog post to try and tie together a few themes I've been reading about this week -> tomhipwell.co/blog/cursor_...

Cursor rules, prompt injections, voice to text and Diane | Tom Hipwell

Learning in the open | Tom Hipwell

tomhipwell.co

March 23, 2025 at 9:22 PM

Tom Hipwell

@tomhipwell.co

Not sure if this is obvious, but probably the most important evaluation criterion for any AI tool at the moment is the ability to choose your own model (and bring your own API key or OpenAI-compatible API)

March 12, 2025 at 9:56 PM

Tom Hipwell

@tomhipwell.co

This is a great post, worth reading. I re-blogged it here to try and summarise the reasoning -> tomhipwell.co/blog/the_mod...

March 2, 2025 at 6:27 PM

Reposted by Tom Hipwell

Sung Kim

@sungkim.bsky.social

Claude 3.7 Sonnet

February 24, 2025 at 6:57 PM

Tom Hipwell

@tomhipwell.co

Product teams talking too much about ICPs is a red flag. ICPs are for sales and marketing teams. They need narrow focus to max win rate. Product teams need to understand ICP for prioritisation, but think in PMF strength across segments. Peripheral vision. This is how you expand PMF and win more.

February 21, 2025 at 1:05 PM

Tom Hipwell

@tomhipwell.co

Another day, another AI dev flow. There are some common patterns emerging now (using markdown files like spec.md etc.). This blog gives a step-by-step guide and prompts to borrow. The advice reduces to “spend a lot of time planning with reasoning models up front” -> harper.blog/2025/02/16/m...

My LLM codegen workflow atm

A detailed walkthrough of my current workflow for using LLMs to build software, from brainstorming through planning and execution.

harper.blog

February 17, 2025 at 9:23 PM

Reposted by Tom Hipwell

Gergely Orosz

@gergely.pragmaticengineer.com

Will GenAI mean the end of software engineering?

Really thoughtful take from @chiphuyen.bsky.social in The Pragmatic Engineer Podcast:

Perhaps software engineering will change, like writing changed hundreds of years ago thanks to printing

Full: www.youtube.com/watch?v=98o_...

February 7, 2025 at 4:27 PM

Tom Hipwell

@tomhipwell.co

Really enjoyed Dario's blog post, I thought it had a bunch of interesting, verifiable predictions (I'm less interested in the export control stuff). It'll be interesting to see if they come good over the course of '25. Worth a read, cuts through the noise -> darioamodei.com/on-deepseek-...

Dario Amodei — On DeepSeek and Export Controls

On DeepSeek and Export Controls

darioamodei.com

January 29, 2025 at 9:43 PM

Tom Hipwell

@tomhipwell.co

Great read, useful survey of the field.

Chris Paxton @cpaxton.bsky.social · Jan 2

I probably don’t need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it’s been amazing to watch; some of the things that I always dreamed about actually seem to be happening.

For me, there are three big stories: itcanthink.substack.com/p/2024-robot...

2024 Robotics Year in Review

Robotics finally feels like it's happening

itcanthink.substack.com

January 2, 2025 at 8:55 PM

Tom Hipwell

@tomhipwell.co

Good list, would be good to hear additions from #dataBS, I would add "what are embeddings" by @vickiboykis.com -> www.latent.space/p/2025-papers

The 2025 AI Engineering Reading List

We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.

www.latent.space

January 2, 2025 at 7:42 PM

Tom Hipwell

@tomhipwell.co

I like @simonwillison.net's definition of slop, but we're missing a word to describe an algorithm that pushes low-grade content. I'd propose gruel, e.g. my LinkedIn feed is all gruel now. Netflix recommends gruel all the time. Spotify's playlists are stuffed full of gruel. You get the picture.

December 31, 2024 at 8:18 PM

Tom Hipwell

@tomhipwell.co

If you can solve this puzzle, then do not despair! You are still much smarter than o3

December 24, 2024 at 9:14 AM

Tom Hipwell

@tomhipwell.co

Great post, a starting point for something that has been confusing for a while

Nathan Lambert @natolambert.bsky.social · Dec 18

The AI agent spectrum
Separating different classes of AI agents from a long history of reinforcement learning.
Why we can be optimistic for AI agents but also extremely critical of the terrible communications around them to date.
Plus, some policy guidance.

The AI Agent Spectrum

Separating different classes of AI agents from a long history of reinforcement learning.

buff.ly

December 18, 2024 at 5:18 PM

Tom Hipwell

@tomhipwell.co

webdev arena is a _really_ fast way to get a feel for the coding abilities of the different models out there. Worth five minutes of your time to design a tricky prompt, then quickly assess each model generation as they get released -> web.lmarena.ai

WebDev Arena

WebDev Arena: AI Battle to build the best website

web.lmarena.ai

December 17, 2024 at 12:27 PM

Tom Hipwell

@tomhipwell.co

Interesting ideas

Will Whitney @wfwhitney.bsky.social · Dec 14

The future of AI is models that generate graphical interfaces. Instead of the linear, low-bandwidth metaphor of conversation, models will represent themselves to us as computers: rich visuals, direct manipulation, and instant feedback.

willwhitney.com/computing-in...

Computing inside an AI | Will Whitney

willwhitney.com

December 14, 2024 at 9:42 AM

Tom Hipwell

@tomhipwell.co

I've had this post sat in my drafts for well over 6 months now, but with the release of Sora to GA yesterday I thought I'd share it - Sora: An Idiot's Guide.

tomhipwell.co/blog/sora/

Sora: An idiot's guide | Tom Hipwell

Just what is latent space anyway?

tomhipwell.co

December 10, 2024 at 12:50 PM

Reposted by Tom Hipwell

Sung Kim

@sungkim.bsky.social

Reinforcement Learning: An Overview

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics.

arxiv.org/abs/2412.05265

December 9, 2024 at 8:37 AM

Tom Hipwell

@tomhipwell.co

I had a play about with gemini-1206 last night, trying some prompts in both o1 and 1206 in parallel. Both models are crackers. Vibes for me are better with o1: it feels a bit more succinct and gets to the point quicker. Early days of course.

Hoi Lam @hoitab.bsky.social · Dec 6

Wow. Can't believe it's one year after our first Gemini model release. Today, we are "crushing it" according to LMSYS leaderboard.🥳

The latest 1206 release is #1 in ALL categories. You can try it here: aistudio.google.com/app/prompts/...

December 8, 2024 at 8:52 AM

Reposted by Tom Hipwell

Sung Kim

@sungkim.bsky.social

The Alibaba Qwen team just released the base models for Qwen2-VL. You can also wait for them to release Qwen2.5-VL, which should be sooner rather than later.

2B: huggingface.co/Qwen/Qwen2-V...
7B: huggingface.co/Qwen/Qwen2-V...
72B: huggingface.co/Qwen/Qwen2-V...

Qwen/Qwen2-VL-2B · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

December 7, 2024 at 12:06 AM

Reposted by Tom Hipwell

Ethan Mollick

@emollick.bsky.social

A test of how seriously your firm is taking AI: when o-1 (& the new Gemini model) came out this week, were there assigned folks who immediately ran the model through your internal, validated, firm-specific benchmarks to see how useful it is? Did you update any plans or goals as a result?

December 7, 2024 at 4:34 PM

Tom Hipwell

@tomhipwell.co

Some great ideas here, worth your time to explore

Amelia Wattenberger @wattenberger.com · Dec 3

🐟 some musings on how we might use LLMs
🐠 to interact with text at multiple levels of abstraction
🐡 inspired by the fish-eye lens

December 6, 2024 at 9:29 PM

Reposted by Tom Hipwell

Chris

@chris.blue

First new post in a couple of weeks! There's been a lot of activity around regattastorage.com this week, so I decided to write about the space. tl;dr It's pretty exciting!

The Quest for a Distributed POSIX-Compatible Filesystem

Distributed POSIX filesystems have proven elusive, but we're getting closer. Perhaps that's all we need.

materializedview.io

December 5, 2024 at 8:36 PM
