Robin Linacre
robinlinacre.bsky.social
Lead developer of Splink. Data scientist at Ministry of Justice. Trustee, GiveDirectly UK. Pledgee, http://givingwhatwecan.org. All views my own.
Reposted by Robin Linacre
OpenUK Awards 25 Open Data Category, sponsored by the Open Data Institute: the shortlist is live. Congratulations to the shortlisted nominees: Ministry of Justice UK Splink Team (@robinlinacre.bsky.social), OpenActive, and UK Power Networks (Yiu-Shing Pang) 🍾🥂🏆

#openukawards #opensource #opendata
November 4, 2025 at 11:30 AM
Reposted by Robin Linacre
More progress on #openaddresses:

Islington Council in London has released its Council Tax address list for re-use as #opendata under the Open Government Licence www.owenboswarva.com/blog/post-ad...

I've made a geocoded version by adding coordinates from ONS

#FOI #localgov #UKhousing #proptech
October 28, 2025 at 8:46 AM
New ✨interactive✨ explainer: Address matching using a fault tolerant trie:

robinlinacre.com/fault_tolera...

Which illustrates a powerful technique for address matching that we're currently working on building into uk_address_matcher (github.com/moj-analytic...)
September 24, 2025 at 7:51 AM
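A minimal sketch of what a fault-tolerant trie lookup could look like. It is an illustration under assumptions, not necessarily the approach used in the explainer or in uk_address_matcher: canonical addresses are tokenised and stored in reverse order (broadest token first), and the lookup is allowed to skip a limited number of extra tokens in the messy input. The names `TrieNode`, `insert`, `search` and `max_skips` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    address_id: str | None = None  # set on the node that completes an address


def tokens_reversed(address: str) -> list[str]:
    # Upper-case, drop commas, split on whitespace, then reverse so the
    # broadest token (e.g. the town) comes first.
    return address.upper().replace(",", " ").split()[::-1]


def insert(root: TrieNode, address: str, address_id: str) -> None:
    node = root
    for token in tokens_reversed(address):
        node = node.children.setdefault(token, TrieNode())
    node.address_id = address_id


def search(root: TrieNode, messy: str, max_skips: int = 1) -> set[str]:
    """Return ids of addresses matching `messy`, ignoring up to
    `max_skips` extra tokens that appear only in the messy input."""
    tokens = tokens_reversed(messy)
    found: set[str] = set()

    def walk(node: TrieNode, i: int, skips: int) -> None:
        if i == len(tokens):
            if node.address_id is not None:
                found.add(node.address_id)
            return
        child = node.children.get(tokens[i])
        if child is not None:
            walk(child, i + 1, skips)  # token matches: descend
        if skips < max_skips:
            walk(node, i + 1, skips + 1)  # tolerate one extra input token

    walk(root, 0, 0)
    return found


root = TrieNode()
insert(root, "10 High Street, London", "A")
insert(root, "12 High Street, London", "B")
print(search(root, "10 High Street, Islington, London"))  # {'A'}
```

With `max_skips=0` the same lookup returns nothing, because the extra token ISLINGTON has no branch in the trie.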
When working with a complex Postgres schema, I find it time-consuming to figure out the joins.

I had an idea: a 'join generator' that traverses the relationship graph for you, and writes the joins.

You give it a dump of the postgres schema, and it gives you a UI.

www.robinlinacre.com/vite_live_pg...
August 18, 2025 at 6:40 AM
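The idea of traversing the relationship graph can be sketched as a breadth-first search over foreign keys. This is a rough illustration, not the code behind the linked tool: the `FOREIGN_KEYS` list, its table names and the `generate_joins` function are all hypothetical, and a real version would read the relationships from the schema dump.

```python
from collections import deque

# Hypothetical foreign keys: (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
    ("order_items", "product_id", "products", "id"),
]


def build_graph(foreign_keys):
    # Undirected graph: each foreign key can be followed in either direction.
    graph = {}
    for child, child_col, parent, parent_col in foreign_keys:
        condition = f"{child}.{child_col} = {parent}.{parent_col}"
        graph.setdefault(child, []).append((parent, condition))
        graph.setdefault(parent, []).append((child, condition))
    return graph


def generate_joins(foreign_keys, start, target):
    """Find the shortest chain of foreign keys from start to target
    and write it out as a FROM clause followed by JOINs."""
    graph = build_graph(foreign_keys)
    previous = {start: None}  # table -> (table we came from, join condition)
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == target:
            break
        for neighbour, condition in graph.get(table, []):
            if neighbour not in previous:
                previous[neighbour] = (table, condition)
                queue.append(neighbour)
    if target not in previous:
        raise ValueError(f"no relationship path from {start} to {target}")

    # Walk back from the target to the start, then reverse into query order.
    joins = []
    table = target
    while previous[table] is not None:
        came_from, condition = previous[table]
        joins.append(f"JOIN {table} ON {condition}")
        table = came_from
    joins.reverse()
    return "\n".join([f"FROM {start}"] + joins)


print(generate_joins(FOREIGN_KEYS, "customers", "products"))
```

Breadth-first search returns the path with the fewest joins; where several paths exist, a UI could let the user pick between them.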
We're working on a DuckDB community extension called `splink_udfs` to add some record linkage related functions to DuckDB. It's currently very much WIP, but you can already use it wherever you're using DuckDB.
github.com/moj-analytic...
July 22, 2025 at 4:50 PM
If you're using Splink with DuckDB you should see significant speed improvements by updating to DuckDB 1.3.x. You can also add more granularity to your comparison level statements without an impact on run times. Depending on your model spec, it could be twice as fast or better.
Speed enhancement: 'Pushing up' common elements of CASE statements into reused computations by RobinL · Pull Request #2738 · moj-analytical-services/splink
This is a clean rewrite of #2630. The rationale is explained further in #2580, but in a nutshell it eliminates repeated computations of potentially expensive functions in some backends, e.g. CASE ...
github.com
July 15, 2025 at 4:46 PM
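To illustrate the general idea of computing an expensive expression once and reusing it across CASE branches, here is a small sketch using Python's built-in sqlite3 rather than DuckDB. It is not Splink's implementation: the `similarity` function, the `pairs` and `scored` tables and the thresholds are all made up for the example.

```python
import sqlite3
from difflib import SequenceMatcher

counter = {"calls": 0}


def similarity(a, b):
    # Stand-in for an expensive fuzzy-matching function; counts its own calls.
    counter["calls"] += 1
    return SequenceMatcher(None, a, b).ratio()


con = sqlite3.connect(":memory:")
con.create_function("similarity", 2, similarity)
con.execute("CREATE TABLE pairs (name_l TEXT, name_r TEXT)")
con.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("robin", "robin"), ("robin", "robyn"), ("robin", "alice")],
)

# Naive form: the function is written out again in every WHEN branch.
naive_sql = """
SELECT CASE
    WHEN similarity(name_l, name_r) >= 0.95 THEN 2
    WHEN similarity(name_l, name_r) >= 0.75 THEN 1
    ELSE 0
END
FROM pairs
ORDER BY rowid
"""
counter["calls"] = 0
naive_levels = [row[0] for row in con.execute(naive_sql)]
naive_calls = counter["calls"]

# 'Pushed up' form: compute the score once per row, then reuse the column.
counter["calls"] = 0
con.execute(
    "CREATE TEMP TABLE scored AS "
    "SELECT rowid AS pair_id, similarity(name_l, name_r) AS sim FROM pairs"
)
pushed_sql = """
SELECT CASE
    WHEN sim >= 0.95 THEN 2
    WHEN sim >= 0.75 THEN 1
    ELSE 0
END
FROM scored
ORDER BY pair_id
"""
pushed_levels = [row[0] for row in con.execute(pushed_sql)]
pushed_calls = counter["calls"]

print(naive_levels == pushed_levels, naive_calls, pushed_calls)
```

Both forms give the same comparison levels, but the second calls the function once per row however many branches there are, which is why adding more levels need not cost more run time.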
My most commonly used pattern for AI coding: Dump entire source code into Gemini 2.5 pro, write prompt specifying what I want, and then: "Give precise instructions for an LLM to follow to implement this feature. Break the solution down into steps where each step is verifiable."
July 11, 2025 at 8:33 AM
I have been working on a free, high performance address matcher. I've written up some key tricks, techniques, and ideas into a blog post here: www.robinlinacre.com/address_matc...
Building Accurate Address Matching Systems
A bag of tricks to improve the accuracy of geocoding
www.robinlinacre.com
July 5, 2025 at 10:00 AM
The 'build' button in Google AI Studio is unbelievably good. I had an idea to visualise fractions: three prompts in total, and it's pretty close to something useful (it works for any arbitrary fractions). Even the one-shot attempt was pretty good.
May 22, 2025 at 8:15 PM
My PyData Global talk "Rapid deduplication and fuzzy matching of large datasets using Splink" is now on Youtube: www.youtube.com/watch?v=eQtF...
Robin Linacre - Rapid deduplication and fuzzy matching of large datasets using Splink
YouTube video by PyData
www.youtube.com
April 17, 2025 at 3:03 PM
I see too much focus on trying to find applications of LLMs to help other people 'at scale' with their jobs. At the moment, the output of LLMs is rarely useful for business rules or passive consumption. The lower hanging fruit is helping people use AI directly & however they see fit in their job.
April 4, 2025 at 8:38 AM
If you're using duckdb in a python script or jupyter notebook, you can run `con.execute('CALL start_ui()')` at any point, and the ui will pop right up in your web browser with the current database automatically available.

(I knew about the UI, but I had missed this trick!)
April 1, 2025 at 6:28 AM
Gemini 2.5 pro is really good. Grok 3 felt like a big step forwards and was my 'go-to' for hard problems, and this feels like another significant step forward.

So nice with small codebases to be able to put everything into context (I use github.com/simonw/files... )
GitHub - simonw/files-to-prompt: Concatenate a directory full of files into a single prompt for use with LLMs
github.com
March 29, 2025 at 11:39 AM
Reposted by Robin Linacre
Ended up writing a follow up post with the final approach and learnings from getting this running on GitHub Actions!

All original datasets weigh more than 500GB combined. The final ones published on 🤗, only 1 GB. Took some tinkering to get there but was fun!

davidgasquez.com/exporting-in...
March 20, 2025 at 12:56 PM
New blog: Why DuckDB is my first choice for data processing:
www.robinlinacre.com/recommend_du...
Why DuckDB is my first choice for data processing
Why DuckDB has become my go-to tool for data processing, offering simplicity, speed, and powerful features.
www.robinlinacre.com
March 16, 2025 at 7:17 PM
I vibe coded a primary school maths breakout game - aimed to be fun and educational.
rupertlinacre.com/breakout_mat...
In the process I created and open sourced a maths problem generator aligned to the national curriculum, so you can vibe code your own maths games!
www.npmjs.com/package/math...
Breakout Maths Game
rupertlinacre.com
March 15, 2025 at 7:40 PM
Just added an example/tutorial to the Splink docs of matching business data.

It uses some feature engineering tricks that help improve accuracy vs. just fuzzy matching on names.

moj-analytical-services.github.io/splink/demos...
Linking businesses - Splink
moj-analytical-services.github.io
February 14, 2025 at 2:01 PM
I think the single most productivity-enhancing use of LLMs in gov would be to give all devs and data scientists access to Cursor (or equivalent). I am not yet convinced of the widespread value of 'behind the scenes' uses of LLMs, but v. bullish on skilled human-in-the-loop uses, especially coding
February 13, 2025 at 1:16 PM
With DuckDB WASM it's possible to run a full Splink model in your browser in a single, standalone .html page.

Here's an example:
www.robinlinacre.com/live_splink/

And the git repo:
github.com/RobinL/vite_...
February 3, 2025 at 4:47 PM
Playing around with a spatial duckdb wasm database in a static webpage. Absolutely amazing how far you can get with geospatial in the browser using entirely open source tools
January 26, 2025 at 3:57 PM
Writing code with LLMs:
1. Iteratively try to explain what you want using natural language until the code works
2. You now have a precise prompt (the working code). Use it to get high quality code.
January 16, 2025 at 9:16 AM
Splink has just passed 10 million downloads on PyPI, the first UK gov package to do so!

If you're a user, we'd be super grateful if you could support Splink by adding your org to our list of users: moj-analytical-services.github.io/splink/index.... To do so, please get in touch, or raise a PR.
Splink
moj-analytical-services.github.io
January 13, 2025 at 9:03 AM
New blog: AI probably won't replace me in 2025
www.robinlinacre.com/llms_in_2025/
AI probably won't replace me in 2025
My mental model of LLMs, their strengths and shortcomings
www.robinlinacre.com
January 1, 2025 at 5:11 PM
Advanced Voice Mode + video as a reading companion is incredible as an educational tool. Completely transformational for teaching/learning something like Shakespeare. It can read aloud, translate into simpler/modern English, explain what's going on etc.
December 14, 2024 at 9:37 AM