Toby Murray
banner
tobycmurray.bsky.social
Toby Murray
@tobycmurray.bsky.social
Professor at University of Melbourne and School of Computing and Information Systems cyber lead; Director @dsi-vic.bsky.social; Oxford DPhil (@compscioxford.bsky.social; @hertfordcollege.bsky.social). Cyber, verification, etc. He/him
Makes sense. Like @clegoues.bsky.social looking forward to reading the paper
November 9, 2025 at 10:43 AM
My guess is this: let P be a problem you wish an LLM to generate a program to solve. Semantically P is a set of pairs (x,y) of inputs x and corresponding outputs y. The assumption is that each x maps to at most one y, i.e. P is a function. Assuming extensionality, P is then also unique.
November 9, 2025 at 1:25 AM
Ah that makes sense. Very cool stuff
November 8, 2025 at 11:10 PM
What about if you ask for a program to recognise MNIST digits? We don’t even know what correctness means for such a program. So how can we hope to bound it by incoherence? 2/2
November 8, 2025 at 8:26 AM
There seem to be lots of assumptions here. Suppose you ask for a program that performs a search. When run on lists with duplicates two implementations may disagree while both may be correct. 1/2
November 8, 2025 at 8:26 AM
I worry instead about authors publishing papers whose methods and results they don't understand---even if they do replicate---because the key ideas, experimental infrastructure and data analyses came from an LLM, as did their explanations.

Are we headed to peer view via viva voce interrogation? 2/2
November 2, 2025 at 10:12 PM
Happy birthday! Imagine what we could all achieve if we lived to 10,000,000. Though as a species we’re still 6 years behind that record en.wikipedia.org/wiki/List_of...
List of the verified oldest people - Wikipedia
en.wikipedia.org
October 28, 2025 at 9:13 AM
Ah got it. Thanks

Of course there are programming languages that do not provide sufficient isolation for that kind of compositional reasoning to be sound. A module can be secure in isolation but insecure when composed with others.

JavaScript is a perfect example.
October 24, 2025 at 9:01 PM
Not sure I’m following. It sounds like you’re saying “model checking scales better than we think because it also lacks expressiveness”.

I think you’re right to be sceptical about whether model checking really is as precise as something like interactive proof. But your argument was about stability
October 23, 2025 at 12:38 PM
Can you use model checking to prove seL4’s kernel invariant?

Some software is pathologically non modular. Micro kernels are such. If any part of the kernel could’ve been modularised it won’t be on the kernel.

Hence kernel invariants are a tangle of mutual dependencies.
October 23, 2025 at 5:08 AM
cc a few folks who might find this of interest: @m-dodds.bsky.social @shriram.bsky.social @lawrpaulson.bsky.social @microkerneldude.bsky.social @hillelwayne.com @nikolajbjorner.bsky.social

And a repeat of the graphic and blog link that were at the top of this thread: verse.systems/blog/post/20...
October 22, 2025 at 11:54 AM
I find the triangular device especially useful when giving talks about formal methods to outsiders, as a way of providing a map to understanding the landscape of formal verification methods and their various trade-offs. Please feel free to use it in your own talks and writings.
October 22, 2025 at 11:43 AM
This "we have three desirable properties; pick any two" is a familiar situation. The CAP theorem is a good example. Zooko's Triangle is another. Economists have several of them. The English word "trilemma", which dates back centuries, was coined by British religious preachers for this concept.
October 22, 2025 at 11:43 AM
Consider symbolic execution. It is automatic and precise. But path explosion means it doesn't scale so well. Abstract interpretation is also automatic but---because it reasons about state abstractions---foregoes the precision of symex. Interactive proof scales and is precise, but is not automatic.
October 22, 2025 at 11:43 AM
I like to do that too and, like you, tell myself this is good surfing practice. Each time I try to go surfing with my kids, however, I’m reminded that, sadly, skill in public-transport surfing does not translate to skill in wave surfing :)
October 16, 2025 at 1:48 PM
Err s/women not/would now/
October 16, 2025 at 12:59 PM
I’ve also found LLMs usage to form a crutch that prevents learning. The number of hours I’ve spent with ChatGPT coding little NumPy/TensorFlow apps without ever managing to learn much about either framework, I could’ve have learned both frameworks properly and women not be more productive if I had
October 16, 2025 at 7:31 AM