Toby Murray
@tobycmurray.bsky.social
Professor at University of Melbourne and School of Computing and Information Systems cyber lead; Director @dsi-vic.bsky.social; Oxford DPhil (@compscioxford.bsky.social; @hertfordcollege.bsky.social). Cyber, verification, etc. He/him
Makes sense. Like @clegoues.bsky.social looking forward to reading the paper
November 9, 2025 at 10:43 AM
Makes sense. Like @clegoues.bsky.social looking forward to reading the paper
My guess is this: let P be a problem you wish an LLM to generate a program to solve. Semantically P is a set of pairs (x,y) of inputs x and corresponding outputs y. The assumption is that each x maps to at most one y, i.e. P is a function. Assuming extensionality, P is then also unique.
November 9, 2025 at 1:25 AM
My guess is this: let P be a problem you wish an LLM to generate a program to solve. Semantically P is a set of pairs (x,y) of inputs x and corresponding outputs y. The assumption is that each x maps to at most one y, i.e. P is a function. Assuming extensionality, P is then also unique.
Ah that makes sense. Very cool stuff
November 8, 2025 at 11:10 PM
Ah that makes sense. Very cool stuff
What about if you ask for a program to recognise MNIST digits? We don’t even know what correctness means for such a program. So how can we hope to bound it by incoherence? 2/2
November 8, 2025 at 8:26 AM
What about if you ask for a program to recognise MNIST digits? We don’t even know what correctness means for such a program. So how can we hope to bound it by incoherence? 2/2
There seem to be lots of assumptions here. Suppose you ask for a program that performs a search. When run on lists with duplicates two implementations may disagree while both may be correct. 1/2
November 8, 2025 at 8:26 AM
There seem to be lots of assumptions here. Suppose you ask for a program that performs a search. When run on lists with duplicates two implementations may disagree while both may be correct. 1/2
I worry instead about authors publishing papers whose methods and results they don't understand---even if they do replicate---because the key ideas, experimental infrastructure and data analyses came from an LLM, as did their explanations.
Are we headed to peer view via viva voce interrogation? 2/2
Are we headed to peer view via viva voce interrogation? 2/2
November 2, 2025 at 10:12 PM
I worry instead about authors publishing papers whose methods and results they don't understand---even if they do replicate---because the key ideas, experimental infrastructure and data analyses came from an LLM, as did their explanations.
Are we headed to peer view via viva voce interrogation? 2/2
Are we headed to peer view via viva voce interrogation? 2/2
Happy birthday! Imagine what we could all achieve if we lived to 10,000,000. Though as a species we’re still 6 years behind that record en.wikipedia.org/wiki/List_of...
List of the verified oldest people - Wikipedia
en.wikipedia.org
October 28, 2025 at 9:13 AM
Happy birthday! Imagine what we could all achieve if we lived to 10,000,000. Though as a species we’re still 6 years behind that record en.wikipedia.org/wiki/List_of...
Ah got it. Thanks
Of course there are programming languages that do not provide sufficient isolation for that kind of compositional reasoning to be sound. A module can be secure in isolation but insecure when composed with others.
JavaScript is a perfect example.
Of course there are programming languages that do not provide sufficient isolation for that kind of compositional reasoning to be sound. A module can be secure in isolation but insecure when composed with others.
JavaScript is a perfect example.
October 24, 2025 at 9:01 PM
Ah got it. Thanks
Of course there are programming languages that do not provide sufficient isolation for that kind of compositional reasoning to be sound. A module can be secure in isolation but insecure when composed with others.
JavaScript is a perfect example.
Of course there are programming languages that do not provide sufficient isolation for that kind of compositional reasoning to be sound. A module can be secure in isolation but insecure when composed with others.
JavaScript is a perfect example.
Not sure I’m following. It sounds like you’re saying “model checking scales better than we think because it also lacks expressiveness”.
I think you’re right to be sceptical about whether model checking really is as precise as something like interactive proof. But your argument was about stability
I think you’re right to be sceptical about whether model checking really is as precise as something like interactive proof. But your argument was about stability
October 23, 2025 at 12:38 PM
Not sure I’m following. It sounds like you’re saying “model checking scales better than we think because it also lacks expressiveness”.
I think you’re right to be sceptical about whether model checking really is as precise as something like interactive proof. But your argument was about stability
I think you’re right to be sceptical about whether model checking really is as precise as something like interactive proof. But your argument was about stability
Can you use model checking to prove seL4’s kernel invariant?
Some software is pathologically non modular. Micro kernels are such. If any part of the kernel could’ve been modularised it won’t be on the kernel.
Hence kernel invariants are a tangle of mutual dependencies.
Some software is pathologically non modular. Micro kernels are such. If any part of the kernel could’ve been modularised it won’t be on the kernel.
Hence kernel invariants are a tangle of mutual dependencies.
October 23, 2025 at 5:08 AM
Can you use model checking to prove seL4’s kernel invariant?
Some software is pathologically non modular. Micro kernels are such. If any part of the kernel could’ve been modularised it won’t be on the kernel.
Hence kernel invariants are a tangle of mutual dependencies.
Some software is pathologically non modular. Micro kernels are such. If any part of the kernel could’ve been modularised it won’t be on the kernel.
Hence kernel invariants are a tangle of mutual dependencies.
cc a few folks who might find this of interest: @m-dodds.bsky.social @shriram.bsky.social @lawrpaulson.bsky.social @microkerneldude.bsky.social @hillelwayne.com @nikolajbjorner.bsky.social
And a repeat of the graphic and blog link that were at the top of this thread: verse.systems/blog/post/20...
And a repeat of the graphic and blog link that were at the top of this thread: verse.systems/blog/post/20...
October 22, 2025 at 11:54 AM
cc a few folks who might find this of interest: @m-dodds.bsky.social @shriram.bsky.social @lawrpaulson.bsky.social @microkerneldude.bsky.social @hillelwayne.com @nikolajbjorner.bsky.social
And a repeat of the graphic and blog link that were at the top of this thread: verse.systems/blog/post/20...
And a repeat of the graphic and blog link that were at the top of this thread: verse.systems/blog/post/20...
I find the triangular device especially useful when giving talks about formal methods to outsiders, as a way of providing a map to understanding the landscape of formal verification methods and their various trade-offs. Please feel free to use it in your own talks and writings.
October 22, 2025 at 11:43 AM
I find the triangular device especially useful when giving talks about formal methods to outsiders, as a way of providing a map to understanding the landscape of formal verification methods and their various trade-offs. Please feel free to use it in your own talks and writings.
This "we have three desirable properties; pick any two" is a familiar situation. The CAP theorem is a good example. Zooko's Triangle is another. Economists have several of them. The English word "trilemma", which dates back centuries, was coined by British religious preachers for this concept.
October 22, 2025 at 11:43 AM
This "we have three desirable properties; pick any two" is a familiar situation. The CAP theorem is a good example. Zooko's Triangle is another. Economists have several of them. The English word "trilemma", which dates back centuries, was coined by British religious preachers for this concept.
Consider symbolic execution. It is automatic and precise. But path explosion means it doesn't scale so well. Abstract interpretation is also automatic but---because it reasons about state abstractions---foregoes the precision of symex. Interactive proof scales and is precise, but is not automatic.
October 22, 2025 at 11:43 AM
Consider symbolic execution. It is automatic and precise. But path explosion means it doesn't scale so well. Abstract interpretation is also automatic but---because it reasons about state abstractions---foregoes the precision of symex. Interactive proof scales and is precise, but is not automatic.
I like to do that too and, like you, tell myself this is good surfing practice. Each time I try to go surfing with my kids, however, I’m reminded that, sadly, skill in public-transport surfing does not translate to skill in wave surfing :)
October 16, 2025 at 1:48 PM
I like to do that too and, like you, tell myself this is good surfing practice. Each time I try to go surfing with my kids, however, I’m reminded that, sadly, skill in public-transport surfing does not translate to skill in wave surfing :)
Err s/women not/would now/
October 16, 2025 at 12:59 PM
Err s/women not/would now/
I’ve also found LLMs usage to form a crutch that prevents learning. The number of hours I’ve spent with ChatGPT coding little NumPy/TensorFlow apps without ever managing to learn much about either framework, I could’ve have learned both frameworks properly and women not be more productive if I had
October 16, 2025 at 7:31 AM
I’ve also found LLMs usage to form a crutch that prevents learning. The number of hours I’ve spent with ChatGPT coding little NumPy/TensorFlow apps without ever managing to learn much about either framework, I could’ve have learned both frameworks properly and women not be more productive if I had