Lightnews — Scholar-powered news

gavin.codes

@gavin.codes

In the very near future code will be organised more around contract than implementation, just as currently we organise around high-level constructs and not machine code.

December 11, 2025 at 7:34 AM

gavin.codes

@gavin.codes

Generative AI is finally making software verification not just practical, but it a requirement (pun intended). AI slop is a serious problem, but it is also an opportunity.

December 11, 2025 at 7:34 AM

gavin.codes

@gavin.codes

7/

And lo and behold, during my exploration of the space of tools to assist in this I found that Bertrand Meyer the author of DbC has been thinking along very similar lines and has made the case in a pre-print! se.inf.ethz.ch/~meyer/publi...

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

6/

Step A improves discoverability of code for re-use by the LLM. Step D reduces the burden on programmers to ensure correctness. Both address serious problems with current vibe-coding practices.

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

5/

D) Use the contracts for static and dynamic verification. The boolean specifications can be used to do hypothesis testing we can derive test data (and therefore tests) automatically. SMT solvers can be used to check pre-post conditions.

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

4/

C) Have an LLM write the implementation. With a combination of editor mode fencing which stops the LLM from editing the specification we can drive an LLM with a very explicit definition of correctness helping it to get the correct answer.

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

3/

B) Write predicates which check the conditions of the contract (pre/post-conditions and invariants). In my experiments I have used python itself to define boolean functions which act as predicates.

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

2/

A) Write contracts in conversation with an LLM to improve requirements gathering and specification completeness. LLMs are quite good at getting people to clarify written text and can accelerate things substantially.

December 9, 2025 at 8:34 AM

gavin.codes

@gavin.codes

I'm using aider extensively. I tried claude code and cursor, but I like the fine control I have in aider.

I think we have a long way to go though since programming itself is going to have to change style in response.

December 4, 2025 at 3:12 PM

gavin.codes

@gavin.codes

6/ Ultimately however, types are too weak (at least in a side-effecting language with simple types - I'm not addressing dependent type theory). My thesis is that we need a return to contract based programming (en.wikipedia.org/wiki/Design_...) to assist the interface between humans and the AIs.

Design by contract - Wikipedia

en.wikipedia.org

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

6/ We've a chicken-and-egg type situation at the minute. Probably python will continue to be optimal in that you can add type annotations while not requiring type discipline while also getting some advantages of static analysis.

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

5/ Types are also of interest. While types make it harder for AIs to one shot (they require more loops) they tend to get better code and the types themselves tend to constrain the output to more *correct* code.

I'm having trouble finding proof of this in the literature.

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

4/ This suggests that the semantics of javascript are inherently more complex for LLMs to get a handle on since the volumes must be very high. It could well be that typescript just doesn't have enough data to fix the problem.

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

3/ Javascript is doing far worse than one might expect due to volume. C# and Java and even Go are better. Typescript is not fixing the problem.

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

2/ Volume is key to training AIs, and so those programming languages with large training volumes tend to win. Python is killing it. However there are some secrets hidden in this chart.

December 4, 2025 at 12:45 PM

gavin.codes

@gavin.codes

9/ And best of all the entire methodology through step 7 can be treated as a kind of increasing level of trust in the software without us having to pay human attention much beyond the logical contract.

December 1, 2025 at 4:01 PM

gavin.codes

@gavin.codes

8/ The great thing about this methodology is that we can get to step 5 with very little effort. Step 6 has form in the Nagini project mentioned above, although we would want to modify the approach a bit. And step 7 is technically possible, albeit with some heavy lifting.

December 1, 2025 at 4:01 PM

gavin.codes

@gavin.codes

7/

7. **Extra Credit**: Send the remaining proof obligations to a theorem prover to complete with an LLM loop (see: DeepSeek-Prover-V2 arxiv.org/abs/2504.21801). The truly ambitious could attempt to help the prover discharge thorny problems by inventing lemmas.

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powe...

arxiv.org

December 1, 2025 at 4:01 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news