I don't think formal verification really addresses most day-to-day programming problems:
* A user interface is confusing, or the English around it is unclear
* An API you rely on changes, is deprecated, etc.
* Users use something in unexpected ways
* Updates forced by vendors or open source projects cause things to break
* The customer isn't clear what they want
* Complex behavior between interconnected systems, out of the purview of the formal language (OS + database + network + developer + VM + browser + user + web server)
For some mathematically pure task, sure, it's great. Or a low-level library like a regular expression parser or a compression codec. But I don't think that represents a lot of what most of us are tasked with, and those low-level "mathematically pure" libraries are generally pretty well handled by now.
In fact, automated regression tests done by ai with visual capabilities may have bigger impact than formal verification has. You can have an army of testers now, painfully going through every corner of your software
Will only work somewhat when customers expect features to work in a standard way. When customer spec things to work in non-standard approaches you'll just end up with a bunch of false positives.
This. When the bugs come streaming in you better have some other AI ready to triage them and more AI to work them, because no human will be able to keep up with it all.
Bug reporting is already about signal vs noise. Imagine how it will be when we hand the megaphone to bots.
> Complex behavior between interconnected systems, out of the purview of the formal language (OS + database + network + developer + VM + browser + user + web server)
Not really, some components like components have a lot of properties that’s very difficult to modelize. Take latency in network, or storage performance in OS.
TBH most day to day programming problems are barely worth having tests for. But if we had formal specs and even just hand wavy correspondences between the specs and the implementation for the low level things everybody depends on that would be a huge improvement for the reliability of the whole ecosystem.
A limited form of formal verification is already mainstream. It is called type systems. The industry in general has been slowly moving to encode more invariants into the type system, because every invariant that is in the type system is something you can stop thinking about until the type checker yells at you.
A lot of libraries document invariants that are either not checked at all, only at runtime, or somewhere in between. For instance, the requirement that a collection not be modified during interaction. Or that two region of memory do not overlap, or that a variable is not modified without owning a lock. These are all things that, in principle, can be formally verified.
No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.
For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.
> No one claims that good type systems prevent buggy software.
That's exactly what languages with advanced type systems claim. To be more precise, they claim to eliminate entire classes of bugs. So they reduce bugs, they don't eliminate them completely.
If you eliminate the odd integers from consideration, you've eliminated an entire class of integers. yet, the set of remaining integers is of the same size as the original.
Programs are not limited; the number of Turing machines is countably infinite.
When you say things like "eliminate a class of bugs", that is played out in the abstraction: an infinite subset of that infinity of machines is eliminated, leaving an infinity.
How you then sample from that infinity in order to have something which fits on your actual machine is a separate question.
Because the number of state where a program can be is so huge (when you consider everything that can influence how a program runs and the context where and when it runs) it is for the current computation power practically infinite but yes it is theoretically finite and can even be calculated.
> For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.
The problem is there is always some chance a coding agent will get stuck and be unable to produce a conforming implementation in a reasonable amount of time. And then you are back in a similar place to what you were with those pre-LLM solutions - needing a human expert to work out how to make further progress.
With the added issue that now the expert is working with code they didn't write, and that could be in general be harder to understand than human-written code. So they could find it easier to just throw it away and start from scratch.
I would argue that the barrier to entry is on par with python for a person with no experience, but you need much more time with Haskell to become proficient in it. In python, on the other hand, you can learn the basics and these will get you pretty far
IMHO, these strong type systems are just not worth it for most tasks.
As an example, I currently mostly write GUI applications for mobile and desktop as a solo dev. 90% of my time is spent on figuring out API calls and arranging layouts. Most of the data I deal with are strings with their own validation and formatting rules that are complicated and at the same time usually need to be permissive. Even at the backend all the data is in the end converted to strings and integers when it is put into a database. Over-the-wire serialization also discards with most typing (although I prefer protocol buffers to alleviate this problem a bit).
Strong typing can be used in between those steps but the added complexity from data conversions introduces additional sources of error, so in the end the advantages are mostly nullified.
> Most of the data I deal with are strings with their own validation and formatting rules that are complicated and at the same time usually need to be permissive
this is exactly where a good type system helps: you have an unvalidated string and a validated string which you make incompatible at the type level, thus eliminating a whole class of possible mistakes. same with object ids, etc.
> For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.
That is not novel and every declarative language precisely embodies it.
I think most existing declarative languages still require the programmer to specify too many details to get something usable. For instance, Prolog often requires the use of 'cut' to get reasonable performance for some problems.
> No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.
To me it seems they reduce productivity. In fact, for Rust, which seems to match the examples you gave about locks or regions of memory the common wisdom is that it takes longer to start a project, but one reaps the benefits later thanks to more confidence when refactoring or adding code.
However, even that weaker claim hasn’t been proven.
In my experience, the more information is encoded in the type system, the more effort is required to change code. My initial enthusiasm for the idea of Ada and Spark evaporated when I saw how much ceremony the code required.
> In my experience, the more information is encoded in the type system, the more effort is required to change code.
I would tend to disagree. All that information encoded in the type system makes explicit what is needed in any case and is otherwise only carried informally in peoples' heads by convention. Maybe in some poorly updated doc or code comment where nobody finds it. Making it explicit and compiler-enforced is a good thing. It might feel like a burden at first, but you're otherwise just closing your eyes and ignoring what can end up important. Changed assumptions are immediately visible. Formal verification just pushes the boundary of that.
> All that information encoded in the type system makes explicit what is needed in any case and is otherwise only carried informally in peoples' heads by convention
this is, in fact better for llms, they are better at carrying information and convention in their kv cache than they are in having to figure out the actual types by jumping between files and burning tokens in context/risking losing it on compaction (or getting it wrong and having to do a compilation cycle).
if a typed language lets a developer fearlessly build a semantically inconsistent or confusing private API, then llms will perform poorer at them even though correctness is more guaranteed.
In practice it would be encoded in comments, automated tests and docs, with varying levels of success.
It’s actually similar to tests in a way: they provide additional confidence in the code, but at the same time ossify it and make some changes potentially more difficult.
Interestingly, they also make some changes easier, as long as not too many types/tests have to be adapted.
This reads to me like an argument for better refactoring tools, not necessarily for looser type systems. Those tools could range from mass editing tools, IDEs changing signatures in definitions when changing the callers and vice versa, to compiler modes where the language rules are relaxed.
Capturing invariants in the type system is a two-edged sword.
At one end of the spectrum, the weakest type systems limit the ability of an IDE to do basic maintenance tasks (e.g. refactoring).
At the other end of the spectrum, dependent type and especially sigma types capture arbitrary properties that can be expressed in the logic. But then constructing values in such types requires providing proofs of these properties, and the code and proofs are inextricably mixed in an unmaintainable mess. This does not scale well: you cannot easily add a new proof on top of existing self-sufficient code without temporarily breaking it.
Like other engineering domains, proof engineering has tradeoffs that require expertise to navigate.
> but one reaps the benefits later thanks to more confidence when refactoring or adding code.
To be honest, I believe it makes refactoring/maintenance take longer. Sure, safer, but this is not a one-time only price.
E.g. you decide to optimize this part of the code and only return a reference or change the lifetime - this is an API-breaking change and you have to potentially recursively fix it. Meanwhile GC languages can mostly get away with a local-only change.
Don't get me wrong, in many cases this is more than worthwhile, but I would probably not choose rust for the n+1th backend crud app for this and similar reasons.
The choice of whether to use GC is completely orthogonal to that of a type system. On the contrary, being pointed at all the places that need to be recursively fixed during a refactoring is a huge saving in time and effort.
I was talking about a type system with affine types, as per the topic was Rust specifically.
I compared it to a statically typed language with a GC - where the runtime takes care of a property that Rust has to do statically, requiring more complexity.
In my opinion, programming languages with a loose type system or no explicit type system only appear to foster productivity, because it is way easier to end up with undetected mistakes that can bite later, sometimes much later. Maybe some people argue that then it is someone else's problem, but even in that case we can agree that the overall quality suffers.
"In my experience, the more information is encoded in the type system, the more effort is required to change code."
Have you seen large js codebases? Good luck changing anything in it, unless they are really, really well written, which is very rare. (My own js code is often a mess)
When you can change types on the fly somewhere hidden in code ... then this leads to the opposite of clarity for me. And so lots of effort required to change something in a proper way, that does not lead to more mess.
a) It’s fast to change the code, but now I have failures in some apparently unrelated part of the code base. (Javascript) and fixing that slows me down.
b) It’s slow to change the code because I have to re-encode all the relationships and semantic content in the type system (Rust), but once that’s done it will likely function as expected.
Depending on project, one or the other is preferable.
Or: I’m not going to do this refactor at all, even though it would improve the codebase, because it will be near impossible to ensure everything is correct after making so many changes.
To me, this has been one of the biggest advantages of both tests and types. They provide confidence to make changes without needing to be scared of unintended breakages.
There's a tradeoff point somewhere where it makes sense to go with one or another. You can write a lot of codes in bash and Elisp without having to care about the type of whatever you're manipulating. Because you're handling one type and encoding the actual values in a typesytem would be very cumbersome. But then there are other domain which are fairly known, so the investment in encoding it in a type system does pay off.
Soon a lot of people will go out of the way and try to convince you that Rust is most productive language, functions having longer signatures than their bodies is actually a virtue, and putting .clone(), Rc<> or Arc<> everywhere to avoid borrow-checker complaints makes Rust easier and faster to write than languages that doesn't force you to do so.
Of course it is a hyperbole, but sadly not that large.
Not that I can answer for OP but as a personal anecdote; I've never been more productive than writing in Rust, it's a goddamn delight. Every codebase feels like it would've been my own and you can get to speed from 0 to 100 in no time.
Yeah, I’ve been working mainly in rust for the last few years. The compile time checks are so effective that run time bugs are rare. Like you can refactor half the codebase and not run the app for a week, and when you do it just works. I’ve never had that experience in other languages.
It's a bad reason. A lot of best practices are temporary blindnesses, comparable, in some sense, with supposed love to BASIC before or despite Dijkstra. So, yes, it's possible there is no good reason. Though I don't think it's the case here.
That has been the problem with unit and integration tests all the time. Especially for systems that tend to be distributed.
AI makes creating mock objects much easier in some cases, but it still creates a lot of busy work and makes configuration more difficult. At at this points it often is difficult configuration management that cause the issues in the first place. Putting everything in some container doesn't help either, on the contrary.
> But I don't think that represents a lot of what most of us are tasked with
Give me a list of all the libraries you work with that don't have some sort of "okay but not that bit" rule in the business logic, or "all of those function are f(src, dst) but the one you use most is f(dst,src) and we can't change it now".
I bet it's a very short list.
Really we need to scrap every piece of software ever written and start again from scratch with all these weirdities written down so we don't do it again, but we never will.
Scrapping everything wouldn't help. 15 years ago the project I'm on did that - for a billion dollars. We fixed the old mistakes but made plenty of new ones along the way. We are trying to fix those now and I can't help but wonder what new mistakes we are making the in 15 years we will regret.
Yeah, there were about 5 or 10 videos about this "complexity" and unpredictability of 3rd parties and wheels involved that AI doesn't control and even forget - small context window - in like past few weeks. I am sure you have seen at least one of them ;)
But it's true. AI is still super narrow and dumb. Don't understand basic prompts even.
Look at the computer games now - they still don't look real despite almost 30 years since Half-life 1 started the revolution - I would claim. Damn, I think I ran it on 166 Mhz computer on some lowest details even.
Yes, it's just better and better but still looking super uncanny - at least to me. And it's been basically 30 years of constant improvements. Heck, Roomba is going bankrupt.
I am not saying things don't improve but the hype and AI bubble is insane and the reality doesn't match the expectation and predictions at all.
> Formal verification will eventually lead to good, stable API design.
Why? Has it ever happened like this? Because to me it would seem that if the system verified to work, then it works no matter how API is shaped, so there is no incentive to change it to something better.
> Let's say formal verification could help to avoid some anti-patterns.
I'd still like to hear about the actual mechanism of this happening. Because I personally find it much easier to believe that the moment keeping the formal verification up to date becomes untenable for whatever reason (specs changing too fast, external APIs to use are too baroque, etc) people would rather say "okay, guess we ditch the formal verification and just keep maintaining the integration tests" instead of "let's change everything about the external world so we could keep our methodology".
> I am not an expert on this, but the worst API I've seen is those with hidden states.
> e.g. .toggle() API. Call it old number of times, it goes to one state, call it even number of times, it goes back.
This is literally a dumb light switch. If you have trouble proving that, starting from lights off, flicking a simple switch twice will still keep lights off then, well, I have bad news to tell you about the feasibility of using the formal methods for anything more complex than a dumb light switch. Because the rest of the world is a very complex and stateful place.
> (which itself is a state machine of some kind)
Yes? That's pretty much the raison d'être of the formal methods: for anything pure and immutable, normal intuition is usually more than enough; it's tracking the paths through enormous configuration spaces that our intuition has problem with. If the formal methods can't help with that with comparable amount of effort, then they are just not worth it.
At that point you create an entirely new API, fully versioned, and backwardly compatible (if you want it to be). The point the article is making is that AI, in theory, entirely removes the person from the coding process so there's no longer any need to maintain software. You can just make the part you're changing from scratch every time because the cost of writing bug-free code (effectively) goes to zero.
The theory is entirely correct. If a machine can write provably perfect code there is absolutely no reason to have people write code. The problem is that the 'If' is so big it can be seen from space.
All non-trivial specs, like the one for seL4, are hard to verify. Lots of that complexity comes from interacting with the rest of the world which is a huge shared mutable global state you can't afford to ignore.
Of course, you can declare that the world itself is inherently sinful and imperfect, and is not ready for your beautiful theories but seriously.
100% of state changes in business software is unknowable on a long horizon, and relies on thoroughly understanding business logic that is often fuzzy, not discrete and certain.
Formal verification does not gurantee business logic works as everybody expected, nor its future proof, however, it does provide a workable path towards:
Things can only happen if only you allow it to happen.
It other words, your software may come to a stage where it's no longer applicable, but it never crashes.
Formal verification had little adoption only because it costs 23x of your original code with "PhD-level training"
The reason it doesn't work is businesses change faster than you can model every detail AND keep it all up to date. Unless you have something tying your model directly to every business decision and transaction that happens, your model will never be accurate. And if we're talking about formal verification, that makes it useless.