
It probably will, but not in the way we all imagine. What we see now is an attempt to recycle the interactive provers that took decades to develop. Writing code, experimenting with new ideas, and getting feedback has always been a very slow process in academia. Getting accepted at a top peer-reviewed conference takes months, sometimes years. The essential knowledge is hidden inside big corps that only promote their "products" and rarely give the knowledge back.

LLMs make bootstrapping code and experimenting faster, not only for vibe coders but also for researchers, many of whom, by the way, are not really good coders. It may well be that we will soon see wild new verification tools that come out of quick iteration with LLMs.

For example, I recently wrote an experimental distributed bug finder for TLA+ with Claude in about three weeks. A couple of years ago, that effort would have required three months and a team of three people.


AFAIK, formal verification worked well for hardware because most hardware behavior is deterministic and can be captured precisely. Most software these days is concurrent and distributed. Does the LLM + RL approach work for concurrent and distributed code? We barely know how to do systematic simulation there.

This sounds amazing! What kind of systems take you a few hours to a few days now? Just curious whether it works only in a niche (like sequential code) or whether it works for concurrent and distributed systems as well.

> TLA+ is not a silver bullet, and like all temporal logic, has constraints.
>
> You really have to be able to reduce your models to: "at some point in the future, this will happen," or "it will always be true from now on."

I think people get confused by the word "temporal" in the name of TLA+. Yes, it has temporal operators. But even if you throw them away, TLA+ minus the temporal operators is still extremely useful for specifying the behavior of concurrent and distributed systems. I have been writing TLA+ specifications of distributed algorithms (e.g., distributed consensus) and checking them for about six years now. The question of liveness comes last, and even then, the standard temporal logics are barely suitable for expressing liveness under partial synchrony. The value of the temporal properties in TLA+ is overrated.
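To make this concrete, here is a toy spec in that non-temporal style (my own minimal example, not from any real system): the initial predicate, the transition relation, and the safety property are all plain logic over states and pairs of states.

  ---- MODULE SimpleLock ----
  EXTENDS FiniteSets

  VARIABLES lock, cs  \* lock owner and the set of processes in the critical section

  Procs == {"p1", "p2"}

  Init == lock = "none" /\ cs = {}

  Acquire(p) == lock = "none" /\ lock' = p /\ cs' = cs \cup {p}
  Release(p) == lock = p /\ lock' = "none" /\ cs' = cs \ {p}

  Next == \E p \in Procs : Acquire(p) \/ Release(p)

  \* Safety is just a state predicate, no temporal operators needed:
  MutualExclusion == Cardinality(cs) <= 1
  ====

Checking MutualExclusion as an invariant is exactly that non-temporal workflow: the checker explores the reachable states and reports a trace if the predicate ever fails.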


> TLA+ can specify anything that could be specified in mathematics.

You are talking about the logic of TLA+, that is, its mathematical definition. No tool for TLA+ can handle all of mathematics at the moment. The language was designed for specifying systems, not all of mathematics.


TLA+ is just a language for writing specifications (syntax + semantics). If you want to prove anything about a specification, there are three tools, with varying degrees of confidence and effort:

- TLAPS is an interactive proof system that can automate some proof steps by delegating them to SMT solvers: https://proofs.tlapl.us/doc/web/content/Home.html

- Apalache is a symbolic model checker that delegates verification to Z3. It can prove properties without executing anything, or rather, by executing specs symbolically. For instance, it can do proofs via inductive invariants, but only for bounded data structures and unbounded integers. https://apalache-mc.org/

- Finally, TLC is an enumerative model checker and simulator. It simply produces states and enumerates them, so it terminates only if the specification produces a finite number of states. This may sound like executing your specification, but it is a bit smarter: when checking invariants, it never visits the same state twice, which gives TLC the ability to reason about infinite executions. Confusingly, TLC does not have its own page, as it was the first working tool for TLA+; many people believe that TLA+ is TLC: https://github.com/tlaplus/tlaplus
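To give a feel for the TLC workflow, here is a minimal self-contained example (essentially Lamport's hour clock; the file names are mine) together with the configuration that tells TLC what to check:

  ---- MODULE Clock ----
  EXTENDS Integers

  VARIABLE hr

  Init == hr = 1
  Next == hr' = IF hr = 12 THEN 1 ELSE hr + 1

  \* Only 12 reachable states, so TLC terminates.
  TypeOK == hr \in 1..12
  ====

  \* Clock.cfg -- picked up by TLC next to Clock.tla
  INIT Init
  NEXT Next
  INVARIANT TypeOK

With the standard tla2tools distribution, something like `java -jar tla2tools.jar Clock.tla` should enumerate all 12 states and report that TypeOK holds.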


Producing positive and negative examples is exactly where model checkers shine. I always write "falsy" invariants to produce examples of the specification reaching interesting control states for at least one input. After that, I think about the system boundaries and where the system should break. Then the model checker shows that it indeed breaks there.
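Concretely, a "falsy" invariant might look like this (the variable `state` and its values are hypothetical):

  \* Deliberately false: claims the protocol never commits. The
  \* "counterexample" the model checker reports is exactly a trace
  \* that drives the specification into the committed state.
  NeverCommits == state /= "committed"

The violation trace is the positive example you were after.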

Having a specification does not mean that one should not touch it. It is just a different level of experimental thinking.


It should be possible to write protocol specifications in Lean; see, e.g., a recent case study on specifying two-phase commit in Lean [1] and proving its safety [2].

However, there are no model checkers for Lean. Currently, you either have to write a full proof by hand, with some assistance from LLMs, or rely on random simulation, similar to property-based testing.

[1] https://protocols-made-fun.com/lean/2025/04/25/lean-two-phas... [2] https://protocols-made-fun.com/lean/2025/05/10/lean-two-phas...
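For a rough idea of what a protocol specification looks like in Lean, here is a minimal Lean 4 sketch (my own toy reconstruction, loosely in the spirit of [1], not code from the case study): the protocol state is a structure, transitions form an inductive relation, and safety proofs proceed by induction over reachability.

  -- Toy two-phase-commit-style state machine, for illustration only.
  inductive Phase where
    | working | prepared | committed | aborted

  structure State where
    phase : Phase

  -- Transitions as an inductive relation over pairs of states.
  inductive Step : State -> State -> Prop where
    | prepare : Step ⟨.working⟩ ⟨.prepared⟩
    | commit  : Step ⟨.prepared⟩ ⟨.committed⟩
    | abort   : Step ⟨.working⟩ ⟨.aborted⟩

  -- States reachable from the initial state; safety properties are
  -- proved by induction over this relation.
  inductive Reachable : State -> Prop where
    | init : Reachable ⟨.working⟩
    | step {s s' : State} : Reachable s -> Step s s' -> Reachable s'

Without a model checker, one would state an invariant as something like `Reachable s -> Inv s` (with a hypothetical `Inv`) and prove it by induction, by hand or with LLM assistance.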


When you say peer review, do you mean academic publications or testimonials? I imagine it would be difficult to publish a paper at an academic conference proposing an alternative syntax for anything, even if it were better.


I believe this is really the tragedy of formal verification tools. Everybody wants a tool as robust as a compiler, yet nobody wants to invest in developing such tools. Microsoft Research 20 years ago was probably an exception. Other companies prefer to immediately hide these tools and their benchmarks behind IP and closed source. As a result, we have early-stage MVPs developed by one to three people.

