Part 1 of an explainer series on formally verifying DeFi in Lean

You Can Formally Verify Your Smart Contract Today. Here's How.

For years, the honest answer to "should I formally verify this smart contract?" was "yes, but the effort puts most teams off." Automatic tools — SMT solvers, bounded model checkers, the kind of thing I built halmos on top of — can attempt some properties cheaply but stall on others. Manual proof assistants — Lean, Coq, Isabelle — handle anything you can think about but used to cost multi-man-years on sizeable projects.

That trade-off has quietly collapsed. AI models can now write the proofs, and Lean checks them. When I revisited this experiment a few weeks ago, the AI just got there — the same setup six months earlier hit walls I had to push through by hand. This is the first of an explainer series on doing this in practice. We'll start with a simple ERC20-like token, move to a simple ERC4626-like vault, and watch verification surface the design flaw behind the classic donation attack.

(Vitalik's recent piece makes the broader case — both why formal verification matters and why it's suddenly tractable. He covers the high-level case; this series does hands-on smart contract verification, with code you can run.)

All Lean sources are in the repository. Clone it, point your AI agent at the files, and you can both replicate the proofs and have it walk you through anything you don't follow.

Lean is a proof checker. AI is a proof generator.

The key thing about Lean: it does not write proofs. It checks them.1 Given a theorem statement and a proof script, Lean decides whether the script is a valid derivation. If yes, the theorem holds. If no, you get a precise error message about where the script failed.

Combine this with an AI that has become genuinely good at writing proof scripts and you have the best of both worlds: the AI's productivity (hundreds of lines of proof in seconds) with a hallucination-proof check (Lean refuses to accept anything wrong).

That last point matters more than it sounds. The standard pushback against using AI for security-critical work is "what if it lies?" In this setup, it can lie all it wants — the lie won't compile. The only thing you have to read carefully is the theorem statement: what property are you trying to prove? That's the part to get right.

To be clear about my workflow: I do not read most of the proofs in this post. I check that each theorem says what I want it to say, and then I let Lean confirm the AI's work. That's the entire loop.

Warm-up: a simple ERC20-like token

To keep things simple, we focus on the core accounting logic. No approve, no allowance map, no separate transfer — just a single transferFrom that moves tokens between any two accounts.

Then the state is just a balance map2 and a total-supply counter:

structure State where
  balances    : Addr →₀ ℕ
  totalSupply : ℕ

The three operations — transferFrom, mint, burn. Here is transferFrom:

noncomputable def transferFrom (s : State) (sender receiver : Addr) (amount : ℕ) :
    Option State :=
  if s.balances sender < amount then none
  else
    let b := s.balances.update sender (s.balances sender - amount)
    some ⟨b.update receiver (b receiver + amount), s.totalSupply⟩

Option State is the return type: the operation either succeeds (some s') or fails (none). No exceptions, no reverts, just the post-state or its absence.3

The headline correctness property is the ledger invariant: total supply equals the sum of all balances. In Lean:

def Invariant (s : State) : Prop :=
  s.totalSupply = s.balances.sum fun _ v => v

That's it — one line.

That right-hand side, s.balances.sum fun _ v => v, is the kind of thing that automatic tools struggle with — it's a sum over the support of a Finsupp, and getting an SMT solver to reason about it can require ghost variables and manual hints. In Lean it's just a normal function and the invariant is just a normal proposition.

We bundle the three operations into a step function and prove preservation once:

inductive Action where
  | transferFrom (sender receiver : Addr) (amount : ℕ)
  | mint         (recipient : Addr) (amount : ℕ)
  | burn         (holder : Addr) (amount : ℕ)

noncomputable def step (a : Action) (s : State) : Option State := ...

theorem step_preserves_invariant {a : Action} {s s' : State}
    (h : Invariant s) (hstep : step a s = some s') : Invariant s' := ...
Show full step and step_preserves_invariant bodies
noncomputable def step (a : Action) (s : State) : Option State :=
  match a with
  | .transferFrom u v amt => transferFrom s u v amt
  | .mint r amt           => some (mint s r amt)
  | .burn h amt           => burn s h amt

theorem step_preserves_invariant {a : Action} {s s' : State}
    (h : Invariant s) (hstep : step a s = some s') : Invariant s' := by
  cases a with
  | transferFrom u v amt =>
    simp only [step] at hstep
    exact transferFrom_preserves_invariant h hstep
  | mint r amt =>
    simp only [step, Option.some.injEq] at hstep
    subst hstep
    exact mint_preserves_invariant h
  | burn b amt =>
    simp only [step] at hstep
    exact burn_preserves_invariant h hstep

The body of step_preserves_invariant is about sixty lines of dense Lean tactics, mostly bookkeeping over Finsupp.sum_update_add. I asked my AI agent — a vanilla Claude Code agent running Opus 4.7 Extra High — to write it. I checked the statement carefully. I ran lake build. It passed.

Done. The token model preserves its ledger invariant under every action, and we never have to revisit it.

The real test: a simple ERC4626-like vault

An ERC-4626 vault is a contract that accepts deposits in an underlying ERC-20 (the asset) and issues its own ERC-20 (the vault token4) in proportion to the depositor's stake. Users can later trade vault tokens back for assets. The conversion rate between the two — the vault-token price — is the obvious safety-critical quantity: it should never decrease as a consequence of user actions, or someone is being short-changed for someone else's benefit.

The vault state is just two tokens:5

structure State where
  vaultToken : ERC20.State
  assetToken : ERC20.State

The vault has four user operations — deposit, mint, withdraw, redeem — plus two external events: profit (assets flow into the vault without minting vault tokens — yield, accidental transfer, or a deliberate donation) and loss (assets flow out without burning vault tokens — strategy loss, bad debt). All six go through the same step machinery.

Show Action and step
inductive Action where
  | deposit  (user  : Addr) (amount : ℕ)
  | mint     (user  : Addr) (tokens : ℕ)
  | withdraw (user  : Addr) (amount : ℕ)
  | redeem   (user  : Addr) (tokens : ℕ)
  | profit   (amount : ℕ)
  | loss     (amount : ℕ)

noncomputable def step (a : Action) (s : State) : Option State :=
  match a with
  | .deposit u amt  => (deposit s u amt).map Prod.fst
  | .mint u tk      => (mint s u tk).map Prod.fst
  | .withdraw u amt => (withdraw s u amt).map Prod.fst
  | .redeem u tk    => (redeem s u tk).map Prod.fst
  | .profit amt     => some (profit s amt)
  | .loss amt       => loss s amt

The property we want, written without division to stay in ℕ:

def VaultTokenPriceLE (s s' : State) : Prop :=
  vaultAssets s * s'.vaultToken.totalSupply
  ≤ vaultAssets s' * s.vaultToken.totalSupply

Read as: the post-state's vault-token price (totalAssets' / totalSupply') is at least the pre-state's, cross-multiplied.

One thing to settle up front: loss is the only action that can violate this. A strategy loss genuinely reduces the assets backing each vault token — that's not a bug; that's the loss itself. The interesting question is whether everything else — every user action, plus profits — preserves the price. That's the theorem we want, with the noLoss precondition stated explicitly:

theorem step_preserves_VaultTokenPriceLE
    {a : Action} {s s' : State}
    (hnl : a.noLoss)        -- noLoss: every action except `loss`
    (hstep : step a s = some s') :
    VaultTokenPriceLE s s' := ...

First attempt: the obvious formula

The natural conversion formula multiplies by the current ratio:

noncomputable def vaultTokenFor (s : State) (assets : ℕ) : ℕ :=
  if s.vaultToken.totalSupply = 0 then assets   -- empty vault: 1:1
  else assets * s.vaultToken.totalSupply / vaultAssets s

The if branch handles the empty-vault edge case — when no vault tokens have been issued, there's no ratio to multiply by, so the first depositor gets a 1:1 rate. Routine edge-case handling — exactly what the early ERC-4626 reference implementations did.

I asked the AI to prove step_preserves_VaultTokenPriceLE for this model (under the loss-excluded form). After some thrashing, it gave up — and handed back a state pair where the property fails. A short trace it found, transcribed:

step vault balance B vault-token supply T
initial 0 0
deposit (alice, 7) 7 7
profit 1 8 7
withdraw (alice, 7) 1 0 ← orphan
deposit (alice, 5) 6 5

Stare at the last two rows. After the withdraw, the vault holds 1 asset token and has issued zero vault tokens. (The withdraw burns ceil(7·7 / 8) = 7 vault tokens but only releases 7 assets, leaving 1 behind — a ceiling-rounding artifact.) The vault-token price at this point is 1 / 0, informally infinite. A fresh depositor then puts in 5 assets, hits the T = 0 branch, and walks out with 5 vault tokens priced at the 1:1 fallback rate — pricing the previously-infinite vault token at a finite ratio. The price has just dropped.

What this trace exposes: the 1:1 fallback was only meant for the vault's initial creation, when there's no prior conversion rate to honor. Once the vault drains and refills, the same branch fires again — overwriting whatever rate the vault's history had implied. And in any near-empty state, the same fragility lets a small donation swing the price drastically — the primitive behind the classic ERC-4626 donation attack.

That moment — verification surfacing a hidden flaw as a counter-example to a theorem you wanted to be true — is the part of formal methods I find addictive. The proof effort isn't just a stamp of correctness; it's a magnifying glass on the parts of the design you didn't think hard enough about.

The fix: virtual offsets

The repair is well-known: add small constants — virtual vault tokens vT and virtual assets vA — as offsets to both numerator and denominator. With both strictly positive, the if T = 0 branch becomes unreachable. The formula becomes:

noncomputable def vaultTokenFor (s : State) (assets : ℕ) : ℕ :=
  assets * (s.vaultToken.totalSupply + virtualTokens)
    / (vaultAssets s + virtualAssets)

— no branch. The corresponding assetFor, mintCost, and withdrawCost get the same offset treatment.

The orphan trace closes immediately: T + vT > 0 always, so the 1:1 fallback never fires. The deeper benefit: when the vault drains and refills, the conversion rate from just before the emptying is preserved through the empty state — fresh deposits are priced at the retained rate, not at a 1:1 reset.

Virtual offsets also raise the cost of the donation attack. With vT > 0, a fraction vT / (T + vT) of any donation is captured by a "phantom holder" that never withdraws — enough, even at vT = 1, that the attacker can't recoup their donation.

I asked the AI to re-attempt the proof with this formula. It produced about 150 lines of Lean. lake build. Green. step_preserves_VaultTokenPriceLE holds for every non-loss action, and the companion loss_decreases_VaultTokenPriceLE confirms loss is the only action that can violate it (and inevitably does, as expected).

What this doesn't prove (yet)

We've verified a Lean model of the token and the vault. We have not verified that the actual deployed contract bytecode behaves the same way — the model and the actual execution code are two different artifacts.

That gap is real, and we'll get to it: closing it needs some additional tooling that I'll cover in a later part.

In the meantime, model-level verification carries weight on its own. It's a precision instrument for design-phase exploration — you can sketch out alternative formulations, prove invariants against each, and watch one of them fall over before you write any Solidity. It's a forcing function on understanding: the act of writing down the right property usually surfaces the parts of the design you'd been hand-waving past. And the core logic of most contracts is small enough that verifying it at the model level — even before tackling the bytecode bridge — already eliminates a meaningful class of bugs.

Try it

The Lean sources for this post — Token.lean, VaultNaive.lean (the failed first attempt), and Vault.lean (the working virtual-offset version) — live in the simple-tokens-and-vaults/ directory of the repository. Each is self-contained, and organized so you only need to read the state definitions, the operations, and the headline theorem statements — the helper lemmas and proof bodies are skippable.

After a few rounds of practice, any developer can build the muscle to write specs like VaultTokenPriceLE. The proofs themselves are now the AI's job. The hard part has moved from labor to taste: what is the right property to ask for?

That's a much better problem to have.

Notes

  1. Some Lean tactics (omega, decide, aesop, the various search-based ones) feel like they're generating proofs — you write one line, the proof appears. They're really running proof search and emitting kernel-level terms, which the kernel then checks. The "check, not generate" rule still holds; the tactics are just convenient frontends.
  2. Addr →₀ ℕ is mathlib's notation for a finitely-supported function from Addr to — a function that is zero on all but finitely many inputs. The right model for a balance map: most accounts hold nothing, and only finitely many have ever held a positive balance.
  3. In larger Lean developments, failures are often represented using exception monads carrying explicit error information rather than plain Option; we use Option here to keep the presentation accessible to readers unfamiliar with monads.
  4. ERC-4626's standard text calls the vault's own token a share; we say vault token throughout for clarity.
  5. Strictly, the asset token is a separate ERC-20 contract — not part of the vault's on-chain storage. We bundle it into the vault's State here so the model can express all the cross-contract accounting in one place, reading the vault's asset holding straight off the asset token's ledger (vaultAssets s := s.assetToken.balances vault). Real vaults may also track a separate totalAssets counter for bookkeeping; we collapse that into the asset token's ledger for the same reason — to keep the model as simple as possible.