Skip to content

Formally verifying Aiur programs #340

@arthurpaulino

Description

@arthurpaulino

Formal Evaluation Semantics for Aiur

Goal

Define an Aiur.eval function in Lean — a pure, definitional interpreter for Aiur
Terms — so that we can state and prove theorems about the behavior of Aiur programs.
This is not meant to be efficient; it is a specification against which the bytecode
compiler and the Rust runtime can be verified.

Overview

The core idea: given a Toplevel (the set of all function/datatype declarations) and
a Function, evaluating that function on concrete Value arguments produces a
Result containing:

  • An output Value (the return value).
  • A Prop-valued side condition accumulating all assertEq obligations.

This lets us state theorems like:

-- If `foo_fn` does `assert_eq!(x, 0); 2 * x`:
theorem foo_spec (top : Toplevel) (x : G) :
    Aiur.eval top `foo_fn [.field x] = .ok (.field (2 * x)) (x = 0)

Core Definitions

Values

Value is the semantic domain — what Aiur terms evaluate to. It mirrors Data but
is fully reduced:

inductive Value where
  | unit   : Value
  | field  : G → Value
  | tuple  : Array Value → Value
  | array  : Array Value → Value
  deriving Repr, BEq, Inhabited

Memory

Store/load operations require a heap model. A Memory is a simple map from addresses
to values:

structure Memory where
  cells : Array Value
  deriving Inhabited

def Memory.store (mem : Memory) (v : Value) : G × Memory :=
  let addr := G.ofNat mem.cells.size
  (addr, { cells := mem.cells.push v })

def Memory.load (mem : Memory) (addr : G) : Option Value :=
  mem.cells[addr.val.toNat]?

Result and EvalState

Evaluation is partial (recursion may diverge, pattern matches may be non-exhaustive)
and stateful (memory, IO buffers). We track:

structure EvalState where
  memory    : Memory
  ioBuffer  : IOBuffer
  fuel      : Nat          -- recursion bound

inductive Result where
  | ok   : Value → Prop → Result     -- value + accumulated assertions
  | fail : String → Result            -- stuck / out of fuel / match failure

The Prop component in ok is the conjunction of all assertEq conditions
encountered during evaluation. When there are no assertions, this is True.

Environment

An Env maps local variables to values. Function lookup goes through the Toplevel:

abbrev Env := Std.HashMap Local Value

def lookupFunction (top : Toplevel) (name : Global) : Option Function :=
  top.functions.find? fun f => f.name == name

The Evaluator

partial def eval (top : Toplevel) (env : Env) (state : EvalState) :
    Term → EvalState × Result

The key cases:

Term Semantics
.var x Look up x in env
.data (.field g) .ok (.field g) True
.data (.tuple ts) Evaluate each element, combine
.data (.array ts) Evaluate each element, combine
.ret t Evaluate t (marks escaping to caller)
.let pat val body Evaluate val, match against pat to extend env, evaluate body
.match t branches Evaluate t, try each (pat, body) in order
.app name args Evaluate args, look up function, substitute into body, recurse (decrement fuel)
.add a b Evaluate both to fields, return field addition (mod gSize)
.sub a b Field subtraction
.mul a b Field multiplication
.eqZero a Evaluate a to .field g, return .field (if g == 0 then 1 else 0)
.assertEq a b ret Evaluate a, b, evaluate ret, return ret's value with (a_val = b_val) ∧ ret_prop
.store t Evaluate t, push to memory, return pointer field
.load t Evaluate t to a field (address), look up in memory
.proj t i Evaluate t to a tuple, project component i
.get t i Evaluate t to an array, index at i
.slice t i j Evaluate t to an array, extract [i, j)
.set t i v Evaluate t to an array, evaluate v, replace at i

For function calls, fuel is decremented. If fuel reaches 0, the result is
.fail "out of fuel". This makes eval total.

Pattern Matching

def matchPattern (pat : Pattern) (v : Value) : Option (List (Local × Value))

This attempts to destructure v according to pat, returning bindings on success.
For Pattern.field g, it checks that v = .field g. For Pattern.var x, it always
succeeds with [(x, v)]. For constructors (Pattern.ref), it checks the tag and
recursively matches sub-patterns.

assertEq Accumulation

The crucial design point: assertEq does not fail evaluation. Instead, the equality
is accumulated as a Prop in the result:

| .assertEq a b ret => do
  let (state, a_val, a_prop) ← evalValue top env state a
  let (state, b_val, b_prop) ← evalValue top env state b
  let (state, ret_val, ret_prop) ← eval top env state ret
  let assertion := (a_val = b_val)
  (state, .ok ret_val (a_prop ∧ b_prop ∧ assertion ∧ ret_prop))

This mirrors how the circuit works: assertEq is a constraint, not a runtime check.
The evaluator produces the output regardless, and the proposition captures what must
hold for the execution to be valid.

Flattening to Array G

To connect eval to the existing test infrastructure (which works with Array G),
define:

def Value.flatten : Value → Array G
  | .unit      => #[]
  | .field g   => #[g]
  | .tuple vs  => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten
  | .array vs  => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten

And a top-level entry point:

def Aiur.run (top : Toplevel) (funcName : Global) (args : Array G)
    (fuel : Nat := 1000) : Option (Array G × Prop) :=
  let func ← lookupFunction top funcName
  let env := Env.ofList (func.inputs.zip (unflattenArgs func.inputs args))
  let state := { memory := default, ioBuffer := default, fuel }
  match eval top env state func.body with
  | (_, .ok val prop) => some (val.flatten, prop)
  | _ => none

Stating Theorems

With this setup, we can write specifications like:

-- A function that asserts its input is zero and doubles it
-- fn foo(x: G) -> G { assert_eq!(x, 0); x + x }
theorem foo_correct (top : Toplevel) (x : G)
    (h_func : lookupFunction top `foo = some foo_fn) :
    Aiur.run top `foo #[x] = some (#[x + x], x = (0 : G)) := by
  ...

-- A pure function with no assertions
-- fn sum(x: G, y: G) -> G { x + y }
theorem sum_correct (top : Toplevel) (x y : G)
    (h_func : lookupFunction top `sum = some sum_fn) :
    Aiur.run top `sum #[x, y] = some (#[x + y], True) := by
  ...

-- Conditional: fn abs_or_zero(x: G) -> G { match eq_zero(x) { 1 => 0, _ => x } }
theorem abs_or_zero_spec (top : Toplevel) (x : G) ... :
    Aiur.run top `abs_or_zero #[x] =
      if x = 0 then some (#[(0 : G)], True) else some (#[x], True) := by
  ...

Key Design Decisions

1. Prop-valued assertions, not runtime failure

assertEq accumulates a Prop rather than causing eval to fail. This is essential
because Aiur is a circuit language: the prover always produces an output, and the
constraints determine whether the output is valid. Separating the output from the
validity condition is what makes formal verification useful — you can reason about what
the circuit computes independently of whether the constraints are satisfied.

2. Fuel-based termination

Using a fuel : Nat parameter makes eval structurally recursive (total). Theorems
are stated with "sufficient fuel" hypotheses, or we prove that certain inputs require
bounded recursion.

3. Operating on Term (pre-simplification)

eval works on Term, the surface-level IR, not on TypedTerm or Bytecode. This
means it can be used to specify the semantics of Aiur programs as written. A separate
correctness theorem would relate eval on Term to execution of compiled Bytecode.

4. Memory as an array

The memory model is a simple append-only array. store appends and returns the index
as a G. load indexes into the array. This is adequate because Aiur memory is
append-only (no mutation of existing cells).

Verification Roadmap

Phase 1: Definitional interpreter (Aiur.eval)

  • Define Value, Memory, EvalState, Result, Env.
  • Implement eval, matchPattern, Value.flatten.
  • Implement Aiur.run as the top-level entry point.

Phase 2: Basic specifications

  • State and prove specs for simple functions: id, sum, prod.
  • State and prove specs involving assertEq.
  • State and prove specs involving match on field values.

Phase 3: Data structure operations

  • Specs for proj, get, slice, set.
  • Specs for tuple/array construction and destructuring.
  • Specs involving store/load.

Phase 4: Recursion and datatypes

  • Specs for recursive functions (factorial, fibonacci) with fuel analysis.
  • Specs for enum construction and pattern matching (Shape, Nat).
  • Mutual recursion (even/odd).

Phase 5: Compiler correctness

  • Define a relation between eval on Term and Bytecode.Toplevel.execute.
  • Prove that compilation preserves semantics: if eval produces (val, prop),
    then execute on the compiled bytecode produces val.flatten (assuming prop
    holds).

Phase 6: Byte and u32 operations

  • Specs for u8_add, u8_xor, u8_shift_left, u8_bit_decomposition, etc.
  • Specs for u32_less_than.
  • These require modeling the Goldilocks field arithmetic precisely.

Relation to CSLib

The approach is analogous to how CSLib provides
foundational definitions for formalizing computer science in Lean. Where CSLib defines
evaluation semantics for lambda calculus and other foundational languages, we define
evaluation semantics for Aiur — a domain-specific circuit language. The key parallel
is: define a clean denotational/operational semantics first, then prove properties
against it, then (optionally) prove that concrete implementations refine the semantics.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions