Formally verifying Aiur programs

# Formal Evaluation Semantics for Aiur

## Goal

Define an `Aiur.eval` function in Lean — a pure, definitional interpreter for Aiur
`Term`s — so that we can state and prove theorems about the behavior of Aiur programs.
This is not meant to be efficient; it is a specification against which the bytecode
compiler and the Rust runtime can be verified.

## Overview

The core idea: given a `Toplevel` (the set of all function/datatype declarations) and
a `Function`, evaluating that function on concrete `Value` arguments produces a
`Result` containing:

- An output `Value` (the return value).
- A `Prop`-valued side condition accumulating all `assertEq` obligations.

This lets us state theorems like:

```
-- If `foo_fn` does `assert_eq!(x, 0); 2 * x`:
theorem foo_spec (top : Toplevel) (x : G) :
    Aiur.eval top `foo_fn [.field x] = .ok (.field (2 * x)) (x = 0)
```

## Core Definitions

### Values

`Value` is the semantic domain — what Aiur terms evaluate to. It mirrors `Data` but
is fully reduced:

```lean
inductive Value where
  | unit   : Value
  | field  : G → Value
  | tuple  : Array Value → Value
  | array  : Array Value → Value
  deriving Repr, BEq, Inhabited
```

### Memory

Store/load operations require a heap model. A `Memory` is a simple map from addresses
to values:

```lean
structure Memory where
  cells : Array Value
  deriving Inhabited

def Memory.store (mem : Memory) (v : Value) : G × Memory :=
  let addr := G.ofNat mem.cells.size
  (addr, { cells := mem.cells.push v })

def Memory.load (mem : Memory) (addr : G) : Option Value :=
  mem.cells[addr.val.toNat]?
```

### Result and EvalState

Evaluation is partial (recursion may diverge, pattern matches may be non-exhaustive)
and stateful (memory, IO buffers). We track:

```lean
structure EvalState where
  memory    : Memory
  ioBuffer  : IOBuffer
  fuel      : Nat          -- recursion bound

inductive Result where
  | ok   : Value → Prop → Result     -- value + accumulated assertions
  | fail : String → Result            -- stuck / out of fuel / match failure
```

The `Prop` component in `ok` is the conjunction of all `assertEq` conditions
encountered during evaluation. When there are no assertions, this is `True`.

### Environment

An `Env` maps local variables to values. Function lookup goes through the `Toplevel`:

```lean
abbrev Env := Std.HashMap Local Value

def lookupFunction (top : Toplevel) (name : Global) : Option Function :=
  top.functions.find? fun f => f.name == name
```

### The Evaluator

```lean
partial def eval (top : Toplevel) (env : Env) (state : EvalState) :
    Term → EvalState × Result
```

The key cases:

| Term | Semantics |
|------|-----------|
| `.var x` | Look up `x` in `env` |
| `.data (.field g)` | `.ok (.field g) True` |
| `.data (.tuple ts)` | Evaluate each element, combine |
| `.data (.array ts)` | Evaluate each element, combine |
| `.ret t` | Evaluate `t` (marks escaping to caller) |
| `.let pat val body` | Evaluate `val`, match against `pat` to extend `env`, evaluate `body` |
| `.match t branches` | Evaluate `t`, try each `(pat, body)` in order |
| `.app name args` | Evaluate args, look up function, substitute into body, recurse (decrement fuel) |
| `.add a b` | Evaluate both to fields, return field addition (mod `gSize`) |
| `.sub a b` | Field subtraction |
| `.mul a b` | Field multiplication |
| `.eqZero a` | Evaluate `a` to `.field g`, return `.field (if g == 0 then 1 else 0)` |
| `.assertEq a b ret` | Evaluate `a`, `b`, evaluate `ret`, return `ret`'s value with `(a_val = b_val) ∧ ret_prop` |
| `.store t` | Evaluate `t`, push to memory, return pointer field |
| `.load t` | Evaluate `t` to a field (address), look up in memory |
| `.proj t i` | Evaluate `t` to a tuple, project component `i` |
| `.get t i` | Evaluate `t` to an array, index at `i` |
| `.slice t i j` | Evaluate `t` to an array, extract `[i, j)` |
| `.set t i v` | Evaluate `t` to an array, evaluate `v`, replace at `i` |

For function calls, `fuel` is decremented. If fuel reaches 0, the result is
`.fail "out of fuel"`. This makes `eval` total.

### Pattern Matching

```lean
def matchPattern (pat : Pattern) (v : Value) : Option (List (Local × Value))
```

This attempts to destructure `v` according to `pat`, returning bindings on success.
For `Pattern.field g`, it checks that `v = .field g`. For `Pattern.var x`, it always
succeeds with `[(x, v)]`. For constructors (`Pattern.ref`), it checks the tag and
recursively matches sub-patterns.

### assertEq Accumulation

The crucial design point: `assertEq` does *not* fail evaluation. Instead, the equality
is accumulated as a `Prop` in the result:

```lean
| .assertEq a b ret => do
  let (state, a_val, a_prop) ← evalValue top env state a
  let (state, b_val, b_prop) ← evalValue top env state b
  let (state, ret_val, ret_prop) ← eval top env state ret
  let assertion := (a_val = b_val)
  (state, .ok ret_val (a_prop ∧ b_prop ∧ assertion ∧ ret_prop))
```

This mirrors how the circuit works: `assertEq` is a constraint, not a runtime check.
The evaluator produces the output regardless, and the proposition captures what must
hold for the execution to be *valid*.

## Flattening to `Array G`

To connect `eval` to the existing test infrastructure (which works with `Array G`),
define:

```lean
def Value.flatten : Value → Array G
  | .unit      => #[]
  | .field g   => #[g]
  | .tuple vs  => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten
  | .array vs  => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten
```

And a top-level entry point:

```lean
def Aiur.run (top : Toplevel) (funcName : Global) (args : Array G)
    (fuel : Nat := 1000) : Option (Array G × Prop) :=
  let func ← lookupFunction top funcName
  let env := Env.ofList (func.inputs.zip (unflattenArgs func.inputs args))
  let state := { memory := default, ioBuffer := default, fuel }
  match eval top env state func.body with
  | (_, .ok val prop) => some (val.flatten, prop)
  | _ => none
```

## Stating Theorems

With this setup, we can write specifications like:

```lean
-- A function that asserts its input is zero and doubles it
-- fn foo(x: G) -> G { assert_eq!(x, 0); x + x }
theorem foo_correct (top : Toplevel) (x : G)
    (h_func : lookupFunction top `foo = some foo_fn) :
    Aiur.run top `foo #[x] = some (#[x + x], x = (0 : G)) := by
  ...

-- A pure function with no assertions
-- fn sum(x: G, y: G) -> G { x + y }
theorem sum_correct (top : Toplevel) (x y : G)
    (h_func : lookupFunction top `sum = some sum_fn) :
    Aiur.run top `sum #[x, y] = some (#[x + y], True) := by
  ...

-- Conditional: fn abs_or_zero(x: G) -> G { match eq_zero(x) { 1 => 0, _ => x } }
theorem abs_or_zero_spec (top : Toplevel) (x : G) ... :
    Aiur.run top `abs_or_zero #[x] =
      if x = 0 then some (#[(0 : G)], True) else some (#[x], True) := by
  ...
```

## Key Design Decisions

### 1. Prop-valued assertions, not runtime failure

`assertEq` accumulates a `Prop` rather than causing `eval` to fail. This is essential
because Aiur is a circuit language: the prover always produces an output, and the
constraints determine whether the output is *valid*. Separating the output from the
validity condition is what makes formal verification useful — you can reason about what
the circuit computes independently of whether the constraints are satisfied.

### 2. Fuel-based termination

Using a `fuel : Nat` parameter makes `eval` structurally recursive (total). Theorems
are stated with "sufficient fuel" hypotheses, or we prove that certain inputs require
bounded recursion.

### 3. Operating on `Term` (pre-simplification)

`eval` works on `Term`, the surface-level IR, not on `TypedTerm` or `Bytecode`. This
means it can be used to specify the semantics of Aiur programs as written. A separate
correctness theorem would relate `eval` on `Term` to execution of compiled `Bytecode`.

### 4. Memory as an array

The memory model is a simple append-only array. `store` appends and returns the index
as a `G`. `load` indexes into the array. This is adequate because Aiur memory is
append-only (no mutation of existing cells).

## Verification Roadmap

### Phase 1: Definitional interpreter (`Aiur.eval`)
- Define `Value`, `Memory`, `EvalState`, `Result`, `Env`.
- Implement `eval`, `matchPattern`, `Value.flatten`.
- Implement `Aiur.run` as the top-level entry point.

### Phase 2: Basic specifications
- State and prove specs for simple functions: `id`, `sum`, `prod`.
- State and prove specs involving `assertEq`.
- State and prove specs involving `match` on field values.

### Phase 3: Data structure operations
- Specs for `proj`, `get`, `slice`, `set`.
- Specs for tuple/array construction and destructuring.
- Specs involving `store`/`load`.

### Phase 4: Recursion and datatypes
- Specs for recursive functions (`factorial`, `fibonacci`) with fuel analysis.
- Specs for enum construction and pattern matching (`Shape`, `Nat`).
- Mutual recursion (`even`/`odd`).

### Phase 5: Compiler correctness
- Define a relation between `eval` on `Term` and `Bytecode.Toplevel.execute`.
- Prove that compilation preserves semantics: if `eval` produces `(val, prop)`,
  then `execute` on the compiled bytecode produces `val.flatten` (assuming `prop`
  holds).

### Phase 6: Byte and u32 operations
- Specs for `u8_add`, `u8_xor`, `u8_shift_left`, `u8_bit_decomposition`, etc.
- Specs for `u32_less_than`.
- These require modeling the Goldilocks field arithmetic precisely.

## Relation to CSLib

The approach is analogous to how [CSLib](https://github.com/leanprover/cslib) provides
foundational definitions for formalizing computer science in Lean. Where CSLib defines
evaluation semantics for lambda calculus and other foundational languages, we define
evaluation semantics for Aiur — a domain-specific circuit language. The key parallel
is: define a clean denotational/operational semantics first, then prove properties
against it, then (optionally) prove that concrete implementations refine the semantics.


Term	Semantics
`.var x`	Look up `x` in `env`
`.data (.field g)`	`.ok (.field g) True`
`.data (.tuple ts)`	Evaluate each element, combine
`.data (.array ts)`	Evaluate each element, combine
`.ret t`	Evaluate `t` (marks escaping to caller)
`.let pat val body`	Evaluate `val`, match against `pat` to extend `env`, evaluate `body`
`.match t branches`	Evaluate `t`, try each `(pat, body)` in order
`.app name args`	Evaluate args, look up function, substitute into body, recurse (decrement fuel)
`.add a b`	Evaluate both to fields, return field addition (mod `gSize`)
`.sub a b`	Field subtraction
`.mul a b`	Field multiplication
`.eqZero a`	Evaluate `a` to `.field g`, return `.field (if g == 0 then 1 else 0)`
`.assertEq a b ret`	Evaluate `a`, `b`, evaluate `ret`, return `ret`'s value with `(a_val = b_val) ∧ ret_prop`
`.store t`	Evaluate `t`, push to memory, return pointer field
`.load t`	Evaluate `t` to a field (address), look up in memory
`.proj t i`	Evaluate `t` to a tuple, project component `i`
`.get t i`	Evaluate `t` to an array, index at `i`
`.slice t i j`	Evaluate `t` to an array, extract `[i, j)`
`.set t i v`	Evaluate `t` to an array, evaluate `v`, replace at `i`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formally verifying Aiur programs #340

Formal Evaluation Semantics for Aiur

Goal

Overview

Core Definitions

Values

Memory

Result and EvalState

Environment

The Evaluator

Pattern Matching

assertEq Accumulation

Flattening to `Array G`

Stating Theorems

Key Design Decisions

1. Prop-valued assertions, not runtime failure

2. Fuel-based termination

3. Operating on `Term` (pre-simplification)

4. Memory as an array

Verification Roadmap

Phase 1: Definitional interpreter (`Aiur.eval`)

Phase 2: Basic specifications

Phase 3: Data structure operations

Phase 4: Recursion and datatypes

Phase 5: Compiler correctness

Phase 6: Byte and u32 operations

Relation to CSLib

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Formally verifying Aiur programs #340

Description

Formal Evaluation Semantics for Aiur

Goal

Overview

Core Definitions

Values

Memory

Result and EvalState

Environment

The Evaluator

Pattern Matching

assertEq Accumulation

Flattening to Array G

Stating Theorems

Key Design Decisions

1. Prop-valued assertions, not runtime failure

2. Fuel-based termination

3. Operating on Term (pre-simplification)

4. Memory as an array

Verification Roadmap

Phase 1: Definitional interpreter (Aiur.eval)

Phase 2: Basic specifications

Phase 3: Data structure operations

Phase 4: Recursion and datatypes

Phase 5: Compiler correctness

Phase 6: Byte and u32 operations

Relation to CSLib

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Flattening to `Array G`

3. Operating on `Term` (pre-simplification)

Phase 1: Definitional interpreter (`Aiur.eval`)