Skip to content

david-schroeter/master_project_poc

Repository files navigation

Symbolic Type System Proof-of-Concept

This repository contains a proof-of-concept implementation that uses symbolic execution and SMT solving to detect type errors in programs with complex type hierarchies, generics, and union types.

Table of Contents

Overview

The system analyzes programs by:

  1. Parsing source code (Python/Java) into a common intermediate representation (IR)
  2. Symbolically executing functions to explore all possible execution paths
  3. Encoding type constraints as SMT formulas
  4. Checking constraint satisfiability to detect type errors

The analysis detects unsafe casts, type inconsistencies, and violations of generic variance rules.

Quick Start

Setup

Install uv (if not already installed):

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

For other installation methods, consult the uv documentation.

Clone and setup the project:

# Clone the repository
git clone <repository-url>
cd master_project_poc

# Install dependencies with uv
uv sync

# Verify installation by running tests
uv run pytest

Try It Yourself

Try the example templates to see the analysis in action:

# Run Python template
uv run benchmark examples/template.py

# Run Java template
uv run benchmark examples/template.java

The templates demonstrate basic type error detection. You can modify them using these supported features:

Python:

  • Classes with inheritance (class Dog(Animal))
  • Type annotations (required: x: Animal, def foo() -> int)
  • Union types (Animal | None), Any type from typing
  • Generics (new syntax: class List[T] or TypeVar/Generic from typing)
  • Type casts using cast(Dog, x) from typing
  • Type checks: isinstance(x, Dog)
  • if/else conditionals, assertions

Java:

  • Classes and interfaces with inheritance (extends, implements)
  • Generics (class List<T>)
  • Arrays (content never used, only type-checked)
  • Type casts (Dog) x, instanceof checks
  • if/else conditionals, assertions

See Capabilities and Limitations for details.

Project Structure

src/master_project_poc/
├── frontend/          # Language frontends (Python, Java)
│   ├── python.py      # Python → IR
│   └── java.py        # Java → IR
├── ir/                # Intermediate representation definitions
│   ├── cfg.py         # Control flow graph structures
│   ├── type_def.py    # Type definitions
│   └── decls.py       # Class/interface declarations
├── engine/            # Symbolic execution engine
│   └── se_engine.py   # Path exploration and state management
├── smt/               # SMT encoding and solving
│   └── solver.py      # Type constraint encoding with Z3
└── benchmarks/        # Example programs demonstrating capabilities

Intermediate Representation (IR)

The IR is designed to capture common type system features from Python and Java.

Complete IR Grammar

# Identifiers

Id                  ::= BlockId | LocalId | TempId | FuncId | UserDefinedTypeId | GenericParamId
                        (all contain: id: int)

UserDefinedTypeId   ::= ClassId | InterfaceId

# Source positions

SrcPos              ::= line: Int?
                        col:  Int?

SrcSpan             ::= start: SrcPos
                        end:   SrcPos
                        file:  String?

# Types
Type                ::= BoolType | IntType | NullType | AnyType | VoidType
                        | RefType | UnionType | GenericParamType

BoolType            ::= "boolean"
IntType             ::= "int"
NullType            ::= "null"
AnyType             ::= "any"
VoidType            ::= "void"

RefType             ::= name: UserDefinedTypeId
                        generic_params: {GenericParamId: Type}?        # None for raw types, non-empty for instantiated generics

UnionType           ::= types: (Type)+                   # Union of 1+ types (normalized, no nested unions)

GenericParamType    ::= name: GenericParamId         # Reference to generic parameter in scope

# Generic Parameters

Variance            ::= COVARIANT | CONTRAVARIANT | INVARIANT

GenericParamDecl    ::= id: GenericParamId
                        name: String
                        variance: Variance

# Declarations

Program             ::= user_defined_type: {UserDefinedTypeId: UserDefinedTypeDecl}
                        functions: {FuncId: Function}

UserDefinedTypeDecl ::= ClassDecl | InterfaceDecl

ClassDecl           ::= id: ClassId
                        name: String
                        supertypes: (RefType)*
                        generic_params: {GenericParamId: GenericParamDecl}?
                        is_instantiable: Bool
                        is_final: Bool
                        methods: (Function)*

InterfaceDecl       ::= id: InterfaceId
                        name: String
                        supertypes: (RefType)*
                        generic_params: {GenericParamId: GenericParamDecl}?

# Functions & Locals

Local               ::= id: LocalId
                        name: String
                        type: Type

Function            ::= id: FuncId
                        name: String
                        params: (LocalId)*
                        locals: {LocalId: Local}
                        blocks: {BlockId: BasicBlock}
                        entry: BlockId
                        return_type: Type

# Values

Value               ::= Const | LocalId | TempId

Const               ::= value: (Int | Bool | Null)
                        type:  Type

# Operations

# +, -, *, //, %, ==, !=, <, <=, >, >=
BinOpKind           ::= Add | Sub | Mul | FloorDiv | Mod
                        | Eq | Ne | Lt | Le | Gt | Ge

# ! (logical not)
UnaryOpKind         ::= Not

# Instructions

Instruction         ::= (ResultInstruction | NoResultInstruction) & Positioned

ResultInstruction   ::= New | Cast | Is | BinOp | UnaryOp
NoResultInstruction ::= Assign | Assert

New                 ::= dst: TempId
                        type: RefType

Cast                ::= dst: TempId
                        value: Value
                        target_type: Type

Is                  ::= dst: TempId              # Runtime type check: dst = value instanceof target_type
                        value: Value
                        target_type: Type

BinOp               ::= dst: TempId
                        op: BinOpKind
                        left: Value
                        right: Value

UnaryOp             ::= dst: TempId
                        op: UnaryOpKind
                        value: Value

Assign              ::= dst: LocalId
                        value: Value

Assert              ::= condition: Value         # Adds condition to path constraints

# Terminators

Terminator          ::= (Jump | Branch | Return) & Positioned

Jump                ::= target: BlockId

Branch              ::= true_target: (Value, BlockId)    # (condition, then_block)
                        false_target: (Value, BlockId)   # (neg_condition, else_block)

Return              ::= value: Value                    # Always present (void functions use VoidType Const)

# Basic Blocks

BasicBlock          ::= id: BlockId
                        instructions: (Instruction)*
                        terminator: Terminator

Algorithm Overview

Symbolic Execution Engine

The SE engine explores all feasible execution paths through a function. Each execution state consists of:

  • Environment: Maps variables to their symbolic values and types
  • Path constraints: Accumulated logical conditions that must hold on this path

Type errors are detected by checking if required subtyping relationships are entailed by the path constraints (verified using the SMT solver).

Algorithm: SymbolicExecution(function)
Input: A function with control flow graph
Output: Set of detected type errors

1. Initialize worklist ← [(entry_block, initial_environment, initial_path_constraints)]

2. While worklist is not empty do:
   a. Pop (block, env, path) from worklist

   b. If path constraints are not satisfiable then
      Continue to next iteration  // Prune infeasible path

   c. For each instruction in block do:
      i.   If instruction is New then
           Update env with new reference type

      ii.  If instruction is Cast then
           Check that value_type ⊆ target_type under path
           If not entailed, report type error
           Update env with value and its type

      iii. If instruction is BinOp or UnaryOp then
           Compute symbolic result expression
           Update env with result and its type

      iv.  If instruction is Assign then
           Check that value_type ⊆ variable_declared_type under path
           If not entailed, report type error
           Create fresh symbolic type variable τ_new
           Add constraint τ_new ⊆ value_type to path
           Update env with value and τ_new

      v.   If instruction is Assert then
           Check that condition is entailed by path
           If not entailed, report assertion failure

   d. Handle block terminator:
      i.   If Jump(target) then
           Add (target, env, path) to worklist

      ii.  If Branch(condition, true_target, false_target) then
           Add (true_target, env, path ∪ {condition}) to worklist
           Add (false_target, env, path ∪ {¬condition}) to worklist

      iii. If Return then
           Path exploration complete

SMT Solver

The SMT solver encodes type constraints using Z3. It provides two key operations:

Algorithm: CheckSat(path_constraints)
Input: Path constraints Φ
Output: Whether Φ is satisfiable

result ← Z3.check(Φ)
Return result is SAT
Algorithm: CheckEntailed(path_constraints, query_constraint)
Input: Path constraints Φ, query constraint φ
Output: Whether Φ ⊨ φ

result ← Z3.check(Φ ∧ ¬φ)
If result is UNSAT then
   Return True  // φ is entailed
Else
   Return False  // Counter-example found

Constraint Encoding:

For SubtypeOf(τ₁, τ₂, variance):
- Select mapping: subtype_mapping (covariant/invariant) or parent_transitive (contravariant)
- If τ₁ is symbolic variable: create symbolic variables for encountered generic parameters
- Union on left: (T₁|T₂) ⊆ S ≡ (T₁ ⊆ S) ∧ (T₂ ⊆ S)
- Union on right: T ⊆ (S₁|S₂) ≡ (T ⊆ S₁) ∨ (T ⊆ S₂)
- Both unions: conjunction of disjunctions for each left member
- Build disjunction over all types in mapping[τ₂] with generic substitution
For EqualType(τ₁, τ₂):
- If τ₁ is GenericParamType with symbolic base: create symbolic variable, constrain to τ₂
- Both unions with same length: peel and recurse element-wise
- τ₂ is union, τ₁ is not: encode union as integer, compare symbolically
- Otherwise: τ₁ = τ₂ as symbolic equality

SMT Solver Design Choices

Quantifier-Free Encoding

The solver uses a quantifier-free approach by explicitly enumerating all subtype relationships during type hierarchy construction. This avoids quantifiers in the SMT formulas, which improves solver performance and ensures decidability.

Type Hierarchy Construction (4 phases):

  1. parent_mapping: Direct supertype relationships (classes → superclass + interfaces)
  2. positive_edges: Reachable types from each class (considering generic instantiations)
  3. parent_transitive: Transitive closure of parent relationships
  4. subtype_mapping: All concrete instantiated subtypes for each type

Open-World Assumption

The solver uses an open-world assumption to handle types that are not defined in the current program but could be added later (e.g., client code extending library interfaces). This allows analyzing libraries independently without knowing all possible subtypes that client code might introduce.

Unknown subtypes: For each symbolic variable with a subtype constraint (e.g., x ⊆ Interface), the solver creates symbolic type references representing potential unknown subtypes. This ensures branches that depend on specific subtypes are explored even when those types don't exist in the current program.

Symbolic variables for generics: When an unknown type is a subtype of a generic type, the solver creates symbolic variables for the generic parameters. For example, if symbolic variable x has constraint x ⊆ List<T>, a symbolic variable x_T is created and constrained by the actual instantiation of T in the destination type.

Current limitation: Only one level of nested generics is fully supported for unknown types. For types like List<List<T>>, the inner T is not linked properly. Additionally, variance (covariant/contravariant) is not handled correctly for unknown generics - they are treated as invariant.

Generic Substitution and Variance

When checking τ₁ ⊆ τ₂ with generics:

  1. Index mapping[τ₂] to get potential subtypes (may contain uninstantiated GenericParamTypes)
  2. For each potential subtype:
    • If τ₁ is concrete: substitute its actual type arguments into the potential subtype's generic parameters
    • If τ₁ is symbolic: create symbolic variables for generic parameters (e.g., x_T), use recursively against τ₂'s generic instantiation
  3. Apply variance rules:
    • Covariant (+T): S ⊆ T required (uses subtype_mapping)
    • Contravariant (-T): T ⊆ S required (uses parent_transitive)
    • Invariant: S = T required

Union Type Encoding

Union types are encoded using logical connectives based on position:

  • (A|B) ⊆ C becomes (A ⊆ C) ∧ (B ⊆ C) - all members must be subtypes
  • A ⊆ (B|C) becomes (A ⊆ B) ∨ (A ⊆ C) - at least one must hold
  • For equality with unions: encoded as unique integer values for symbolic comparison

Constraint Types

The solver handles multiple constraint types:

  • SubtypeConstraint: Subtyping with variance (used for casts, assignments, instanceof checks)
  • EqualTypeConstraint: Type equality (used for symbolic variables in assignments)
  • ExprRefConstraint: Integer/boolean value constraints from conditionals and expressions

Alias Map for Symbolic Variables

When equality constraints are added (e.g., τ₁ = τ₂), the solver maintains an alias map to remember that these symbolic variables represent the same type. This improves constraint propagation and helps the solver find relationships between variables.

Capabilities and Limitations

Supported Features

  • Type hierarchies: Classes, interfaces, multiple interface inheritance
  • Generics: Parameterized types with variance annotations (covariant, contravariant, invariant)
  • Union types: Type unions with proper subtyping semantics
  • Type operations: Casts, instanceof checks, assignments
  • Symbolic execution: Complete path exploration with branch conditions
  • Error detection: Invalid casts, type mismatches, assertion failures

Current Limitations

  • Intraprocedural analysis only: Functions analyzed independently, no cross-function calls
  • No fields or methods: Classes track only inheritance relationships
  • Limited array support: Java arrays are parsed as covariant generic Array<T> types. Array element access creates a new value of type T, and array element assignment is type-checked like a local variable assignment. Python arrays not yet supported.
  • Contravariant generics with union types: When checking x ⊆ Class<-T> where the contravariant parameter T is instantiated with a union type (e.g., A | B), the encoding requires checking A | B ⊆ x_T (union supertype of symbolic). This case is not currently supported.
  • Manual type annotations: Assumes types already inferred (like after javac/mypy)
  • Non-SSA IR: Variables can be reassigned (handled via versioned symbolic variables)
  • Path explosion: Infeasible paths are pruned (unsatisfiable constraints), but all feasible paths are explored without abstraction or path merging

Development

Commands

Run type checker:

uv run mypy src

Run tests:

uv run pytest

Run tests with coverage:

uv run pytest --cov --cov-report=term-missing

Run linter:

uv run ruff check
uv run ruff format

Run benchmarks:

# See benchmarks README for detailed options
uv run benchmark src/master_project_poc/benchmarks/java/codeforces_array_sort.java

Additional Documentation

  • Benchmarks - Example programs demonstrating capabilities
  • Frontend Tests - Test cases for Python and Java frontends
  • Engine Tests - Symbolic execution test cases showing expected behavior for various type scenarios

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors