This repository contains a proof-of-concept implementation that uses symbolic execution and SMT solving to detect type errors in programs with complex type hierarchies, generics, and union types.
- Overview
- Quick Start
- Project Structure
- Intermediate Representation (IR)
- Algorithm Overview
- SMT Solver Design Choices
- Capabilities and Limitations
- Development
- Additional Documentation
The system analyzes programs by:
- Parsing source code (Python/Java) into a common intermediate representation (IR)
- Symbolically executing functions to explore all possible execution paths
- Encoding type constraints as SMT formulas
- Checking constraint satisfiability to detect type errors
The analysis detects unsafe casts, type inconsistencies, and violations of generic variance rules.
Install uv (if not already installed):
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"For other installation methods, consult the uv documentation.
Clone and setup the project:
# Clone the repository
git clone <repository-url>
cd master_project_poc
# Install dependencies with uv
uv sync
# Verify installation by running tests
uv run pytestTry the example templates to see the analysis in action:
# Run Python template
uv run benchmark examples/template.py
# Run Java template
uv run benchmark examples/template.javaThe templates demonstrate basic type error detection. You can modify them using these supported features:
Python:
- Classes with inheritance (
class Dog(Animal)) - Type annotations (required:
x: Animal,def foo() -> int) - Union types (
Animal | None),Anytype fromtyping - Generics (new syntax:
class List[T]orTypeVar/Genericfromtyping) - Type casts using
cast(Dog, x)fromtyping - Type checks:
isinstance(x, Dog) - if/else conditionals, assertions
Java:
- Classes and interfaces with inheritance (
extends,implements) - Generics (
class List<T>) - Arrays (content never used, only type-checked)
- Type casts
(Dog) x, instanceof checks - if/else conditionals, assertions
See Capabilities and Limitations for details.
src/master_project_poc/
├── frontend/ # Language frontends (Python, Java)
│ ├── python.py # Python → IR
│ └── java.py # Java → IR
├── ir/ # Intermediate representation definitions
│ ├── cfg.py # Control flow graph structures
│ ├── type_def.py # Type definitions
│ └── decls.py # Class/interface declarations
├── engine/ # Symbolic execution engine
│ └── se_engine.py # Path exploration and state management
├── smt/ # SMT encoding and solving
│ └── solver.py # Type constraint encoding with Z3
└── benchmarks/ # Example programs demonstrating capabilities
The IR is designed to capture common type system features from Python and Java.
# Identifiers
Id ::= BlockId | LocalId | TempId | FuncId | UserDefinedTypeId | GenericParamId
(all contain: id: int)
UserDefinedTypeId ::= ClassId | InterfaceId
# Source positions
SrcPos ::= line: Int?
col: Int?
SrcSpan ::= start: SrcPos
end: SrcPos
file: String?
# Types
Type ::= BoolType | IntType | NullType | AnyType | VoidType
| RefType | UnionType | GenericParamType
BoolType ::= "boolean"
IntType ::= "int"
NullType ::= "null"
AnyType ::= "any"
VoidType ::= "void"
RefType ::= name: UserDefinedTypeId
generic_params: {GenericParamId: Type}? # None for raw types, non-empty for instantiated generics
UnionType ::= types: (Type)+ # Union of 1+ types (normalized, no nested unions)
GenericParamType ::= name: GenericParamId # Reference to generic parameter in scope
# Generic Parameters
Variance ::= COVARIANT | CONTRAVARIANT | INVARIANT
GenericParamDecl ::= id: GenericParamId
name: String
variance: Variance
# Declarations
Program ::= user_defined_type: {UserDefinedTypeId: UserDefinedTypeDecl}
functions: {FuncId: Function}
UserDefinedTypeDecl ::= ClassDecl | InterfaceDecl
ClassDecl ::= id: ClassId
name: String
supertypes: (RefType)*
generic_params: {GenericParamId: GenericParamDecl}?
is_instantiable: Bool
is_final: Bool
methods: (Function)*
InterfaceDecl ::= id: InterfaceId
name: String
supertypes: (RefType)*
generic_params: {GenericParamId: GenericParamDecl}?
# Functions & Locals
Local ::= id: LocalId
name: String
type: Type
Function ::= id: FuncId
name: String
params: (LocalId)*
locals: {LocalId: Local}
blocks: {BlockId: BasicBlock}
entry: BlockId
return_type: Type
# Values
Value ::= Const | LocalId | TempId
Const ::= value: (Int | Bool | Null)
type: Type
# Operations
# +, -, *, //, %, ==, !=, <, <=, >, >=
BinOpKind ::= Add | Sub | Mul | FloorDiv | Mod
| Eq | Ne | Lt | Le | Gt | Ge
# ! (logical not)
UnaryOpKind ::= Not
# Instructions
Instruction ::= (ResultInstruction | NoResultInstruction) & Positioned
ResultInstruction ::= New | Cast | Is | BinOp | UnaryOp
NoResultInstruction ::= Assign | Assert
New ::= dst: TempId
type: RefType
Cast ::= dst: TempId
value: Value
target_type: Type
Is ::= dst: TempId # Runtime type check: dst = value instanceof target_type
value: Value
target_type: Type
BinOp ::= dst: TempId
op: BinOpKind
left: Value
right: Value
UnaryOp ::= dst: TempId
op: UnaryOpKind
value: Value
Assign ::= dst: LocalId
value: Value
Assert ::= condition: Value # Adds condition to path constraints
# Terminators
Terminator ::= (Jump | Branch | Return) & Positioned
Jump ::= target: BlockId
Branch ::= true_target: (Value, BlockId) # (condition, then_block)
false_target: (Value, BlockId) # (neg_condition, else_block)
Return ::= value: Value # Always present (void functions use VoidType Const)
# Basic Blocks
BasicBlock ::= id: BlockId
instructions: (Instruction)*
terminator: Terminator
The SE engine explores all feasible execution paths through a function. Each execution state consists of:
- Environment: Maps variables to their symbolic values and types
- Path constraints: Accumulated logical conditions that must hold on this path
Type errors are detected by checking if required subtyping relationships are entailed by the path constraints (verified using the SMT solver).
Algorithm: SymbolicExecution(function)
Input: A function with control flow graph
Output: Set of detected type errors
1. Initialize worklist ← [(entry_block, initial_environment, initial_path_constraints)]
2. While worklist is not empty do:
a. Pop (block, env, path) from worklist
b. If path constraints are not satisfiable then
Continue to next iteration // Prune infeasible path
c. For each instruction in block do:
i. If instruction is New then
Update env with new reference type
ii. If instruction is Cast then
Check that value_type ⊆ target_type under path
If not entailed, report type error
Update env with value and its type
iii. If instruction is BinOp or UnaryOp then
Compute symbolic result expression
Update env with result and its type
iv. If instruction is Assign then
Check that value_type ⊆ variable_declared_type under path
If not entailed, report type error
Create fresh symbolic type variable τ_new
Add constraint τ_new ⊆ value_type to path
Update env with value and τ_new
v. If instruction is Assert then
Check that condition is entailed by path
If not entailed, report assertion failure
d. Handle block terminator:
i. If Jump(target) then
Add (target, env, path) to worklist
ii. If Branch(condition, true_target, false_target) then
Add (true_target, env, path ∪ {condition}) to worklist
Add (false_target, env, path ∪ {¬condition}) to worklist
iii. If Return then
Path exploration complete
The SMT solver encodes type constraints using Z3. It provides two key operations:
Algorithm: CheckSat(path_constraints)
Input: Path constraints Φ
Output: Whether Φ is satisfiable
result ← Z3.check(Φ)
Return result is SAT
Algorithm: CheckEntailed(path_constraints, query_constraint)
Input: Path constraints Φ, query constraint φ
Output: Whether Φ ⊨ φ
result ← Z3.check(Φ ∧ ¬φ)
If result is UNSAT then
Return True // φ is entailed
Else
Return False // Counter-example found
Constraint Encoding:
For SubtypeOf(τ₁, τ₂, variance):
- Select mapping: subtype_mapping (covariant/invariant) or parent_transitive (contravariant)
- If τ₁ is symbolic variable: create symbolic variables for encountered generic parameters
- Union on left: (T₁|T₂) ⊆ S ≡ (T₁ ⊆ S) ∧ (T₂ ⊆ S)
- Union on right: T ⊆ (S₁|S₂) ≡ (T ⊆ S₁) ∨ (T ⊆ S₂)
- Both unions: conjunction of disjunctions for each left member
- Build disjunction over all types in mapping[τ₂] with generic substitution
For EqualType(τ₁, τ₂):
- If τ₁ is GenericParamType with symbolic base: create symbolic variable, constrain to τ₂
- Both unions with same length: peel and recurse element-wise
- τ₂ is union, τ₁ is not: encode union as integer, compare symbolically
- Otherwise: τ₁ = τ₂ as symbolic equality
The solver uses a quantifier-free approach by explicitly enumerating all subtype relationships during type hierarchy construction. This avoids quantifiers in the SMT formulas, which improves solver performance and ensures decidability.
Type Hierarchy Construction (4 phases):
- parent_mapping: Direct supertype relationships (classes → superclass + interfaces)
- positive_edges: Reachable types from each class (considering generic instantiations)
- parent_transitive: Transitive closure of parent relationships
- subtype_mapping: All concrete instantiated subtypes for each type
The solver uses an open-world assumption to handle types that are not defined in the current program but could be added later (e.g., client code extending library interfaces). This allows analyzing libraries independently without knowing all possible subtypes that client code might introduce.
Unknown subtypes: For each symbolic variable with a subtype constraint (e.g., x ⊆ Interface), the solver creates symbolic type references representing potential unknown subtypes. This ensures branches that depend on specific subtypes are explored even when those types don't exist in the current program.
Symbolic variables for generics: When an unknown type is a subtype of a generic type, the solver creates symbolic variables for the generic parameters. For example, if symbolic variable x has constraint x ⊆ List<T>, a symbolic variable x_T is created and constrained by the actual instantiation of T in the destination type.
Current limitation: Only one level of nested generics is fully supported for unknown types. For types like List<List<T>>, the inner T is not linked properly. Additionally, variance (covariant/contravariant) is not handled correctly for unknown generics - they are treated as invariant.
When checking τ₁ ⊆ τ₂ with generics:
- Index
mapping[τ₂]to get potential subtypes (may contain uninstantiated GenericParamTypes) - For each potential subtype:
- If τ₁ is concrete: substitute its actual type arguments into the potential subtype's generic parameters
- If τ₁ is symbolic: create symbolic variables for generic parameters (e.g.,
x_T), use recursively against τ₂'s generic instantiation
- Apply variance rules:
- Covariant (
+T):S ⊆ Trequired (uses subtype_mapping) - Contravariant (
-T):T ⊆ Srequired (uses parent_transitive) - Invariant:
S = Trequired
- Covariant (
Union types are encoded using logical connectives based on position:
(A|B) ⊆ Cbecomes(A ⊆ C) ∧ (B ⊆ C)- all members must be subtypesA ⊆ (B|C)becomes(A ⊆ B) ∨ (A ⊆ C)- at least one must hold- For equality with unions: encoded as unique integer values for symbolic comparison
The solver handles multiple constraint types:
- SubtypeConstraint: Subtyping with variance (used for casts, assignments,
instanceofchecks) - EqualTypeConstraint: Type equality (used for symbolic variables in assignments)
- ExprRefConstraint: Integer/boolean value constraints from conditionals and expressions
When equality constraints are added (e.g., τ₁ = τ₂), the solver maintains an alias map to remember that these symbolic variables represent the same type. This improves constraint propagation and helps the solver find relationships between variables.
- Type hierarchies: Classes, interfaces, multiple interface inheritance
- Generics: Parameterized types with variance annotations (covariant, contravariant, invariant)
- Union types: Type unions with proper subtyping semantics
- Type operations: Casts,
instanceofchecks, assignments - Symbolic execution: Complete path exploration with branch conditions
- Error detection: Invalid casts, type mismatches, assertion failures
- Intraprocedural analysis only: Functions analyzed independently, no cross-function calls
- No fields or methods: Classes track only inheritance relationships
- Limited array support: Java arrays are parsed as covariant generic
Array<T>types. Array element access creates a new value of typeT, and array element assignment is type-checked like a local variable assignment. Python arrays not yet supported. - Contravariant generics with union types: When checking
x ⊆ Class<-T>where the contravariant parameterTis instantiated with a union type (e.g.,A | B), the encoding requires checkingA | B ⊆ x_T(union supertype of symbolic). This case is not currently supported. - Manual type annotations: Assumes types already inferred (like after javac/mypy)
- Non-SSA IR: Variables can be reassigned (handled via versioned symbolic variables)
- Path explosion: Infeasible paths are pruned (unsatisfiable constraints), but all feasible paths are explored without abstraction or path merging
Run type checker:
uv run mypy srcRun tests:
uv run pytestRun tests with coverage:
uv run pytest --cov --cov-report=term-missingRun linter:
uv run ruff check
uv run ruff formatRun benchmarks:
# See benchmarks README for detailed options
uv run benchmark src/master_project_poc/benchmarks/java/codeforces_array_sort.java- Benchmarks - Example programs demonstrating capabilities
- Frontend Tests - Test cases for Python and Java frontends
- Engine Tests - Symbolic execution test cases showing expected behavior for various type scenarios