Mu has a comprehensive type system. It is close to the machine level, but also has reference types for exact garbage collection.
In the Mu IR, a type is created by a (possibly recursive) combination of type constructors.
By convention, types are written in lower cases. Parameters to types are written
in angular brackets < >.
A Mu type defines a set where a data value, or value when unambiguous, is one of its elements.
Both SSA variables and the Mu memory can hold values in this type system. Some restrictions can limit what type a variable or a memory location can hold.
A type constructor represents an abstract type. A concrete type is created by applying a type constructor and supplying necessary parameters. The following type constructors are available in Mu:
- int < length >
- float
- double
- uptr < T >
- ufuncptr < sig >
- struct < T1 T2 ... >
- hybrid < F1 F2 ... V >
- array < T length >
- vector < T length >
- void
- ref < T >
- iref < T >
- weakref < T >
- tagref64
- funcref < sig >
- threadref
- stackref
- framecursorref
- irnoderef
For C programmers: In Mu IR, types cannot be "inlined", i.e. all types referenced by other definitions (such as other types, constants, globals, functions, ...) must be defined at top-level. For example:
.typedef @refi64 = ref<int<64>> // WRONG. Cannot write int<64> inside. .typedef @i64 = int<64> .typedef @refi64 = ref<@i64> // Right. %sum = FADD <double> %a %b // WRONG. "double" is a type constructor, not a type .typedef @double = double %sum = FADD <@double> %a %b // Right.
Parameters of a type are in the angular brackets. They can be integer literals, types and function signatures. In the text form, the latter two are global names (See uvm-ir.rest).
There are several kinds of types.
floatanddoubleare floating point types.refandweakrefare object referenct types.ref,irefandweakrefare reference types.funcref,threadref,stackref,framecursorrefandirnoderefare opaque reference types.- Reference types and opaque reference types are general reference types.
int,float,double, pointer types, general reference types andtagref64are scalar types.struct,hybrid,arrayandvectorare composite types.voidis neither a scalar type nor a composite type.
hybridis the only variable-length type. All other types are fixed-length types.uptrandufuncptrare pointer types.int, pointer types,ref,irefand opaque reference types are EQ-comparable types.int,irefand pointer types are ULT-comparable types.ref<T>is the strong variant ofweakref<T>;weakref<T>is the weak variant ofref<T>. All other types are the strong variant or weak variant of themselves.
A member of a composite type T is either a field of T if T is a struct, or an element of T if T is an array or a vector, or a field in the fixed part or any element in the variable part if T is a hybrid. A component of type T is either itself or a member of any component of T.
NOTE: This means a component is anything in a type, including itself and any arbitrarily nested members.
The type parameter T in uptr<T> and the return type and all parameter
types of sig in ufuncptr<sig> must be native-safe. It is defined as
following:
void,int<n>,floatanddoubleare native-safe.struct<T1 T2 ...>,array<T n>,vec<T n>andhybrid<F1 F2 ... V>are native-safe if all of their type arguments T1, T2, ..., T, F1, F2, ..., V are native-safe.uptr<T>andufuncptr<sig>are native-safe if T and the return type and all parameter types in sig are native-safe. Otherwise theuptror theufuncptrtype is not well-formed.- All other types are not native-safe. (Specifically, they are all general
reference types as well as
struct,arrayorhybridthat contains them.)
int < length >
- length
- integer literal: The length of the integer in bits.
int is an integer type of length bits.
int is neutral to signedness. Negative numbers are represented in the 2's
complement notation where the highest bit is the sign bit.
int<1> is a Boolean type, in which case 1 means true and 0 means false.
NOTE: The signedness of aninttype is determined by the operations rather than the type. For example,UDIVtreats both operands as unsigned numbers,SDIVtreats both operands as signed numbers andASHRtreats the first operand as signed and the second operand as unsigned.
NOTE: Althoughint<1>is required andint<6>andint<52>are also required whentagref64is implemented, they should not be part of any in-memory structure because their correspondingLOADandSTOREoperations are not required for Mu implementations.int<1>is meant to represent register flags, such as the result of comparison and some overflow or carry flags.int<6>andint<52>are supposed to be used transiently when packing or unpacking atagref64value.
For LLVM users: these types are directly borrowed from LLVM.
Example:
.typedef @i1 = int<1> .typedef @i8 = int<8> .typedef @i16 = int<16> .typedef @i32 = int<32> .typedef @i64 = int<64>
float
double
float and double are the IEEE754 single-precision and double-precision
floating point number type, respectively.
For LLVM users: these types are directly borrowed from LLVM.
Example:
.typedef @float = float .typedef @double = double
uptr < T >
T- type: The type of the referent.
ufuncptr < sig >
sig- function signature: The signature of the pointed function.
uptr and ufuncptr are (untraced) pointer types which are represented by
the integral address of the referent. They are part of the (unsafe) native
interface. The "u" in their names stand for "untraced". Their values are not
affected by the garbage collection, even if their addresses are obtained from
pinning heap objects which are later unpinned.
uptr is the data pointer type. It points to a region in the native address
space which represents the data type T.
ufuncptr is the function pointer type. It points to a native function whose
signature is sig.
The type parameter T and both the return types and the parameter types of sig must be native-safe. It is implementation-defined whether multiple return values are allowed for a particular calling convention.
For LLVM users:
uptr<T>is the counterpart of pointer typesT*.ufuncptr<sig>is the counterpart of function pointersR (P1 P2 ...)*. ThePTRCASTinstruction can cast between different pointer types as well as integer types.For C users: Similar to LLVM,
uptrandufuncptrare the equivalent of C pointers to objects and functions, respectively. However, since Mu interfaces with the native world at the ABI level rather than the C programming language level, pointers are defined as addresses and casting between pointers and integers has semantics.
Example:
.typedef @i32 = int<32> // int .typedef @i32_p = uptr<@i32> // int* // ssize_t write(int fildes, const void *buf, size_t nbyte); // See man (2) write. .typedef @void = void // void .typedef @void_p = uptr<@void> // void* .typedef @size_t = int<64> // size_t, ssize_t .funcsig @write_s = (@i32 @void_p @size_t) -> (@size_t) .typedef @write_fp = ufuncptr<@write_s> // @write_fp may point to the native function "write". // typedef void (*sig_t) (int); // sig_t signal(int sig, sig_t func); // See man (3) signal. .funcsig @sig_s = (@i32) -> () .typedef @sig_t = ufuncptr<@sig_s> .funcsig @signal_s = (@i32 @sig_t) -> (@sig_t) .typedef @signal_fp = ufuncptr<@signal_s> // @signal_fp may point to the native function "signal".
struct < T1 T2 ... >
- T1, T2, ...
- type: The type of fields.
A struct is a Cartesian product type of several types. T1, T2, ... are
its field types. A struct must have at least one member. struct
members cannot be void.
NOTE: For C programs: C does not allow empty structures, either, but many programmers create empty structures in practice. C++ does allow empty classes. g++ treats empty classes as having onecharelement. In Mu, if it is desired to allocate an empty unit in the heap, the appropriate type isvoid.
A struct cannot have itself as a component.
NOTE: If it could, the struct would be infinitely large. However, a struct may contain a reference to the same struct type, since the size of references is not dictated by the thing it points to. For example:
.typedef @foo = struct <@i32 @foo> // WRONG. @foo is infinitely big. .typedef @foo = struct <@i32 @fooref> .typedef @fooref = ref<@foo> // Okay. It is a linked list.
struct cannot be the type of an SSA variable if any of its field types
cannot be the type of an SSA variable.
NOTE: For example, astructwith aweakreffield cannot be the type of an SSA variable. However, there can be references to such structs.
For LLVM users: This is the same as LLVM's structure type, except structures
with a "flexible array member" (a 0-length array as the last element)
corresponds to the hybrid type in Mu.
Example:
.typedef @byte = int<8> .typedef @short = int<16> .typedef @int = int<32> .typedef @f = float .typedef @d = double .typedef @struct1 = struct<@byte @short @int @f @d> .typedef @struct2 = struct<@f @f @struct1 @d @d> // nesting structs
hybrid < F1 F2 ... V >
- F1, F2, ...
- list of types: The types in the fixed part
- V
- type: The type of the elements of the variable part
A hybrid is a combination of a fixed-length prefix, i.e. its fixed part, and
a variable-length array suffix, i.e. its variable part, whose length is
decided at allocation time. F1 F2 ... are the types of fields in the fixed
part. V is the type of the elements of the variable part.
NOTE: This is intended to play the part of "struct with flexible array
member" in C99, i.e. struct { F1 f1; F2 f2; ... V v[]; }.
The fixed part may contain 0 fields. In this case, the fixed part is empty, and
the variable part is in the beginning of this hybrid memory location, like a
variable-length array without a header. The variable part cannot be omitted.
Neither any fixed-part field nor V can be void.
hybrid cannot be the type of any SSA variable.
NOTE: There can be references to hybrids.
hybrid is the only type in Mu whose length is determined at allocation site
rather than determined by the type itself.
hybrid cannot be contained in any other composite types, including other
hybrids.
NOTE: Since the length of hybrids are only known at allocation time, allowing embedding hybrid members will make the size of other types variable. In Mu's design, hybrid is the only type whose length is determined at allocation site.
For C programmers: Just like
struct { F f; V v[]; }cannot be embedded in other types,hybridcannot, either. However, pointers/references to hybrids are allowed.
Example:
.typedef @byte = int<8> .typedef @long = int<64> .typedef @double = double .typedef @struct1 = struct<@long @long @long> .typedef @hybrid1 = hybrid<@long @byte> // one int<64> followed by many int<8> .typedef @hybrid2 = hybrid<@long @long @long @double> // three int<64>, followed by many double .typedef @hybrid3 = hybrid<@struct1 @double> // similar to @hybrid2, but using struct as the header .typedef @hybrid4 = hybrid<@byte> // no fixed-part header. Just many int<8>.
array < T length >
- T
- type: The type of elements.
- length
- integer literal: The number of elements.
An array is a sequence of values of the same type. T is its element
type, i.e. the type of its elements, and length is the length of the array.
T must not be void. An array must have at least one element.
An array cannot have itself as a component.
It is not recommended to have SSA variables of array type.
NOTE: The most useful feature of arrays is indexing by a variable index. But an SSA variable has more in common with registers than memory, and SSA variables are designed to be allocated in registers when possible. Mu implementations, as supposed to be minimal, may not be able to implement indexing more efficiently than storing the array to the memory and load back and element.
Using arrays as an SSA variable is most useful when passing a
structvalue (not pointer) that contains an array member to a C function that requires such a parameter, although such C functions are, themselves, very rare.
For LLVM users: Like LLVM arrays, a Mu array must have a size, but cannot
have size 0. The closest counterpart of the "variable length array" (VLA)
type in C is the hybrid type.
Example:
.typedef @u8 = int<8> .typedef @real = double .typedef @cmpx = struct<@real @real> .typedef @array1 = array<@u8 4096> // array of 4096 bytes .typedef @array2 = array<@real 100> // array of 100 doubles .typedef @array3 = array<@cmpx 16> // array of 16 structs .typedef @array4 = array<@array2 1024> // array of 1024 nested arrays
vector < T length >
T- type: The type of elements.
length- integer literal: The number of elements.
vector is the vector type for single-instruction multiple-data (SIMD)
operations. A vector value is a packed value of multiple values of the same
type. T is the type of its elements and length is the number of elements.
T cannot be void. length must be at least one.
It is allowed to have SSA variables of vector types.
Only some primitive element types, such as int<32>, float and
double, are required for implementations. If the
implementation allows other types, then any vector cannot directly or indirectly
contain itself as a member.
For LLVM users: This is the counterpart of the LLVM vector type.
Example:
.typedef @i32 = int<32> .typedef @float = float .typedef @double = double .typedef @vector1 = vector<@i32 4> .typedef @vector2 = vector<@float 4> .typedef @vector3 = vector<@double 2> .typedef @vector4 = vector<@double 4>
void
The void type has no value.
It can only be used as the type of allocation units that do not store any
values. This allows allocating void in the heap/stack/global memory.
Particularly, the NEW instruction with the type void creates a new empty
heap object which is not the same as any others. This is similar to the new
Object() expression in Java. ref<void>, iref<void>, weakref<void>
and uptr<void> are also allowed, which can refer/point to "anything".
ref < T >
iref < T >
weakref < T >
- T
- type: The type of referent.
ref is an object reference type. A ref value is a strong reference to a
heap objects.
iref is an internal reference type. An iref value is an internal
reference to a memory location.
weakref is a weak object reference type. It is the weak variant of ref.
A memory location of weakref holds a weak reference to a heap object and can
be clear to NULL by the garbage collector when there is no strong references
to the object the weakref value refers to.
NOTE: There is no weak internal reference.
The type parameter T is the referent type, which is the type of the heap
object or memory location its value refers to.
All reference types can have NULL value which does not refer to any heap
object or memory location.
Weakref can only be the type of a memory location, not an SSA variable. When a
weakref location is loaded from, the result is a ref to the same object;
when a ref is stored to a weakref location, the location holds a
weakref to that object.
NOTE: Allowing SSA variables to hold weak references may cause many problems. The semantic allows the GC to change it to
NULLat any time as long as the GC decides the referent object is no longer reachable. For this reason, it is impossible to guarantee a weak reference isNULLbefore accessing. Consider the following program:%entry(): %notnull = NE <@RefT> %weakref @NULLREF // Just at this moment, GC changed %weakref to NULL BRANCH2 %notnull %bb_cont(%weakref) %bb_abnormal(...) %bb_cont(<@WeakRefT> %weakref): %val = LOAD <@T> %weakref // null reference access ...GC may clear the weak reference right after the program decided it is not
NULL.Requiring an explicit conversion from
weakref<T>toref<T>is not very useful. In that case, the only operation allowed forweakref<T>is to convert toref<T>.So letting this conversion happen implicitly during memory access is a natural choice, though not intuitive at all. In Mu's conceptual model, a memory load is like an IO operation: it does not simply "get" the value (such as an object reference) in the memory, but is a communication with the memory system and queries the global state. So it is natural for a load operation to return a different value each time executed.
For LLVM users: there is no equivalence in LLVM. Mu guarantees that all references are identified both in the heap and in the stack and are subject to garbage collection. The closest counterpart in LLVM is the pointer type. Mu does not encourage the use of pointers, though pointer types will be introduced in Mu in the future.
Example:
.typedef @i8 = int<8> .typedef @i16 = int<16> .typedef @i32 = int<32> .typedef @float = float .typedef @double = double .typedef @some_struct = struct<@i32 @i16 @i8 @double @float> .typedef @some_array = array<@i8 100> .typedef @ref1 = ref<@i32> .typedef @ref2 = ref<@some_struct> .typedef @ref3 = ref<@some_array> .typedef @iref1 = iref<@i32> .typedef @iref2 = iref<@some_struct> .typedef @iref3 = iref<@some_array> .typedef @weakref1 = weakref<@i32> .typedef @weakref2 = weakref<@some_struct> .typedef @weakref3 = weakref<@some_array>
tagref64
tagref64 is a union type of double, int<52> and struct<ref<void>
int<6>. It occupies 64 bits. A tagref64 value holds both a state which
identifies the type it is currently representing and a value of that type.
NOTE: When atagref64contains an object reference, it can hold anint<6>together as a user-defined tag. It is useful to store type information.
When a tagref64 represents a double NaN value, it does not preserve the
bit-wise representation of the NaN.
NOTE: This type is intended to reuse the NaN space of the IEEE754 double value to multiplex with integers and object references. For this reason, when storing NaN values, it will still be NaN, but may not have the same bit representation.
NOTE: This type is only available on some architectures including x86-64 with 48-bit addresses.
funcref < sig >
- sig
- function signature: The signature of the referred function.
funcref is a function reference type. It is an opaque reference to a Mu
function and is not interchangeable with reference types. sig is the signature
of the function it refers to.
A NULL value of a funcref type does not refer to any function.
NOTE: The value of afuncrefmay refer to a function that is declared but not defined. The value of afuncreftype does not change even the function it refers to becomes defined or redefined.
For C and LLVM users: Thefuncreftype is similar to the "pointer to function" type in C and LLVM, but it only refer to Mu functions. It is not a pointer (see theufuncptrtype). It may be implemented under the hood as a pointer to a function, which will be an implementation detail.
Example:
.typedef @i64 = int<64> .funcsig @sig1 = (@i64 @i64) -> (@i64) .funcsig @sig2 = () -> () .typedef @func1 = funcref<@sig1> .typedef @func2 = funcref<@sig2>
threadref
stackref
framecursorref
irnoderef
These types are opaque references to things within Mu. They are not
interchangeable with reference types. Only some special instructions (e.g.
@uvm.new_stack, NEWTHREAD, @uvm.meta.new_cursor) or API calls can
operate on them.
All opaque reference values can be NULL, which does not refer to anything.
threadref and stackref refer to Mu Threads and Mu stacks, respectively.
They are used to manipulate the threads and stacks. In
particular, stackref is used in the SWAPSTACK instruction. stackref
is not a pointer to the top of the stack. It refers to the same stack even if
frames are added or removed.
framecursorref refers to to frame cursors. A frame cursor is an internal
structure used by the stack introspection API to iterate through stack frames.
Its content is mutable but opaque. See Threads and Stacks for more details.
irnoderef refers to a Mu IR node being constructed by the IR Builder API.