TODO:
Make memory_manager::alloc and memory_manager::alloc_with_options return Result<Address, AllocError>.
Make them return an error when OOM happens, even at a safepoint.
Document that Collection::out_of_memory will return.
Refactor the allocation code (specifically Allocator::alloc_once_inline) for the changed APIs mentioned above.
The JikesRVM legacy
Our memory allocation architecture is ported from JikesRVM. In JikesRVM, MemoryManager.allocateSpace never returns null, even in the case of OOM. When OOM happens, MMTk calls Collection.outOfMemory(). The VM implements this method by throwing an exception.
This works because JikesRVM's (JIT and AoT) compilers compile both VM and MMTk code using the same ABI. The uniform stack layout and the uniform stack unwinding mechanism allow an exception to be thrown from Collection.outOfMemory() into MMTk frames, and then into VM frames, and then all the way to application frames, where the exception can be caught and handled.
JikesRVM has always assumed that Collection.outOfMemory() never returns.
```java
if (failWithOOM) {
  VM.collection.outOfMemory();
  VM.assertions.fail("Not Reached");  // THIS IS UNREACHABLE!
  return Address.zero();
}
```
There is no way to express in Java that a function never returns (like Rust's fn () -> !), so JikesRVM uses assertions. The return statement is there to make compilation successful.
It doesn't work in the Rust MMTk
In Rust MMTk, the VM binding cannot unwind the stack in Collection::out_of_memory, at least not in a portable way. When using the Rust MMTk, the VM, the MMTk core and the application code may have different ABIs. Take the OpenJDK binding for example.
The VM and the C++ part of the VM binding are implemented in C++ and are AoT compiled.
MMTk and the Rust part of the binding are implemented in Rust and are AoT compiled.
The application code is provided as bytecode, and is either interpreted or JIT compiled.
If Collection::out_of_memory were to throw an exception directly, the exception would travel from C++ code (VM and VM binding) through Rust code (VM binding and MMTk core) into JIT-compiled machine code. This crosses three languages. Rust doesn't have the concept of exceptions (although panic!() is implemented with some form of stack unwinding), and C++ exceptions and Java exceptions are implemented in different ways.
The only safe way to transfer control back to the application is returning frame by frame out of memory_manager::alloc.
Proposed API changes
Allocation
First of all, memory_manager::alloc and memory_manager::alloc_with_options shall be able to return error values.
```rust
pub fn memory_manager::alloc(...) -> Result<Address, AllocationFailure> { ... }
pub fn memory_manager::alloc_with_options(...) -> Result<Address, AllocationFailure> { ... }

pub enum AllocationFailure {
    /// Memory is exhausted. The VM binding should raise an out-of-memory error to the application.
    OutOfMemory,
    /// The allocation is not at a safepoint, but the allocation could not be satisfied without a GC.
    WouldBlock,
}
```
Currently, these are the two possible errors the VM binding could get.
The caller of alloc should match against the Result<Address, AllocationFailure> and handle errors accordingly. Specifically, if it is Err(AllocationFailure::OutOfMemory), the caller should raise an OOM exception.
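As an illustration of the matching described above, here is a minimal sketch. `Address` is a stand-in wrapper, and `InterpreterAction` and `handle_alloc_result` are hypothetical names that are not part of MMTk. Reacting to `WouldBlock` by polling a safepoint and retrying is only one plausible choice, assumed here for the sake of the example.

```rust
/// Stand-in for MMTk's `Address`; a plain wrapper is assumed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// The error type proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

/// Hypothetical description of what an interpreter does next.
#[derive(Debug, PartialEq, Eq)]
pub enum InterpreterAction {
    /// Allocation succeeded; continue with the new object.
    Continue(Address),
    /// Raise the language-level out-of-memory exception.
    ThrowOutOfMemoryError,
    /// Assumption: go to a safepoint, let GC run, then retry.
    PollSafepointAndRetry,
}

/// Maps the result of an allocation call to the interpreter's next step.
pub fn handle_alloc_result(result: Result<Address, AllocationFailure>) -> InterpreterAction {
    match result {
        Ok(addr) => InterpreterAction::Continue(addr),
        Err(AllocationFailure::OutOfMemory) => InterpreterAction::ThrowOutOfMemoryError,
        Err(AllocationFailure::WouldBlock) => InterpreterAction::PollSafepointAndRetry,
    }
}
```

Because the match is exhaustive, adding a new `AllocationFailure` variant later would make the compiler flag every call site that needs updating.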
The application code can be either interpreted or compiled. Handling errors in the interpreter is straightforward.
JIT-compiled code needs some tricks. First of all, the JIT-compiled code should use bump-pointer fast paths when possible. For the slow path, the VM binding is advised to wrap the raw memory_manager::alloc(...) -> Result<..., ...> into a function void* mmtk_alloc(...). It shall follow the C calling convention so that it is easy to emit code that calls from JIT-compiled code into the runtime. On success, it simply returns the pointer. On failure, there are two strategies.
Return 0 to the JIT-compiled code, and emit a check instruction after each allocation slow path that branches to a code stub which throws OutOfMemoryError.
Modify the return address before returning from void* mmtk_alloc(...), and use a return barrier to raise the exception.
Using a return barrier eliminates a check on the code path where the allocation is successful. But this is probably not that important because it only affects the slow path.
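The first strategy (returning 0) could look roughly like the sketch below. `alloc_stub` is a hypothetical stand-in for `memory_manager::alloc` with an arbitrary size limit, and the single `size` parameter is an assumption; a real binding would pass the mutator and the other allocation arguments, and would also export the symbol under an unmangled name.

```rust
use std::ffi::c_void;
use std::ptr;

/// Stand-in for MMTk's `Address`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// The error type proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

/// Hypothetical stand-in for `memory_manager::alloc`:
/// pretends that requests larger than 4096 bytes exhaust the heap.
fn alloc_stub(size: usize) -> Result<Address, AllocationFailure> {
    if size <= 4096 {
        Ok(Address(0x1000))
    } else {
        Err(AllocationFailure::OutOfMemory)
    }
}

/// Slow-path entry point with the C calling convention.
/// Returns a null pointer on any failure, so JIT-compiled code only
/// needs a null check followed by a branch to its OOM stub.
pub extern "C" fn mmtk_alloc(size: usize) -> *mut c_void {
    match alloc_stub(size) {
        Ok(Address(raw)) => raw as *mut c_void,
        Err(_) => ptr::null_mut(),
    }
}
```

Collapsing every error into a null pointer loses the distinction between `OutOfMemory` and `WouldBlock`; a binding that needs it would have to carry that information separately.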
About the existing AllocationError
We already have the AllocationError type, which is used by Collection::out_of_memory:
```rust
pub enum AllocationError {
    /// The specified heap size is too small for the given program to continue.
    HeapOutOfMemory,
    /// The OS is unable to mmap or acquire more memory. Critical error. MMTk expects the VM to
    /// abort if such an error is thrown.
    MmapOutOfMemory,
}
```
AllocationError::HeapOutOfMemory is equivalent to the AllocationFailure::OutOfMemory I proposed.
AllocationError::MmapOutOfMemory, as the doc says, is a critical error and should result in immediate VM termination.
There is no equivalent to AllocationFailure::WouldBlock.
We should probably let AllocationError and AllocationFailure coexist because they are used by two different API functions and have different sets of values.
Collection::out_of_memory
We need to explicitly document that this function is expected to return.
Even though the VM binding cannot unwind the stack from within Collection::out_of_memory in a portable way, the function still lets the VM binding set thread-local state so that, after memory_manager::alloc returns, the binding can check that state and handle OOM errors. I (Kunshan) am not sure how useful this is, given that alloc is able to return Err(AllocationFailure::OutOfMemory), but we'd better keep it for a while just in case any VM actually needs it. For example, the binding can still panic immediately when AllocationError::MmapOutOfMemory happens.
I am not sure if we should allow the VM binding to override Collection::out_of_memory and translate AllocationError::MmapOutOfMemory into an AllocationFailure::OutOfMemory to be returned from memory_manager::alloc. When mmap cannot allocate more memory, it doesn't mean the VM cannot continue. If the VM has pre-allocated OutOfMemoryError object instances, it can still unwind the stack (without a stack trace or with a limited stack trace) and let the application "limp" for a while and shut down gracefully.
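The thread-local approach mentioned above might be sketched as follows. The free function `out_of_memory` is a simplified stand-in for a binding's implementation of Collection::out_of_memory (the real trait method takes more arguments), and `PENDING_OOM` and `take_pending_oom` are hypothetical names.

```rust
use std::cell::Cell;

/// Mirrors the existing `AllocationError` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationError {
    HeapOutOfMemory,
    MmapOutOfMemory,
}

thread_local! {
    /// Hypothetical per-thread flag recording a pending OOM condition.
    static PENDING_OOM: Cell<bool> = Cell::new(false);
}

/// Simplified stand-in for a binding's `Collection::out_of_memory`.
/// It records the condition and returns normally instead of unwinding.
pub fn out_of_memory(err: AllocationError) {
    match err {
        AllocationError::HeapOutOfMemory => PENDING_OOM.with(|flag| flag.set(true)),
        // The binding may still choose to fail fast on the critical error.
        AllocationError::MmapOutOfMemory => panic!("mmap failed: the VM cannot continue"),
    }
}

/// Reads and clears the flag; meant to be called after the allocation call returns.
pub fn take_pending_oom() -> bool {
    PENDING_OOM.with(|flag| flag.replace(false))
}
```

With alloc returning `Err(AllocationFailure::OutOfMemory)` directly, the flag is redundant for most bindings, which matches the doubt expressed above about how useful this hook remains.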
Proposed refactoring
We need an InternalAllocationFailure type.
Space::acquire and its sub-functions Space::get_new_pages_and_initialize and Space::not_acquiring need to distinguish between two cases:
If it is at safepoint and it blocked for GC, it shall return Err(Retry).
If it is not at safepoint but GC is needed, it shall return Err(NeedGC).
All functions in the call chain to Space::acquire, such as BumpAllocator::acquire_block and ImmixSpace::get_clean_block, should forward that error to their callers, such as ImmixAllocator::acquire_clean_block, all the way up to Allocator::alloc_slow_inline. Along this path, some functions may check for obvious allocation errors (Space::handle_obvious_oom_request). If that check fails, the function should return Err(BadRequest).
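The error forwarding described above maps naturally onto Rust's `?` operator. The sketch below uses a hypothetical `ToySpace` with made-up page counts; it is not the real `Space` trait. The variant names are an assumption, and the non-safepoint case that this text calls NeedGC is named `WouldBlock` here.

```rust
/// Stand-in for MMTk's `Address`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// Assumed shape of the internal error type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalAllocationFailure {
    /// The request can never be satisfied (obvious OOM).
    BadRequest,
    /// The caller blocked for a GC at a safepoint and should try again.
    Retry,
    /// A GC is needed, but the caller is not at a safepoint.
    WouldBlock,
}

/// Hypothetical toy space that only tracks page counts.
pub struct ToySpace {
    pub free_pages: usize,
    pub max_pages: usize,
}

impl ToySpace {
    /// Loosely modelled on `Space::acquire`.
    pub fn acquire(
        &mut self,
        pages: usize,
        at_safepoint: bool,
    ) -> Result<Address, InternalAllocationFailure> {
        if pages > self.max_pages {
            return Err(InternalAllocationFailure::BadRequest);
        }
        if pages > self.free_pages {
            return if at_safepoint {
                // A real implementation would block for GC before returning.
                Err(InternalAllocationFailure::Retry)
            } else {
                Err(InternalAllocationFailure::WouldBlock)
            };
        }
        self.free_pages -= pages;
        Ok(Address(0x1000))
    }

    /// Loosely modelled on `ImmixSpace::get_clean_block`:
    /// any error from `acquire` is forwarded unchanged by `?`.
    pub fn get_clean_block(
        &mut self,
        at_safepoint: bool,
    ) -> Result<Address, InternalAllocationFailure> {
        let block = self.acquire(8, at_safepoint)?;
        Ok(block)
    }
}
```

Intermediate functions never inspect the error, so only the two ends of the call chain need to know what each variant means.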
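Allocator::alloc_slow_inline should match against the errors.
When InternalAllocationFailure::BadRequest, it should immediately fail with OOM. (This fixes the problem that "Fix infinite loop if we return from Collection::out_of_memory" (#1473) is trying to solve.)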
When InternalAllocationFailure::WouldBlock, it should immediately return AllocationFailure::WouldBlock. (We don't need to check if we are at safepoint now because Space::acquire checked it for us.)
When InternalAllocationFailure::Retry, it should loop and try allocating again. But if after an emergency collection it still returns Retry, it shall fail with OOM.
When OOM, it shall call Collection::out_of_memory (if AllocationOptions::allow_oom_call is true) and then return AllocationFailure::OutOfMemory.
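The matching rules above can be summarized in a small sketch. `alloc_slow` is a hypothetical free function, not the real Allocator::alloc_slow_inline. The closures `try_once`, `last_gc_was_emergency` and `on_oom` are assumed stand-ins for one allocation attempt, the emergency-collection check, and Collection::out_of_memory respectively.

```rust
/// Stand-in for MMTk's `Address`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// Assumed shape of the internal error type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalAllocationFailure {
    BadRequest,
    Retry,
    WouldBlock,
}

/// The public error type proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

/// Sketch of the slow-path loop under the assumptions stated above.
pub fn alloc_slow(
    mut try_once: impl FnMut() -> Result<Address, InternalAllocationFailure>,
    mut last_gc_was_emergency: impl FnMut() -> bool,
    mut on_oom: impl FnMut(),
    allow_oom_call: bool,
) -> Result<Address, AllocationFailure> {
    loop {
        match try_once() {
            Ok(addr) => return Ok(addr),
            // The space already checked the safepoint condition for us.
            Err(InternalAllocationFailure::WouldBlock) => {
                return Err(AllocationFailure::WouldBlock)
            }
            // An ordinary GC happened: loop and try again.
            Err(InternalAllocationFailure::Retry) if !last_gc_was_emergency() => continue,
            // Either BadRequest, or Retry after an emergency collection.
            Err(_) => {
                if allow_oom_call {
                    on_oom();
                }
                return Err(AllocationFailure::OutOfMemory);
            }
        }
    }
}
```

Because the function always returns after calling `on_oom`, a returning Collection::out_of_memory can no longer cause an infinite retry loop.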
Performance concerns
The VM should use bump pointer fast paths whenever possible, and avoid calling memory_manager::alloc for most of the allocations. This means all of the API changes and refactoring happen on the slow paths. We shouldn't see obvious performance change with plans that support bump-pointer allocation, which should be everything except MarkSweep and PageProtect.
Related issues
#1223 proposed introducing NonZeroAddress because successful allocations should never return Address::ZERO. It mentioned returning error states using None, while this PR proposes using Err(...).
But regardless of whether we introduce NonZeroAddress, once we start using Result<Address, AllocationFailure> or Result<Address, InternalAllocationFailure>, we should stop checking against Address::ZERO and use Err(...) to report errors.