`memory_manager::alloc` should return error on OOM (#1475)

TODO:

-   Make `memory_manager::alloc` and `memory_manager::alloc_with_options` return `Result<Address, AllocationFailure>`.
    -   Make them return an error when OOM happens, even at a safepoint.
-   Document that `Collection::out_of_memory` is expected to return.
-   Refactor the allocation code (specifically `Allocator::alloc_once_inline`) for the changed APIs mentioned above.

# The JikesRVM legacy

Our memory allocation architecture is ported from JikesRVM.  In JikesRVM, `MemoryManager.allocateSpace` never returns `null`, even in the case of OOM.  When OOM happens, MMTk calls `Collection.outOfMemory()`.  The VM implements this method by throwing an exception.

```java
public class Collection extends org.mmtk.vm.Collection {
  @Override
  @UninterruptibleNoWarn
  public void outOfMemory() {
    throw RVMThread.getOutOfMemoryError();
  }
}
```

This works because JikesRVM's (JIT and AoT) compilers compile both VM and MMTk code using the same ABI.  The uniform stack layout and the uniform stack unwinding mechanism allow an exception to be thrown from `Collection.outOfMemory()` into MMTk frames, and then into VM frames, and then all the way to application frames, where the exception can be caught and handled.

JikesRVM has always assumed that `Collection.outOfMemory()` never returns.

```java
        if (failWithOOM) {
          VM.collection.outOfMemory();
          VM.assertions.fail("Not Reached"); // THIS IS UNREACHABLE!
          return Address.zero();
        }
```

There is no way to express in Java that a function never returns (like Rust's `fn () -> !`), so JikesRVM uses assertions.  The `return` statement is there to make compilation successful.

# It doesn't work in the Rust MMTk

In Rust MMTk, the VM binding cannot unwind the stack in `Collection::out_of_memory`, at least not in a portable way.  When using the Rust MMTk, the VM, the MMTk core and the application code may have different ABIs.  Take the OpenJDK binding for example.

-   The VM and the C++ part of the VM binding are implemented in C++ and are AoT compiled.
-   MMTk and the Rust part of the binding are implemented in Rust and are AoT compiled.
-   The application code is provided as bytecode, and is either interpreted or JIT compiled.

If `Collection::out_of_memory` were to throw an exception directly, the exception would propagate from C++ code (VM and VM binding) through Rust code (VM binding and MMTk core) to JIT-compiled machine code.  This crosses three languages.  Rust doesn't have the concept of exceptions (although `panic!()` is implemented with some form of stack unwinding), and C++ exceptions and Java exceptions are implemented in different ways.

**The only safe way to transfer control back to the application is returning frame by frame out of `memory_manager::alloc`**.

# Proposed API changes

## Allocation

First of all, `memory_manager::alloc` and `memory_manager::alloc_with_options` shall be able to return error values.

```rust

pub fn memory_manager::alloc(...) -> Result<Address, AllocationFailure> { ... }
pub fn memory_manager::alloc_with_options(...) -> Result<Address, AllocationFailure> { ... }

pub enum AllocationFailure {
    /// Memory is exhausted.  The VM binding should raise an out-of-memory error to the application.
    OutOfMemory,
    /// The allocation is not at a safepoint, but it could not be satisfied without a GC.
    WouldBlock,
}
```

Currently, these are the two possible errors the VM binding could get.

The caller of `alloc` should match against the `Result<Address, AllocationFailure>` and handle errors accordingly.  Specifically, if the result is `Err(AllocationFailure::OutOfMemory)`, the caller should raise an OOM exception.

The application code can be either interpreted or compiled.  Handling errors in the interpreter is straightforward.
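As a rough illustration of the matching described above, here is a minimal sketch of how an interpreter might act on the result.  `Address` is a stand-in for MMTk's type, and `InterpreterAction` and `handle_alloc_result` are hypothetical names invented for this example.  The reaction to `WouldBlock` (reach a safepoint, then retry) is an assumption, not something this proposal prescribes.

```rust
/// Stand-in for MMTk's `Address`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// The error type proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

/// Hypothetical: what an interpreter does after an allocation attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InterpreterAction {
    /// Allocation succeeded; continue with the new object.
    Continue(Address),
    /// Raise the language-level out-of-memory exception.
    ThrowOutOfMemoryError,
    /// Assumption: reach a safepoint first, then retry the allocation.
    YieldThenRetry,
}

/// Hypothetical helper that maps the result of `memory_manager::alloc`
/// to the interpreter's next step.
pub fn handle_alloc_result(result: Result<Address, AllocationFailure>) -> InterpreterAction {
    match result {
        Ok(addr) => InterpreterAction::Continue(addr),
        Err(AllocationFailure::OutOfMemory) => InterpreterAction::ThrowOutOfMemoryError,
        Err(AllocationFailure::WouldBlock) => InterpreterAction::YieldThenRetry,
    }
}
```

Because the interpreter already runs in ordinary compiled code, it can raise the exception with its usual mechanism once `alloc` has returned.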

JIT-compiled code needs some tricks.  First of all, the JIT-compiled code should use bump-pointer fast paths when possible.  For the slow path, the VM binding is advised to wrap the raw `memory_manager::alloc(...) -> Result<..., ...>` in a function `void* mmtk_alloc(...)`.  The wrapper shall follow the C calling convention so that it is easy to emit a call to it from JIT-compiled code.  On success, it simply returns the pointer.  On failure, there are two strategies.

1.  Return 0 to the JIT-compiled code, and emit a check after each allocation slow-path call that branches to a code stub that throws `OutOfMemoryError`.
2.  Modify the return address before returning from `void* mmtk_alloc(...)`, and use a return barrier to raise the exception.

Using a return barrier eliminates the check on the code path where the allocation succeeds.  But that is probably not important because it is the slow path.
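The first strategy (returning 0) could look roughly like the sketch below.  `alloc_placeholder` and `FAKE_HEAP_LIMIT` are hypothetical stand-ins for the real `memory_manager::alloc` call, `Address` is a stand-in for MMTk's type, and the single `size` parameter is a simplification.  A real binding would also export the symbol so that the JIT can resolve it, and would have to decide what `WouldBlock` means on this path.  This sketch simply treats every failure as null.

```rust
use std::ffi::c_void;
use std::ptr;

/// Stand-in for MMTk's `Address`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

/// The error type proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

/// Hypothetical limit used only to make the placeholder fail for large sizes.
const FAKE_HEAP_LIMIT: usize = 4096;

/// Hypothetical placeholder for `memory_manager::alloc`.
fn alloc_placeholder(size: usize) -> Result<Address, AllocationFailure> {
    if size <= FAKE_HEAP_LIMIT {
        Ok(Address(0x1000))
    } else {
        Err(AllocationFailure::OutOfMemory)
    }
}

/// C-ABI wrapper for the slow path.  Returns the object pointer on success
/// and null on failure; the JIT-compiled caller checks for null and branches
/// to a stub that throws `OutOfMemoryError`.
pub extern "C" fn mmtk_alloc(size: usize) -> *mut c_void {
    match alloc_placeholder(size) {
        Ok(addr) => addr.0 as *mut c_void,
        // Simplification: every failure is reported as null.
        Err(_) => ptr::null_mut(),
    }
}
```

The null check costs one compare-and-branch per slow-path call, which is the overhead the return-barrier strategy would remove.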

### About the existing `AllocationError`

We currently have the `AllocationError` type, which is used by `Collection::out_of_memory`.

```rust
pub enum AllocationError {
    /// The specified heap size is too small for the given program to continue.
    HeapOutOfMemory,
    /// The OS is unable to mmap or acquire more memory. Critical error. MMTk expects the VM to
    /// abort if such an error is thrown.
    MmapOutOfMemory,
}
```

`AllocationError::HeapOutOfMemory` is equivalent to the `AllocationFailure::OutOfMemory` I proposed.

`AllocationError::MmapOutOfMemory`, as the doc says, is a critical error and should result in immediate VM termination.

There is no equivalent to `AllocationFailure::WouldBlock`.

We should probably let `AllocationError` and `AllocationFailure` coexist because they are used by two different API functions and have different sets of values.

## `Collection::out_of_memory`

We need to explicitly document that this function is expected to return.

Even though the VM binding cannot unwind the stack from within `Collection::out_of_memory` in a portable way, it still allows the VM binding to set thread-local states so that after returning from `memory_manager::alloc`, it can check the state and handle OOM errors.  I (Kunshan) am not sure how useful this is, given that `alloc` is able to return `Err(AllocationFailure::OutOfMemory)`, but we'd better keep it for a while just in case any VM actually needs that.  For example, it can still panic fast when `AllocationError::MmapOutOfMemory` happens.
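The thread-local approach mentioned above might be sketched as follows.  `OOM_PENDING`, `out_of_memory_hook` and `take_pending_oom` are hypothetical names.  The hook stands in for the body of a binding's `Collection::out_of_memory`, whose real signature takes more arguments.

```rust
use std::cell::Cell;

thread_local! {
    /// Hypothetical per-thread flag recording that MMTk reported OOM.
    static OOM_PENDING: Cell<bool> = Cell::new(false);
}

/// Hypothetical body of a binding's `Collection::out_of_memory`.
/// It records the condition and then returns normally instead of unwinding.
pub fn out_of_memory_hook() {
    OOM_PENDING.with(|flag| flag.set(true));
}

/// Called by the binding after `memory_manager::alloc` returns.
/// Reports whether OOM was recorded on this thread, and clears the flag.
pub fn take_pending_oom() -> bool {
    OOM_PENDING.with(|flag| flag.replace(false))
}
```

With `Err(AllocationFailure::OutOfMemory)` available, the flag duplicates information the return value already carries, which is why its usefulness is in doubt.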

I am not sure if we should allow the VM binding to override `Collection::out_of_memory` and translate `AllocationError::MmapOutOfMemory` into an `AllocationFailure::OutOfMemory` to be returned from `memory_manager::alloc`.  When `mmap` cannot allocate more memory, it doesn't mean the VM cannot continue.  If the VM has pre-allocated `OutOfMemoryError` object instances, it can still unwind the stack (without stack trace or with limited stack trace) and let the application "limp" for a while and shut down gracefully.

# Proposed refactoring

We need an `InternalAllocationFailure` type.

```rust
pub(crate) enum InternalAllocationFailure {
    BadRequest,
    Retry,
    WouldBlock,
}
```

`Space::acquire` and its sub-functions `Space::get_new_pages_and_initialize` and `Space::not_acquiring` need to distinguish between two cases:

1.  If it is at a safepoint and it blocked for GC, it shall return `Err(Retry)`.
2.  If it is not at a safepoint but GC is needed, it shall return `Err(WouldBlock)`.

All functions in the call chain to `Space::acquire`, such as `BumpAllocator::acquire_block` and `ImmixSpace::get_clean_block`, should forward that error to their callers such as `ImmixAllocator::acquire_clean_block` all the way up to `Allocator::alloc_slow_inline`.  Along this path, some functions may check for obvious allocation errors (`Space::handle_obvious_oom_request`).  If the request fails that check, the function should return `Err(BadRequest)`.
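The forwarding described above maps naturally onto Rust's `?` operator.  The following is a toy model, not the real mmtk-core code: `ToySpace`, its fields and `PAGES_PER_BLOCK` are invented for illustration, the obvious-OOM check is folded into `acquire` for brevity, and the toy never actually blocks for GC.

```rust
/// Stand-in for MMTk's `Address`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalAllocationFailure {
    BadRequest,
    Retry,
    WouldBlock,
}

/// Hypothetical block size, in pages.
pub const PAGES_PER_BLOCK: usize = 8;

/// Toy model of a space with a fixed page budget.
pub struct ToySpace {
    pub free_pages: usize,
    pub total_pages: usize,
}

impl ToySpace {
    /// Stands in for `Space::acquire`.
    pub fn acquire(
        &mut self,
        pages: usize,
        at_safepoint: bool,
    ) -> Result<Address, InternalAllocationFailure> {
        // Obvious-OOM check: the request can never fit in this space.
        if pages > self.total_pages {
            return Err(InternalAllocationFailure::BadRequest);
        }
        if pages > self.free_pages {
            // A real implementation would block for GC here when at a safepoint.
            return Err(if at_safepoint {
                InternalAllocationFailure::Retry
            } else {
                InternalAllocationFailure::WouldBlock
            });
        }
        self.free_pages -= pages;
        Ok(Address(0x1000))
    }
}

/// Stands in for an intermediate function such as `ImmixSpace::get_clean_block`.
/// The `?` operator forwards any failure to the caller unchanged.
pub fn get_clean_block(
    space: &mut ToySpace,
    at_safepoint: bool,
) -> Result<Address, InternalAllocationFailure> {
    let block = space.acquire(PAGES_PER_BLOCK, at_safepoint)?;
    Ok(block)
}
```

Intermediate functions then need no error-specific logic of their own; only the top of the call chain inspects the variant.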

`Allocator::alloc_slow_inline` should match against the errors.

-   On `InternalAllocationFailure::BadRequest`, it should immediately fail with OOM. (This fixes the problem https://github.com/mmtk/mmtk-core/pull/1473 is trying to solve.)
-   On `InternalAllocationFailure::WouldBlock`, it should immediately return `AllocationFailure::WouldBlock`.  (We don't need to check whether we are at a safepoint at this point because `Space::acquire` checked it for us.)
-   On `InternalAllocationFailure::Retry`, it should loop and try allocating again.  But if it still gets `Retry` after an emergency collection, it shall fail with OOM.

On OOM, it shall call `Collection::out_of_memory` (if `AllocationOptions::allow_oom_call` is true) and then return `AllocationFailure::OutOfMemory`.
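The matching rules above could be sketched as the loop below.  This is an illustration under assumptions, not the actual `Allocator::alloc_slow_inline`: the `Attempt` struct, the `after_emergency_gc` flag and the closure parameters are hypothetical ways to model one slow-path attempt, the emergency-collection state and the `Collection::out_of_memory` call.

```rust
/// Stand-in for MMTk's `Address`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Address(pub usize);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationFailure {
    OutOfMemory,
    WouldBlock,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternalAllocationFailure {
    BadRequest,
    Retry,
    WouldBlock,
}

/// Hypothetical: the outcome of one slow-path attempt, plus whether the GC
/// it waited for (if any) was an emergency collection.
pub struct Attempt {
    pub result: Result<Address, InternalAllocationFailure>,
    pub after_emergency_gc: bool,
}

/// Sketch of the retry loop.  `attempt` models one allocation attempt and
/// `out_of_memory` models the call to `Collection::out_of_memory`.
pub fn alloc_slow_sketch(
    mut attempt: impl FnMut() -> Attempt,
    mut out_of_memory: impl FnMut(),
    allow_oom_call: bool,
) -> Result<Address, AllocationFailure> {
    loop {
        let Attempt { result, after_emergency_gc } = attempt();
        match result {
            Ok(addr) => return Ok(addr),
            // Not at a safepoint: report it without retrying.
            Err(InternalAllocationFailure::WouldBlock) => {
                return Err(AllocationFailure::WouldBlock)
            }
            // A GC happened and it was not an emergency collection: try again.
            Err(InternalAllocationFailure::Retry) if !after_emergency_gc => continue,
            // Bad request, or still failing after an emergency collection: OOM.
            Err(InternalAllocationFailure::BadRequest)
            | Err(InternalAllocationFailure::Retry) => {
                if allow_oom_call {
                    out_of_memory();
                }
                return Err(AllocationFailure::OutOfMemory);
            }
        }
    }
}
```

Note that the OOM hook returns before the error is reported, which matches the requirement that `Collection::out_of_memory` is expected to return.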

# Performance concerns

The VM should use bump-pointer fast paths whenever possible, and avoid calling `memory_manager::alloc` for most allocations.  This means all of the API changes and refactoring happen on the slow paths.  We shouldn't see any obvious performance change with plans that support bump-pointer allocation, which should be everything except `MarkSweep` and `PageProtect`.

# Related issues

https://github.com/mmtk/mmtk-core/issues/1223 proposed introducing `NonZeroAddress` because successful allocations should never return `Address::ZERO`.  It mentioned returning error states using `None`, while this issue proposes using `Err(...)`.

But regardless of whether we introduce `NonZeroAddress`, once we start using `Result<Address, AllocationFailure>` or `Result<Address, InternalAllocationFailure>`, we should stop checking against `Address::ZERO` and start using `Err(...)` to report errors.

