I am currently running the Debian system on the sysoul-x3300 platform (based on rk3588). During memory stress testing using memtester, I observed a critical stability issue.
When the tested memory size exceeds 4 GiB (total memory is 16 GiB, available 15 GiB; testing 1 GiB or 2 GiB works fine), an MMIO fault in zone0 is frequently triggered. This occurs even though the accessed memory address is correctly configured as belonging to zone0 in board.rs.
Logs
root@linaro-alip:/root# free -h
total used free shared buff/cache available
Mem: 15Gi 342Mi 14Gi 21Mi 342Mi 15Gi
Swap: 0B 0B 0B
root@linaro-alip:/root# memtester 12G 1
memtester version 4.5.0 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 12288MB (12884901888 bytes)
got 12288MB (12884901888 bytes), trying mlock ...[WARN 1] (hvisor::memory::mmio:109) Zone 0 unhandled mmio fault MMIOAccess {
address: 0xb600000,
size: 0x1,
is_write: true,
value: 0xffffff800b600000,
}
[ERROR 1] (hvisor::arch::aarch64::trap:251) mmio_handle_access: [src/memory/mmio.rs:110:13] Invalid argument
[ERROR 1] (hvisor::panic:24) panic occurred: PanicInfo {
payload: Any { .. },
message: Some(
root zone has some error,
),
location: Location {
file: "src/zone.rs",
line: 303,
col: 9,
},
can_unwind: true,
force_no_backtrace: false,
}
[WARN 0] (hvisor::memory::mmio:109) Zone 0 unhandled mmio fault MMIOAccess {
address: 0x3ec045a08,
size: 0x1,
is_write: false,
value: 0x0,
}
[ERROR 0] (hvisor::arch::aarch64::trap:251) mmio_handle_access: [src/memory/mmio.rs:110:13] Invalid argument
[ERROR 0] (hvisor::panic:24) panic occurred: PanicInfo {
payload: Any { .. },
message: Some(
root zone has some error,
),
location: Location {
file: "src/zone.rs",
line: 303,
col: 9,
},
can_unwind: true,
force_no_backtrace: false,
}
Configuration (board.rs)
```rust
/// The physical memory layout of the board.
/// Each address should align to 2M (0x20_0000).
/// Addresses must be in ascending order.
#[rustfmt::skip]
pub const BOARD_PHYSMEM_LIST: &[(u64, u64, MemoryType)] = &[
    // ( start,         end,           type)
    (0x0000_0000,   0x0020_0000,   MemoryType::Device), // Includes low-address SRAM, marked as Device
    (0x0020_0000,   0x0840_0000,   MemoryType::Normal),
    (0x0940_0000,   0xf000_0000,   MemoryType::Normal),
    (0xf000_0000,   0x1_0000_0000, MemoryType::Device), // Dense device region, marked as Device.
    (0x1_0000_0000, 0x3_fc00_0000, MemoryType::Normal),
    // (0x3_fc50_0000, 0x3_fff0_0000, MemoryType::Normal),
    (0x3_fc40_0000, 0x4_0000_0000, MemoryType::Normal), // aligned to 2 MiB
    (0x4_f000_0000, 0x5_0000_0000, MemoryType::Normal),
];
```
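The invariants stated in the comment (2 MiB alignment, ascending and non-overlapping ranges) can be checked mechanically. The following is a standalone sketch, not part of hvisor: it uses a simplified local copy of the table with `MemoryType` reduced to a plain enum.

```rust
// Standalone check of the invariants documented for BOARD_PHYSMEM_LIST:
// every boundary is 2 MiB-aligned, and ranges are ascending and non-overlapping.
enum MemoryType { Normal, Device }

const ALIGN: u64 = 0x20_0000; // 2 MiB

const BOARD_PHYSMEM_LIST: &[(u64, u64, MemoryType)] = &[
    (0x0000_0000,   0x0020_0000,   MemoryType::Device),
    (0x0020_0000,   0x0840_0000,   MemoryType::Normal),
    (0x0940_0000,   0xf000_0000,   MemoryType::Normal),
    (0xf000_0000,   0x1_0000_0000, MemoryType::Device),
    (0x1_0000_0000, 0x3_fc00_0000, MemoryType::Normal),
    (0x3_fc40_0000, 0x4_0000_0000, MemoryType::Normal),
    (0x4_f000_0000, 0x5_0000_0000, MemoryType::Normal),
];

fn check(list: &[(u64, u64, MemoryType)]) -> bool {
    let mut prev_end = 0;
    for &(start, end, _) in list {
        // Both edges must be 2 MiB-aligned.
        if start % ALIGN != 0 || end % ALIGN != 0 {
            return false;
        }
        // Ranges must be non-empty and must not go backwards or overlap.
        if start < prev_end || start >= end {
            return false;
        }
        prev_end = end;
    }
    true
}

fn main() {
    assert!(check(BOARD_PHYSMEM_LIST));
    println!("BOARD_PHYSMEM_LIST invariants hold");
}
```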
```rust
pub const ROOT_ZONE_MEMORY_REGIONS: &[HvConfigMemoryRegion] = &[
    // /proc/iomem System RAM
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x0020_0000,
        virtual_start: 0x0020_0000,
        size: 0x0820_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x0940_0000,
        virtual_start: 0x0940_0000,
        size: 0xe6c0_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x1_0000_0000,
        virtual_start: 0x1_0000_0000,
        size: 0x2_fc00_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x3_fc50_0000,
        virtual_start: 0x3_fc50_0000,
        size: 0x03a0_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x4_f000_0000,
        virtual_start: 0x4_f000_0000,
        size: 0x1000_0000,
    },
    // Ramoops
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_RAM,
        physical_start: 0x0011_0000,
        virtual_start: 0x0011_0000,
        size: 0x000f_0000,
    },
    // /proc/iomem Devices I/O
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0xfb00_0000,
        virtual_start: 0xfb00_0000,
        size: 0x0020_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0xfc00_0000,
        virtual_start: 0xfc00_0000,
        size: 0x0200_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0xfe00_0000,
        virtual_start: 0xfe00_0000,
        size: 0x0060_0000,
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0xfea0_0000,
        virtual_start: 0xfea0_0000,
        size: 0x0050_0000,
    },
    // SRAM and other devices
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0x0010_f000,
        virtual_start: 0x0010_f000,
        // size: 0x0100, // 10f000.sram
        size: 0x1000, // aligned with page size
    },
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0xff00_1000,
        virtual_start: 0xff00_1000,
        size: 0x000e_e000, // ff001000.sram
    },
    // Unknown region; maybe we should ask the vendor for help
    HvConfigMemoryRegion {
        mem_type: MEM_TYPE_IO,
        physical_start: 0x0010_0000,
        virtual_start: 0x0010_0000,
        size: 0xf000,
    },
];
```
Root Cause Analysis
Upon investigation, the root cause is an insufficient reserved memory area for the hypervisor in the device tree, leading to memory corruption by the root-linux kernel.
According to src/consts.rs, the memory layout of hvisor consists of:
- Static binary code (.text, .data, etc.)
- Per-CPU local storage (stack, etc.)
- Frame Allocator memory pool
Source Code Reference (src/consts.rs):
```rust
pub use crate::memory::PAGE_SIZE;
use crate::{memory::addr::VirtAddr, platform::BOARD_NCPUS};

/// Size of the hypervisor heap.
pub const HV_HEAP_SIZE: usize = 1024 * 1024; // 1 MiB
pub const HV_MEM_POOL_SIZE: usize = 64 * 1024 * 1024; // 64 MiB
/// Size of the per-CPU data (stack and other CPU-local data).
pub const PER_CPU_SIZE: usize = 512 * 1024; // 512 KiB

/// ... (omitted)

pub fn mem_pool_start() -> VirtAddr {
    core_end() + MAX_CPU_NUM * PER_CPU_SIZE
}

pub fn hv_end() -> VirtAddr {
    mem_pool_start() + HV_MEM_POOL_SIZE
}
```
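The required reservation follows directly from these constants. The following standalone sketch recomputes it; `BIN_START` and `CORE_END` are illustrative values for sysoul-x3300 (the DTS load address and the observed end of the static binary), not values read from the real linker script.

```rust
// Standalone recomputation of hvisor's memory footprint (illustrative values).
const HV_MEM_POOL_SIZE: usize = 64 * 1024 * 1024; // 64 MiB frame-allocator pool
const PER_CPU_SIZE: usize = 512 * 1024; // 512 KiB per CPU
const MAX_CPU_NUM: usize = 8; // rk3588 has 8 cores

const BIN_START: usize = 0x0050_0000; // load address (assumed, from the DTS)
const CORE_END: usize = 0x006e_6000; // end of the static binary (observed)

fn mem_pool_start() -> usize {
    CORE_END + MAX_CPU_NUM * PER_CPU_SIZE
}

fn hv_end() -> usize {
    mem_pool_start() + HV_MEM_POOL_SIZE
}

fn main() {
    println!("mem_pool_start = {:#x}", mem_pool_start()); // 0xae6000
    println!("hv_end         = {:#x}", hv_end()); // 0x4ae6000
    let required = (hv_end() - BIN_START) / (1024 * 1024);
    println!("required reservation approx. {} MiB", required); // 69 MiB, far more than 4 MiB
}
```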
The Discrepancy:
The actual required memory range extends up to 0x04ae_6000 (approx. 70 MiB total). However, most existing device tree configurations only reserve 4 MiB for hvisor.
The reserved 4 MiB covers the static binary and potentially the per-CPU data for the first few cores, but completely fails to cover the 64 MiB Frame Allocator.
Failure Mechanism
- hvisor uses this Frame Allocator to manage memory regions via a BTree structure.
- When running memtester with large memory blocks, the root-linux kernel allocates pages that physically overlap with hvisor's unreserved Frame Allocator region.
- Linux overwrites the Frame Allocator data, corrupting the BTree metadata used for zone memory region tracking.
- Consequently, hvisor loses track of valid memory regions, resulting in false MMIO faults when those addresses are accessed.
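The failure path can be sketched as follows. This is a hypothetical model, not hvisor's real data structures: a zone tracks its RAM regions in a BTree keyed by start address, and any access that misses the map is routed to the MMIO handler. If Linux scribbles over the allocator backing the map, entries effectively vanish and valid RAM addresses get misclassified.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of zone memory-region tracking (not hvisor's real types).
// Key: guest-physical start address; value: region size.
struct Zone {
    mem_regions: BTreeMap<u64, u64>,
}

impl Zone {
    /// A guest access is "memory" if it falls inside a tracked region;
    /// anything else is routed to the MMIO handler.
    fn is_ram(&self, addr: u64) -> bool {
        self.mem_regions
            .range(..=addr) // regions starting at or below addr
            .next_back() // the closest one
            .map_or(false, |(&start, &size)| addr < start + size)
    }
}

fn main() {
    let mut zone = Zone { mem_regions: BTreeMap::new() };
    // The region containing the faulting address 0x3ec045a08 from the log.
    zone.mem_regions.insert(0x1_0000_0000, 0x2_fc00_0000);
    assert!(zone.is_ram(0x3_ec04_5a08)); // tracked -> treated as normal memory

    // If Linux overwrites the allocator backing this map, entries are lost:
    zone.mem_regions.clear();
    assert!(!zone.is_ram(0x3_ec04_5a08)); // now misclassified -> "unhandled mmio fault"
}
```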
Why it seemed to work before:
- Luck: The specific physical pages used by the Frame Allocator happened not to be allocated or overwritten by Linux under lighter loads.
- Partial Coverage: The 4 MiB reservation covers the binary and the initial CPU stacks. Since root-linux often uses fewer cores (e.g., 2 cores) during boot or idle, the per-CPU data for the active cores stayed safely within the reserved area.
Action Items
To resolve this issue and prevent future occurrences, the following actions are required:
- Configuration Fix: Update all existing board configurations and device trees (DTS) to reserve sufficient memory, covering the full 64 MiB pool plus the per-CPU areas.
- CI/CD Enhancement: Integrate memtester into the CI system test workflow. The root-linux should run a memory stress test immediately after boot to verify memory integrity before proceeding with other tests. This issue also explains the high failure rate in past CI runs.
- Documentation: Update the hvisor-book to explicitly document the static and runtime memory layout, and add a guide on how to correctly calculate and configure reserved-memory in the device tree.
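For the configuration fix, a reserved-memory node along these lines could be used. This is a sketch only: the node name and label are illustrative, and the size rounds the ~70 MiB requirement up to 0x0460_0000; it should be recomputed from src/consts.rs whenever the pool or per-CPU sizes change.

```dts
/* Sketch of a DTS reserved-memory entry for hvisor (names illustrative). */
reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    /* 0x0050_0000 .. 0x04b0_0000: static binary + per-CPU areas + 64 MiB pool */
    hvisor_reserved: hvisor@500000 {
        reg = <0x0 0x00500000 0x0 0x04600000>;
        no-map;
    };
};
```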
Memory Layout Calculation (sysoul-x3300, 8 CPUs):
- Binary start (load address): 0x0050_0000
- core_end (binary end): 0x006e_6000
- mem_pool_start: 0x00ae_6000 (core_end + 512 KiB × 8 CPUs ≈ 0x006e_6000 + 4 MiB)
- hv_end: 0x04ae_6000 (mem_pool_start + 64 MiB Frame Allocator pool)
The 4 MiB reservation versus the actual ~70 MiB footprint:

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'fontFamily': 'arial', 'fontSize': '14px'}}}%%
flowchart LR
    classDef memBlock fill:#e3f2fd,stroke:#1565c0,stroke-width:1px;
    classDef boundaryNode fill:none,stroke:none,color:#555,font-size:12px;
    classDef dangerBlock fill:#ffcdd2,stroke:#b71c1c,stroke-width:2px;

    subgraph Reserved ["✅ Reserved Memory (Safe: 4 MiB)<br/>Range: 0x0050_0000 ~ 0x0090_0000"]
        direction LR
        StartAddr["0x0050_0000"]:::boundaryNode
        Bin["Static Bin<br/>(~1.9 MiB)<br/>End: 0x006E_6000"]:::memBlock
        C0["CPU 0<br/>512 KiB"]:::memBlock
        C1["CPU 1<br/>512 KiB"]:::memBlock
        C2["CPU 2<br/>512 KiB"]:::memBlock
        C3["CPU 3<br/>512 KiB<br/>End: 0x008E_6000"]:::memBlock
        StartAddr --- Bin --- C0 --- C1 --- C2 --- C3
    end

    subgraph Unreserved ["❌ Unreserved Region (Unsafe / MMIO Fault Risk)<br/>Range: 0x0090_0000 ~ 0x04AE_6000"]
        direction LR
        C4["CPU 4<br/>(Cross Boundary)<br/>Start: 0x008E_6000"]:::dangerBlock
        C5["CPU 5<br/>512 KiB"]:::dangerBlock
        C6["CPU 6<br/>512 KiB"]:::dangerBlock
        C7["CPU 7<br/>512 KiB"]:::dangerBlock
        PoolStartAddr["0x00AE_6000"]:::boundaryNode
        FrameAlloc["Frame Allocator Pool<br/>Size: 64 MiB<br/>(Target of Corruption)"]:::dangerBlock
        EndAddr["0x04AE_6000"]:::boundaryNode
        C4 --- C5 --- C6 --- C7 --- PoolStartAddr --- FrameAlloc --- EndAddr
    end

    C3 --- C4

    style Reserved fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5
    style Unreserved fill:#ffebee,stroke:#c62828,stroke-width:2px,stroke-dasharray: 5 5
```