Description
hvisor's own physical memory range [skernel, __hv_end) has repeatedly ended up inside MEM_TYPE_RAM regions defined in ROOT_ZONE_MEMORY_REGIONS across various board configs. When that happens, the root zone's Linux kernel page allocator treats hvisor's pages as free RAM, hands them out to kernel or user code, and writes to them, silently corrupting hvisor's page tables and heap.
The corruption eventually manifests as seemingly unrelated crashes: "unhandled MMIO fault" panics, stage-2 translation faults, or hangs. The root cause (memory stomping) is very hard to diagnose from the symptoms.
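The invariant being violated is a half-open interval overlap test between [skernel, __hv_end) and each root-zone RAM region. The following is a minimal sketch of that test. The MemRegion and MemType types are hypothetical, simplified stand-ins for the board.rs region entries, and the addresses are made up. The sketch is not taken from hvisor or from tools/check_hv_mem_overlap.py.

```rust
/// Hypothetical, simplified stand-in for a board.rs memory type.
/// The real hvisor type and its variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemType {
    Ram,
    Io,
}

/// Hypothetical, simplified stand-in for a ROOT_ZONE_MEMORY_REGIONS entry.
#[derive(Debug, Clone, Copy)]
struct MemRegion {
    mem_type: MemType,
    physical_start: u64,
    size: u64,
}

/// Two half-open ranges [a_start, a_end) and [b_start, b_end) overlap
/// iff each one starts before the other ends.
fn ranges_overlap(a_start: u64, a_end: u64, b_start: u64, b_end: u64) -> bool {
    a_start < b_end && b_start < a_end
}

/// Returns every RAM region that overlaps hvisor's [skernel, hv_end) range.
/// Non-RAM regions (e.g. MMIO) are ignored.
fn overlapping_ram_regions(regions: &[MemRegion], skernel: u64, hv_end: u64) -> Vec<MemRegion> {
    regions
        .iter()
        .copied()
        .filter(|r| r.mem_type == MemType::Ram)
        .filter(|r| {
            let end = r.physical_start.saturating_add(r.size);
            ranges_overlap(skernel, hv_end, r.physical_start, end)
        })
        .collect()
}

fn main() {
    // Illustrative layout only: hvisor sits inside the single RAM region.
    let regions = [
        MemRegion { mem_type: MemType::Ram, physical_start: 0x4000_0000, size: 0x4000_0000 },
        MemRegion { mem_type: MemType::Io, physical_start: 0x0900_0000, size: 0x1000 },
    ];
    let (skernel, hv_end) = (0x7fa0_0000u64, 0x7fe0_0000u64);
    for r in overlapping_ram_regions(&regions, skernel, hv_end) {
        println!("overlap: RAM [{:#x}, {:#x})", r.physical_start, r.physical_start + r.size);
    }
}
```

Treating both ranges as half-open means a RAM region that ends exactly at skernel, or starts exactly at __hv_end, is correctly reported as non-overlapping.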
History
This problem has occurred at least 10 times across different boards and refactoring cycles, often after:
- Adding PCIe support (which significantly increased binary size)
- Adjusting HV_MEM_POOL_SIZE
- Porting to new boards where BASE_ADDRESS wasn't carefully checked against the RAM layout
Each time it was "fixed" by manually moving BASE_ADDRESS in the linker script, but the same root cause reappears because nothing enforces the invariant that hvisor's range must never overlap a root-zone RAM region.
Current Mitigations (PR #XXX)
- Compile-time overlap check (tools/check_hv_mem_overlap.py): Post-link script that reads skernel/__hv_end from the ELF, parses ROOT_ZONE_MEMORY_REGIONS from board.rs, and fails the build if any MEM_TYPE_RAM region overlaps hvisor's range.
- Runtime diagnostic (check_fault_in_hvisor_mem() in trap.rs): When handle_dabt gets an MMIO fault whose address falls within hvisor's memory range, it prints a specific diagnostic ("FAULT ADDRESS is within hvisor's physical memory range") before panicking, instead of the generic "mmio_handle_access" error.
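The runtime diagnostic boils down to a range-membership test on the faulting address. The following is a minimal sketch of that idea. In hvisor the bounds come from the skernel and __hv_end linker symbols. Here they are passed as plain parameters so the example runs standalone. The function names and the exact message format are illustrative assumptions, not the code in trap.rs.

```rust
/// Sketch of the range test behind a check like check_fault_in_hvisor_mem().
/// Assumption: bounds are passed in rather than read from linker symbols.
fn fault_in_hvisor_mem(fault_addr: u64, skernel: u64, hv_end: u64) -> bool {
    // Half-open range: hv_end itself is not part of hvisor's image.
    (skernel..hv_end).contains(&fault_addr)
}

/// Hypothetical helper that builds the diagnostic printed before panicking.
/// Returns None when the fault is an ordinary unhandled MMIO access.
fn fault_diagnostic(fault_addr: u64, skernel: u64, hv_end: u64) -> Option<String> {
    if fault_in_hvisor_mem(fault_addr, skernel, hv_end) {
        Some(format!(
            "FAULT ADDRESS {:#x} is within hvisor's physical memory range [{:#x}, {:#x})",
            fault_addr, skernel, hv_end
        ))
    } else {
        None
    }
}

fn main() {
    // Illustrative bounds only.
    let (skernel, hv_end) = (0x7fa0_0000u64, 0x7fe0_0000u64);
    match fault_diagnostic(0x7fb0_1000, skernel, hv_end) {
        Some(msg) => println!("{msg}"),
        None => println!("generic mmio_handle_access error"),
    }
}
```

Including the fault address and the hvisor bounds in the message lets a reader confirm the memory-stomping diagnosis directly from the panic log.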
What a Real Fix Would Look Like
These mitigations only detect the problem. A real fix needs an architectural change:
- Dynamically reserve hvisor's physical pages from the root zone's memory map at boot
- Or, always place hvisor in a dedicated physical address range outside of any board's RAM layout
- Or, punch a hole in the root zone RAM regions for hvisor's range automatically during zone creation
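The third option, punching a hole, amounts to subtracting [skernel, __hv_end) from each RAM region during zone creation. The following is a minimal sketch using plain (start, end) tuples for half-open ranges. The function names and this representation are hypothetical and are not hvisor's zone-creation API.

```rust
/// Subtract the half-open hole [hole_start, hole_end) from the half-open
/// RAM range [start, end), returning the 0, 1, or 2 pieces that remain.
fn punch_hole(start: u64, end: u64, hole_start: u64, hole_end: u64) -> Vec<(u64, u64)> {
    // No overlap: the RAM range is returned unchanged.
    if hole_end <= start || end <= hole_start {
        return vec![(start, end)];
    }
    let mut pieces = Vec::new();
    // Portion of the RAM range below the hole, if any.
    if start < hole_start {
        pieces.push((start, hole_start));
    }
    // Portion of the RAM range above the hole, if any.
    if hole_end < end {
        pieces.push((hole_end, end));
    }
    pieces
}

/// Hypothetical helper: apply the same hole to every RAM range.
fn punch_hole_all(ram: &[(u64, u64)], hole_start: u64, hole_end: u64) -> Vec<(u64, u64)> {
    ram.iter()
        .flat_map(|&(s, e)| punch_hole(s, e, hole_start, hole_end))
        .collect()
}

fn main() {
    // Illustrative layout: hvisor sits near the top of a 1 GiB RAM region.
    let ram = [(0x4000_0000u64, 0x8000_0000u64)];
    for (s, e) in punch_hole_all(&ram, 0x7fa0_0000, 0x7fe0_0000) {
        println!("RAM [{:#x}, {:#x})", s, e);
    }
}
```

In a real implementation, each resulting piece would presumably keep the original region's other attributes. The hole would also likely need to be page-aligned before the regions are mapped into stage-2.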
Related