Context
The interpreter loop regressed in performance by about 10% between emulator 0.19 and 0.20. The regression appeared after we refactored the interpreter to make the TLB code saner and to encapsulate access to address offsets through the host_addr type.
Possible solutions
Trace the host amd64/arm64 instructions executed by the interpreter in both emulator releases and identify which instructions were added. Then refactor the interpreter hot loop to eliminate the extra host instructions while keeping the code clear and maintainable (no obscure performance hacks).
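One thing worth checking while comparing the disassembly is whether the host_addr wrapper itself compiles away. The sketch below assumes host_addr is a thin wrapper around an integer address; the emulator's actual definition may differ, and `read_u64` and `to_host_addr` are hypothetical helpers introduced only for illustration. It shows a wrapper whose offset arithmetic should remain a single integer add, with compile-time checks that the type stays trivially copyable and pointer-sized.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical strong type for host addresses (an assumption, not the
// emulator's real definition).
struct host_addr {
    std::uintptr_t value;
};

// Offsetting a host address is a plain integer addition on the wrapped value.
constexpr host_addr operator+(host_addr a, std::uint64_t offset) {
    return host_addr{a.value + static_cast<std::uintptr_t>(offset)};
}

// Compile-time checks: the wrapper should be as cheap to pass around as a
// raw pointer-sized integer.
static_assert(std::is_trivially_copyable<host_addr>::value,
              "host_addr must be trivially copyable");
static_assert(sizeof(host_addr) == sizeof(std::uintptr_t),
              "host_addr must not add padding or extra fields");

// Hypothetical helper: build a host_addr from a raw pointer.
inline host_addr to_host_addr(const void *p) {
    return host_addr{reinterpret_cast<std::uintptr_t>(p)};
}

// Hypothetical helper: read a 64-bit word from host memory.
// memcpy avoids alignment and aliasing issues and is normally lowered
// to a single load by optimizing compilers.
inline std::uint64_t read_u64(host_addr a) {
    std::uint64_t v;
    std::memcpy(&v, reinterpret_cast<const void *>(a.value), sizeof(v));
    return v;
}
```

If the static assertions hold but the hot loop still grew, the extra instructions more likely come from how offsets are computed or reloaded around the TLB lookup than from the wrapper type itself; the instruction trace should confirm or rule this out.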