I've done some thinking and coding since last time, which I want to present.
In the previous issue I presented some screenshots of my proof of concept, with X and Y flip for BG tiles. That was a quick hack, but now I've added proper support for VRAM and WRAM banking. This requires some reorganization of the code: the flat memory array is replaced by a two-level structure in which the memory map is divided into sixteen 4 kB sections, each pointing to a memory buffer. When a section can't or shouldn't be served from RP2040 RAM (i.e. ROM and SRAM), its entry in the first-level table is set to NULL, which replaces the current ROM/SRAM range check. I believe this gives similar or better performance. It also means that 32+8 kB of RP2040 RAM isn't wasted as a placeholder for ROM+SRAM. On the other hand, there's now 32 kB banked WRAM + 16 kB banked VRAM + a 4 kB section that holds OAM, IO and HRAM, so 52 kB is still in use compared to the previous 64 kB, but at least it's all put to good use.
// Allocation of memory.
uint8_t volatile memVRAM0[0x2000]; // 8 kB of standard VRAM.
uint8_t volatile memVRAM1[0x2000]; // 8 kB of extended VRAM for GBC.
uint8_t volatile memWRAM[0x8000]; // 32 kB of work RAM.
uint8_t volatile memOAMandIO[0x1000]; // A 4 kB section of which only the upper 512 B is used for OAM, IO, HRAM.
// Definition of the memory areas, and initial configuration.
uint8_t volatile * memAreas[16] = {
    /* 0x0-0x7: ROM (read only, unmapped) */
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    /* 0x8-0x9: VRAM */
    &memVRAM0[0x0000], &memVRAM0[0x1000],
    /* 0xA-0xB: SRAM (unmapped) */
    NULL, NULL,
    /* 0xC-0xD: WRAM */
    &memWRAM[0x0000], &memWRAM[0x1000],
    /* 0xE: echo WRAM (unmapped), 0xF: OAM + IO + HRAM */
    NULL, memOAMandIO,
};
// Sample use of the array.
static inline void substitudeBusdataFromMemory(void) {
    uint area = (*address >> 12) & 0xF;
    // The old check prevented substitution if the address was in ROM 0x0000-0x7fff or external RAM 0xa000-0xbfff.
    // This check does approximately the same thing because those areas are unmapped by being assigned NULL.
    if (memAreas[area]) {
        // This is from RAM, load our version as we cannot see the data on the bus.
        *opcode = memAreas[area][*address & 0xFFF];
        history[*historyIndex] = rawBusData;
    }
}
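With this layout, bank switching can be reduced to swapping pointers in the first-level table. The following is only a sketch of how that might look, not the code I will submit: the helper names `selectVramBank` and `selectWramBank` are made up for illustration, and the buffers are redeclared so the snippet compiles on its own. The register behaviour assumed here is the documented GBC one: bit 0 of VBK (0xFF4F) selects the VRAM bank, and bits 0-2 of SVBK (0xFF70) select the WRAM bank at 0xD000, with 0 treated as bank 1.

```c
#include <stddef.h>
#include <stdint.h>

// Buffers and table as declared above, repeated so the sketch is self-contained.
uint8_t volatile memVRAM0[0x2000];
uint8_t volatile memVRAM1[0x2000];
uint8_t volatile memWRAM[0x8000];
uint8_t volatile memOAMandIO[0x1000];

uint8_t volatile *memAreas[16] = {
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, // ROM
    &memVRAM0[0x0000], &memVRAM0[0x1000],           // VRAM
    NULL, NULL,                                     // SRAM
    &memWRAM[0x0000], &memWRAM[0x1000],             // WRAM
    NULL, memOAMandIO,                              // echo, OAM + IO + HRAM
};

// Hypothetical helper: call when a write to VBK (0xFF4F) is observed.
// Only bit 0 is significant; both 4 kB halves of VRAM switch together.
static inline void selectVramBank(uint8_t value) {
    uint8_t volatile *base = (value & 0x01) ? memVRAM1 : memVRAM0;
    memAreas[0x8] = &base[0x0000];
    memAreas[0x9] = &base[0x1000];
}

// Hypothetical helper: call when a write to SVBK (0xFF70) is observed.
// Bits 0-2 pick the bank mapped at 0xD000; a value of 0 selects bank 1.
// Bank 0 at 0xC000 is fixed, so memAreas[0xC] is never touched.
static inline void selectWramBank(uint8_t value) {
    uint8_t bank = value & 0x07;
    if (bank == 0) {
        bank = 1;
    }
    memAreas[0xD] = &memWRAM[(size_t)bank * 0x1000];
}
```

The lookup in `substitudeBusdataFromMemory` stays unchanged, since it only ever goes through `memAreas`.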
I will submit the progress so far when I've polished it a bit. Looking ahead, I'm seeking opinions from Seb, or anyone else with relevant experience, about the following points.
General purpose and HBlank DMA
Something that's currently blocking better partial GBC support is the lack of support for the GBC-only general purpose and HBlank DMA. This DMA runs at the full double speed of the CPU, while the external clock pin on the cart bus still outputs a 1 MiHz signal. The GBI might be able to switch to a different PIO program and trigger on both negative and positive edges for the duration of the DMA. The challenge would be to time this switchover. Another wrinkle is that HBlank DMA can start "randomly", which is to say without a proper trigger condition from the CPU. This is unlike the general purpose DMA, which starts immediately after the register write, so the Interceptor would know exactly when it starts. Even just supporting the general purpose variant of DMA might help a lot with compatibility.
For HBlank DMA from WRAM (as opposed to ROM), it might be enough to detect it, idle the emulation for however long the DMA lasts, and copy data from the internal copy of WRAM to VRAM. For transfers from ROM, you'd really need to capture the data from the bus. But the tricky part is detecting when an HBlank DMA transfer actually starts, compared to all other bus activity.
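To make the WRAM case concrete, here is a rough sketch of decoding the HDMA registers and doing the copy inside the Interceptor's own memory. It covers the copy only, not the hard part of detecting when the transfer starts. The struct and function names are hypothetical; the register layout assumed is the documented one (HDMA1/2 at 0xFF51/0xFF52 hold the source, HDMA3/4 at 0xFF53/0xFF54 the destination, HDMA5 at 0xFF55 the length in 16-byte units minus one, with bit 7 selecting HBlank mode). The `areas` parameter stands in for the sixteen-entry section table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Decoded view of the five HDMA registers (hypothetical type).
typedef struct {
    uint16_t source;  // lower 4 bits are ignored by the hardware
    uint16_t dest;    // always inside VRAM, 0x8000-0x9FF0
    uint16_t length;  // 16..2048 bytes, always a multiple of 16
    bool hblankMode;  // true: HBlank DMA, false: general purpose DMA
} HdmaRequest;

static HdmaRequest decodeHdma(uint8_t hdma1, uint8_t hdma2, uint8_t hdma3,
                              uint8_t hdma4, uint8_t hdma5) {
    HdmaRequest r;
    r.source = (uint16_t)(((hdma1 << 8) | hdma2) & 0xFFF0);
    r.dest = (uint16_t)(0x8000 | (((hdma3 << 8) | hdma4) & 0x1FF0));
    r.length = (uint16_t)(((hdma5 & 0x7F) + 1) * 0x10);
    r.hblankMode = (hdma5 & 0x80) != 0;
    return r;
}

// Copies the block between mirrored sections. Stops and returns false at the
// first byte whose source or destination section is unmapped (NULL), e.g. a
// ROM or SRAM source, which would have to be captured from the bus instead.
static bool copyHdmaFromMirror(uint8_t volatile *areas[16], const HdmaRequest *r) {
    for (uint16_t i = 0; i < r->length; i++) {
        uint16_t src = (uint16_t)(r->source + i);
        uint16_t dst = (uint16_t)(r->dest + i);
        uint8_t volatile *from = areas[src >> 12];
        uint8_t volatile *to = areas[dst >> 12];
        if (!from || !to) {
            return false;
        }
        to[dst & 0xFFF] = from[src & 0xFFF];
    }
    return true;
}
```

For a general purpose transfer this could run right after the write to HDMA5 is seen. For HBlank mode the same copy would have to be split into 16-byte chunks, which brings back the detection problem described above.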
Higher color depth grayscale
One issue is how to render the image in the best possible way, assuming everything else works. I've thought about it and come to the conclusion that the best way is probably to use the color palettes, but render the result as grayscale. This would give a somewhat true image, just lacking color. However, this would ideally need some more color depth. 3-5 bits would be decent, but I have to admit I don't understand the bit encoding of the NV12 and MJPEG data well enough to know if this is possible.
For the MJPEG I know that the code is producing a questionably legal JPEG file where only the DC component of each 8x8 block is encoded. Looking at the throughput, the isochronous transfer gives a theoretical maximum of 1023000 bytes per second. 1023000/60/160/144*8=5.92 bits per pixel at the limit. 4 or 5 bits would give 16 or 32 gray levels, which would be very good. But I don't know if there are encoding issues that would prevent that from being viable. If the data is Huffman coded, it might also be possible to use a non-power-of-two number of symbols, for example 10 different symbols to represent various intensity levels.
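As an illustration of the palette-to-grayscale step, here is a minimal sketch that turns one GBC palette entry into a gray level. The GBC stores colors as RGB555 (bits 0-4 red, 5-9 green, 10-14 blue). The luma weights are my own choice (the usual BT.601 integer approximation 77/150/29 out of 256), not something the hardware or the existing code prescribes, and the function names are hypothetical.

```c
#include <stdint.h>

// Converts one RGB555 palette entry to a 5-bit gray level (0..31).
// The weights sum to 256, so the result stays in the 5-bit input range.
static inline uint8_t rgb555ToGray5(uint16_t color) {
    uint32_t r = color & 0x1F;
    uint32_t g = (color >> 5) & 0x1F;
    uint32_t b = (color >> 10) & 0x1F;
    return (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
}

// Reduces a 5-bit gray level to `bits` bits by dropping low bits.
// `bits` must be in the range 1..5.
static inline uint8_t reduceGray(uint8_t gray5, unsigned bits) {
    return (uint8_t)(gray5 >> (5 - bits));
}
```

Since there are only 64 BG and 64 OBJ palette entries, the conversion could be done once per palette write rather than per pixel.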
I don't really know any details about the NV12 encoding, but for 24 FPS, you'd get 1023000/160/144/24=1.85 bytes/pixel. Just 1 byte of intensity data would be a very good resolution of course, but again, I'm unsure if there are limitations in the encoding that would prevent this from working.
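The budget arithmetic above can be written down as a small sketch, which also makes it easy to try other frame rates. The 1023000 bytes per second figure is the one quoted above. The NV12 frame size follows from the format's layout (a full-resolution 8-bit Y plane followed by a half-size interleaved UV plane, 1.5 bytes per pixel), and assumes the frame is sent at the native 160x144, which may not match what the firmware actually transmits. Payload headers are ignored. All names are hypothetical.

```c
#include <stdint.h>

enum { SCREEN_W = 160, SCREEN_H = 144 };

// Theoretical isochronous maximum quoted in the text.
#define ISO_BYTES_PER_SECOND 1023000u

// Bytes available for one frame at the given frame rate.
static inline uint32_t bytesPerFrame(uint32_t fps) {
    return ISO_BYTES_PER_SECOND / fps;
}

// Available bits per pixel, in thousandths of a bit, rounded down.
static inline uint32_t milliBitsPerPixel(uint32_t fps) {
    uint64_t bits = (uint64_t)ISO_BYTES_PER_SECOND * 8u * 1000u;
    return (uint32_t)(bits / ((uint64_t)fps * SCREEN_W * SCREEN_H));
}

// Size of one NV12 frame: w*h bytes of Y plus w*h/2 bytes of interleaved UV.
static inline uint32_t nv12FrameBytes(uint32_t w, uint32_t h) {
    return w * h + (w * h) / 2;
}
```

Under these assumptions, `milliBitsPerPixel(60)` gives 5920 (5.92 bits) and `milliBitsPerPixel(24)` gives 14800 (1.85 bytes). A native-resolution NV12 frame is 34560 bytes, which fits in the 42625 bytes available per frame at 24 FPS but not in the 17050 bytes available at 60 FPS. For a grayscale image the UV plane would simply be filled with 128.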