On my current setup, I've measured these performance numbers for CPU vs GPU acquisition at an 8.192 MHz sampling rate:
| Acquisition (type) | Ipp (float) | Ipp (double) | ArrayFire (float) | ArrayFire (double) |
|---|---|---|---|---|
| Time, ms | 134 | 303 | 1269 | 1394 |
The ArrayFire timings are heavily affected by calculating the maximum value and its position, as well as the statistical parameters, here. This approach leads to transferring small amounts of data (4 numbers) from the GPU to the host over PCI-Express, which is a performance killer.
In my sandbox, I've managed to reduce the acquisition time to ~500 ms by making those functions return `af::array` (keeping the data on the GPU), but it would require the following code modification, which I don't think is necessary for now.
However, let's keep this documented in case I'd like to return to this in the future.