1D_stencil_SIMD

Author: HERNANDEZ, Pierre Vincent

Project Specification

Machine Specification:

Machine used for the development and running of this project:

OS: Windows 11
System type: 64-bit operating system, x64-based processor
Processor: AMD Ryzen 7 5700U with Radeon Graphics (1.80 GHz)
RAM: 8.00 GB (5.85 GB usable)
Tools used:
- Visual Studio 2022
- Google Colaboratory

Important Links

GitHub Repository Link (for C, x86-64, and SIMD) [1D_stencil_SIMD]
Google Colaboratory Link (for CUDA) [stencil_1D_CUDA] (*Use DLSU email to access)

Comparative Analysis of the Performance of Different Kernels

[Number of elements: 2^20] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Based on the average execution times (in microseconds) of the Debug Build, the C version of the 1-D stencil is the slowest. The x86-64 version follows, running approximately 4000 µs (4 ms) faster than C. SIMD is the fastest of the three. Between the x86-64 and SIMD versions, however, the gap is only about 200 µs, which is hardly noticeable at this range of values.
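For context, the scalar C kernel being timed can be sketched as follows. This is a minimal sketch and not the author's main.c: it assumes a radius-3 stencil in which each output is the sum of seven consecutive inputs (y[i] = x[i] + ... + x[i+6]) on 64-bit integer elements. The actual specification is in mp_specs.png and may differ; the function name stencil_1d_c and the constant STENCIL_RADIUS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed radius; the real spec (mp_specs.png) is not reproduced here. */
#define STENCIL_RADIUS 3

/* Hypothetical scalar reference kernel.
   For i in [0, n - 2*STENCIL_RADIUS):
   y[i] = x[i] + x[i+1] + ... + x[i + 2*STENCIL_RADIUS]. */
void stencil_1d_c(const int64_t *x, int64_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    if (n < 2 * STENCIL_RADIUS + 1)
        return; /* too short to produce any output */

    size_t out_len = n - 2 * STENCIL_RADIUS;
    for (size_t i = 0; i < out_len; ++i) {
        int64_t sum = 0;
        /* Sum the window of 2*STENCIL_RADIUS + 1 inputs starting at x[i]. */
        for (size_t k = 0; k <= 2 * STENCIL_RADIUS; ++k)
            sum += x[i + k];
        y[i] = sum;
    }
}
```

Under this assumption, n inputs produce n - 6 outputs, so y must be allocated with at least that many elements.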

Release Build (C, x86-64, & SIMD)

With the Release Build, the C and SIMD versions both ran faster, while the x86-64 version ran slightly slower than in its Debug Build. The optimizations applied in the Release Build also produced some unexpected behavior, particularly around timing with clock(): at times the timer variables did not update between measurements.
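One plausible contributor, offered as an assumption rather than a confirmed diagnosis: MSVC defines CLOCKS_PER_SEC as 1000, so clock() has at best millisecond resolution, and a sub-millisecond run can report an elapsed time of zero. An optimizing build may also move or drop work whose result is never used. The sketch below uses the C11 timespec_get() for a typically finer-grained timestamp and folds the output into a checksum so the result stays observable. The helper names now_us and checksum_u64 are hypothetical and not taken from main.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: wall-clock timestamp in microseconds using the
   C11 timespec_get(). Returns -1.0 if the call fails. */
double now_us(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1.0;
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Hypothetical helper: folds the output vector into one value. Printing it
   after the timed region keeps the kernel's result "used", so an optimizing
   build cannot discard the computation as dead code. Unsigned arithmetic
   wraps on overflow, which is well defined in C. */
uint64_t checksum_u64(const int64_t *y, size_t len)
{
    assert(y != NULL || len == 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < len; ++i)
        acc += (uint64_t)y[i];
    return acc;
}
```

A timed run would read now_us() immediately before and after the kernel call, average the differences over several runs, and print checksum_u64() over the output vector once after timing is done.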

CUDA version

The CUDA version outperformed the first three versions. Even after adding the total transfer time of the Y vector from host to device and back (about 1400 µs) to the average kernel execution time of 97.22 µs, it is still much faster. Note that this version was implemented and run on Google Colaboratory because the local machine used does not have an NVIDIA GPU.
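Using the figures above, the end-to-end time and the share spent in the kernel itself work out roughly as follows (T denotes time; the split of the 1400 µs between the two transfer directions is not reported, so it is kept as a single term):

```latex
\begin{aligned}
T_{\text{total}} &= T_{\text{transfer}} + T_{\text{kernel}}
  \approx 1400\,\mu\text{s} + 97.22\,\mu\text{s} = 1497.22\,\mu\text{s} \approx 1.5\,\text{ms},\\
\frac{T_{\text{kernel}}}{T_{\text{total}}} &\approx \frac{97.22}{1497.22} \approx 0.065 .
\end{aligned}
```

So at 2^20 elements, roughly 93.5% of the CUDA run is data movement rather than computation.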

[Number of elements: 2^24] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Increasing the vector size to 2^24 makes the execution-time differences between C, x86-64, and SIMD much more evident. C is still the slowest, x86-64 is in the middle, and SIMD is the fastest.
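To illustrate why SIMD pulls ahead as the vector grows, here is a C intrinsics analogue of a vectorized stencil. It is not the project's stencil_SIMD.asm, whose register width and element type are not stated here. It assumes the same seven-point (radius-3) sum on 64-bit integers and uses SSE2, which is baseline on x86-64, so each 128-bit add produces two outputs at once (a 256-bit AVX2 version would produce four). The name stencil_1d_sse2 is hypothetical; a scalar loop handles the leftover element, and the whole computation on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Assumed stencil radius (seven-point window). */
#define STENCIL_R 3

/* Hypothetical SIMD kernel: y[i] = x[i] + ... + x[i + 2*STENCIL_R]. */
void stencil_1d_sse2(const int64_t *x, int64_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    if (n < 2 * STENCIL_R + 1)
        return;

    size_t out_len = n - 2 * STENCIL_R;
    size_t i = 0;
#ifdef HAVE_SSE2
    /* Two 64-bit outputs per iteration: lane j accumulates x[i + j + k]. */
    for (; i + 2 <= out_len; i += 2) {
        __m128i acc = _mm_setzero_si128();
        for (size_t k = 0; k <= 2 * STENCIL_R; ++k)
            acc = _mm_add_epi64(acc, _mm_loadu_si128((const __m128i *)(x + i + k)));
        _mm_storeu_si128((__m128i *)(y + i), acc);
    }
#endif
    /* Scalar tail (and full fallback when SSE2 is unavailable). */
    for (; i < out_len; ++i) {
        int64_t sum = 0;
        for (size_t k = 0; k <= 2 * STENCIL_R; ++k)
            sum += x[i + k];
        y[i] = sum;
    }
}
```

The unaligned loads (_mm_loadu_si128) are used because the overlapping windows start at every element, not only at 16-byte boundaries.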

Release Build (C, x86-64, & SIMD)

In the Release Build as well, the execution-time differences are more evident than in the Release Build with 2^20 elements.

CUDA version

Looking only at the execution time of the CUDA kernel that performs the 1-D stencil, CUDA is far faster than the C, x86-64, and SIMD versions. However, the data must be transferred from host memory to GPU memory (and the result copied back) for the computation to take place. Combining the transfer and execution times gives a total of approximately 26 ms. By this end-to-end measure, even the Debug Build of the SIMD version is faster than the CUDA version.
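As a rough consistency check, assuming transfer time scales linearly with the number of elements (an assumption, since bandwidth and fixed overheads are not measured separately here), the 1400 µs transfer time measured at 2^20 elements predicts:

```latex
T_{\text{transfer}}(2^{24}) \approx \frac{2^{24}}{2^{20}} \times 1400\,\mu\text{s}
  = 16 \times 1400\,\mu\text{s} = 22\,400\,\mu\text{s} \approx 22.4\,\text{ms}
```

This would account for most of the approximately 26 ms total, suggesting that transfers, not the kernel, dominate the CUDA timing at this size as well.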

[Number of elements: 2^28] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

The vector size was first set to 2^30 and then 2^29, but the machine's RAM became the bottleneck, so it was reduced to 2^28. Even at 2^28, the system appears to page in order to handle approximately 4 GB of data. In the Debug Build, the gaps between C and x86-64 and between x86-64 and SIMD are larger than at the smaller sizes, which highlights how fast the SIMD operation is.
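The approximately 4 GB figure can be checked with a quick calculation. The sketch below assumes 8-byte elements (e.g., int64_t or double) and two arrays: an input X of n elements and an output Y of n - 2r elements with a stencil radius r of 3. Neither assumption is confirmed by this README, and the function name stencil_footprint_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for input X (n elements) plus
   output Y (n - 2*radius elements), each elem_size bytes wide. */
uint64_t stencil_footprint_bytes(uint64_t n, uint64_t elem_size, uint64_t radius)
{
    assert(elem_size > 0);
    /* No outputs if the input is shorter than one full window. */
    uint64_t out_len = (n > 2 * radius) ? n - 2 * radius : 0;
    return (n + out_len) * elem_size;
}
```

For n = 2^28 this gives 4,294,967,248 bytes, just under 4 GiB, which exceeds what the 5.85 GB of usable RAM can comfortably hold alongside the OS and other processes. Under the same assumptions, 2^29 and 2^30 would need about 8 GiB and 16 GiB respectively, more than the machine's physical memory.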

Release Build (C, x86-64, & SIMD)

In the Release Build, the C and SIMD versions ran faster, but the x86-64 version again slowed down slightly.

CUDA version

Lastly, the CUDA version is still faster than the C and x86-64 versions. It is only slightly faster than the SIMD version, and the two are close, especially once the total CUDA transfer time is included.

In conclusion, the CUDA and SIMD implementations are much faster than the plain C and x86-64 versions. The Visual Studio Release Build makes the program's behavior somewhat quirky. Additionally, since the CUDA version was run on a different machine than the C, x86-64, and SIMD versions, the comparison merits further investigation. Lastly, the limited RAM of the machine used may have bottlenecked the raw performance of these implementations; running all versions on the same machine, with both the highest and lowest specifications available, would show which implementation fits best where.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1D_stencil_SIMD

Author: HERNANDEZ, Pierre Vincent

Project Specification

Machine Specification:

Important Links

Comparative Analysis of the Performance of Different Kernels

[Number of elements: 2^20] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

[Number of elements: 2^24] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

[Number of elements: 2^28] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

1D_stencil_SIMD

Author: HERNANDEZ, Pierre Vincent

Project Specification

Machine Specification:

Important Links

Comparative Analysis of the Performance of Different Kernels

[Number of elements: 2^20] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

[Number of elements: 2^24] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

[Number of elements: 2^28] C vs. x86-64 vs. SIMD vs. CUDA

Debug Build (C, x86-64, & SIMD)

Release Build (C, x86-64, & SIMD)

CUDA version

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages