GPU621/CUDA

880 bytes added, 15:54, 11 August 2021
= CUDA Performance Testing =
 
== CUDA Code ==
 
The following CUDA Matrix Multiplication Code was used for all CUDA Matrix Multiplication tests:
 
=== CUDA MATRIX MULTIPLICATION HOST CODE ===
[[File:CUDA_Host_Code.png]]
 
=== CUDA MATRIX MULTIPLICATION DEVICE HEADER ===
[[File:CUDA_Device_Header_Code.png]]
 
=== CUDA MATRIX MULTIPLICATION DEVICE CODE ===
[[File:CUDA_Device_Code.png]]
 
 
== OpenCL Code ==
 
The following OpenCL Matrix Multiplication Code was used for all Matrix Multiplication tests on an OpenCL-compatible GPU (the same GPU used for the CUDA tests):
 
=== OpenCL COMPILER DIRECTIVES ===
[[File:OpenCL_Complier_Directives.png]]
 
=== OpenCL HOST CODE ===
[[File:OpenCL_Host_Code.png]]
 
=== OpenCL DEVICE CODE ===
[[File:OpenCL_Device_Code.png]]
 
== Matrix Multiplication – CUDA vs. OpenMP ==
 
The OpenMP Matrix Multiplication solution from Workshop 3 was used for testing in this scenario.
 
[[File:CUDA-OpenMP.png|500px]]
 
For this test, the CUDA code was run on a CUDA-enabled GPU and the OpenMP code was run on the CPU. A truer test of CUDA vs. OpenMP would be to run the OpenMP code on the GPU as well.
Because matrix multiplication is well suited to GPUs, and CUDA is optimized for NVIDIA GPUs, this is an unfair test, and CUDA crushes OpenMP.
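 
Running the OpenMP code on the GPU, as suggested above, could look roughly like the following. This is a minimal sketch, not the Workshop 3 solution: the function name <code>matmul_omp_target</code>, the row-major square layout, and the <code>float</code> element type are all assumptions made for illustration. It relies on the OpenMP <code>target</code> construct, which only offloads to a GPU when the compiler has been built with offload support; otherwise the same loops run on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from Workshop 3): computes c = a * b for
// n x n row-major matrices.
// The `target` construct asks the OpenMP runtime to offload the loop nest to
// an accelerator. If the compiler has no offload support, or OpenMP is
// disabled, the loops simply run on the host CPU.
void matmul_omp_target(const float* a, const float* b, float* c, int n) {
    // map(to: ...) copies the inputs to the device; map(from: ...) copies
    // the result back once the region finishes.
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Timing this version against the CUDA kernel on the same GPU would compare the two programming models rather than comparing a GPU with a CPU.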
 
== Matrix Multiplication – CUDA vs. OpenCL ==
 
[[File:CUDA-OpenCL.png|500px]]
 
As covered earlier, CUDA is proprietary to NVIDIA and only works on CUDA-enabled NVIDIA GPUs. OpenCL, by contrast, is an open standard that runs on a wide variety of GPUs and CPUs. The expected result is therefore that CUDA outperforms OpenCL on any given CUDA-enabled device.
Interestingly, at small array sizes OpenCL appears to outperform CUDA. This is presumably due to fixed parallel overhead, which dominates the run time when there is little work to parallelize.
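 
One way to check whether fixed overhead explains the small-array result is to time the first call separately from later calls. The sketch below is an assumption about how such a measurement could be set up, not the harness used for the charts above; <code>Timing</code> and <code>time_with_warmup</code> are hypothetical names.

```cpp
#include <chrono>
#include <functional>

// Hypothetical result type: the first call vs. the average of later calls.
struct Timing {
    double first_ms;   // first call, including any one-time setup cost
    double steady_ms;  // mean time per call after the first one
};

// Hypothetical helper. `run` must block until the work has finished
// (for GPU code that means synchronizing before returning, e.g. with
// cudaDeviceSynchronize() or clFinish()); otherwise only the launch is timed.
Timing time_with_warmup(const std::function<void()>& run, int repeats) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = clock::now();
    run();  // first call: pays any one-time initialization
    const auto t1 = clock::now();
    for (int i = 0; i < repeats; ++i) {
        run();  // later calls: closer to the steady-state cost
    }
    const auto t2 = clock::now();

    Timing t;
    t.first_ms = ms(t1 - t0).count();
    t.steady_ms = repeats > 0 ? ms(t2 - t1).count() / repeats : 0.0;
    return t;
}
```

If the gap between CUDA and OpenCL at small sizes shows up mostly in <code>first_ms</code> and shrinks in <code>steady_ms</code>, that would be consistent with the overhead explanation.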