GPU610/Turing

342 bytes added, 23:01, 14 November 2015
=== Assignment 3 ===
There were no major issues converting the code to CUDA: it operates on a simple matrix, which made the port very straightforward. When I first converted the code, however, I noticed that it ran slower than the CPU version. This was caused by inefficient block sizing. I fixed it by adjusting the number of threads per block until the kernel made more efficient use of the CUDA cores. In the end, without any other optimizations, it runs at around twice the speed of the CPU code.
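To illustrate the block-sizing arithmetic, here is a minimal host-side sketch in plain C++. It is not the code from this assignment: the names <code>blocksFor</code> and <code>idleThreads</code> are hypothetical, and the 9000-element dimension is only used as an example. It shows how the number of blocks follows from the chosen threads per block, and how many launched threads fall outside the matrix and must be masked off by a bounds check in the kernel.

```cpp
#include <cassert>

// Hypothetical helper (not from the assignment code): number of blocks
// needed along one dimension so that blocks * threadsPerBlock >= n.
constexpr unsigned blocksFor(unsigned n, unsigned threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}

// Hypothetical helper: threads launched along one dimension that fall
// past the end of the data and do no useful work.
constexpr unsigned idleThreads(unsigned n, unsigned threadsPerBlock) {
    return blocksFor(n, threadsPerBlock) * threadsPerBlock - n;
}

// Example for a dimension of 9000 elements:
//   16 threads per block -> 563 blocks,  8 idle threads
//   32 threads per block -> 282 blocks, 24 idle threads
static_assert(blocksFor(9000, 16) == 563, "ceil(9000 / 16)");
static_assert(idleThreads(9000, 16) == 8, "563 * 16 - 9000");
static_assert(blocksFor(9000, 32) == 282, "ceil(9000 / 32)");
static_assert(idleThreads(9000, 32) == 24, "282 * 32 - 9000");
```

In a CUDA launch these values would feed the grid and block dimensions. Since threads are scheduled in warps of 32, a total thread count per block that is a multiple of 32 is usually a sensible starting point, but the best value still has to be found by measurement, as described above.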
==== Chadd's Findings ====
Profiling the Diffusion Equation, I noticed that the majority of the time is spent in the Evolve Time Step function. Using my home computer with an Nvidia GTX 560 Ti graphics card, I ran a 9000x9000 matrix 10 times. The chart below compares the runtimes.

[[Image:Runtime.png|400px]]
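For anyone repeating this kind of measurement, the sketch below shows one way to time repeated runs in plain C++. It is an assumption about how such a measurement could be set up, not the harness used for the chart: <code>meanMillis</code> is a hypothetical helper, and <code>step</code> stands in for one call to the time-step function.

```cpp
#include <chrono>

// Hypothetical helper: calls step() `runs` times and returns the mean
// wall-clock time per call in milliseconds.
template <typename Step>
double meanMillis(Step step, int runs) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) {
        step();  // one time step; must block until the work has finished
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count() / runs;
}
```

When timing the GPU version this way, keep in mind that kernel launches are asynchronous, so <code>step</code> should wait for the device (for example with <code>cudaDeviceSynchronize()</code>) before returning. Otherwise only the launch overhead is measured.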