

GPU610/Turing

906 bytes added, 12:58, 13 December 2015
Assignment 3
= Team Turing =
== Team Members ==
# [mailto:cjcampbell2@myseneca.ca?subject=gpu610 Colin Campbell], Team Leader
# [mailto:jyshin3@myseneca.ca?subject=gpu610 James Shin]
# [mailto:cbailey8@myseneca.ca?subject=gpu610 Chaddwick Bailey]
[mailto:cjcampbell2@myseneca.ca;jyshin3@myseneca.ca;cbailey8@myseneca.ca?subject=dps901-gpu610 Email All]
== Progress ==
====== Conclusions ======
There were no major issues converting the code to CUDA: the program operates on a simple matrix, which made the port straightforward. When I first converted the code, however, I noticed that it ran slower than the CPU version. This was caused by inefficient block sizing. I fixed it by adjusting the number of threads per block until the kernel made more efficient use of the CUDA cores. In the end, without any other optimizations, it runs at around twice the speed of the CPU code.
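
The block-size tuning described above can be sketched roughly as follows. This is only an illustration, not the assignment code: the kernel <code>scale</code>, the helpers <code>timeLaunch</code> and <code>sweepBlockSizes</code>, and the range of block sizes tried are all hypothetical. It times the same kernel at several threads-per-block counts using CUDA events and reports the fastest one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: one thread per matrix element.
__global__ void scale(float* a, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= k;
}

// Launches `scale` with ntpb threads per block and returns the
// elapsed kernel time in milliseconds (negative if the launch failed).
float timeLaunch(float* d_a, int n, int ntpb) {
    int nblks = (n + ntpb - 1) / ntpb;  // round up so every element is covered
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    scale<<<nblks, ntpb>>>(d_a, n, 1.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Tries power-of-two block sizes from 32 to 1024 on an n-element array,
// prints each timing, and returns the fastest size (0 if nothing ran).
int sweepBlockSizes(int n) {
    float* d_a = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(float)) != cudaSuccess) return 0;
    cudaMemset(d_a, 0, n * sizeof(float));
    int best = 0;
    float bestMs = 0.0f;
    for (int ntpb = 32; ntpb <= 1024; ntpb *= 2) {
        float ms = timeLaunch(d_a, n, ntpb);
        if (ms < 0.0f) continue;  // skip sizes the device rejected
        printf("%4d threads/block: %.3f ms\n", ntpb, ms);
        if (best == 0 || ms < bestMs) { best = ntpb; bestMs = ms; }
    }
    cudaFree(d_a);
    return best;
}
```

Calling <code>sweepBlockSizes(1 << 20)</code> prints one timing per block size. Which size wins depends on the device and the kernel, so the sweep should be rerun for the real kernel rather than trusted from this stand-in.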
 
==== Chadd's Findings ====
Profiling the Diffusion Equation, I noticed that the majority of the time is spent in the Evolvetimestep function.
On my home computer, which has an Nvidia GTX 560 Ti graphics card, I ran the program on a 9000x9000 matrix 10 times. The chart below compares the runtimes.
 
[[Image:Runtime.png|400px]]
 
 
I used 32 threads per block in my parallelization of the nested for loop found in the Evolvetimestep function. The results were very good.
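
As a minimal sketch of what parallelizing such a nested loop can look like, the code below replaces the row and column loops with one thread per cell. It assumes an explicit 5-point diffusion stencil with fixed boundary cells. The kernel <code>evolveTimestepKernel</code>, the helper <code>evolve</code>, and the coefficient <code>alpha</code> are hypothetical and are not taken from the original Evolvetimestep function.

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical 2D diffusion step: each thread updates one cell from its
// four neighbours (explicit 5-point stencil). Boundary cells are copied.
__global__ void evolveTimestepKernel(const float* in, float* out,
                                     int n, float alpha) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;
    int idx = row * n + col;
    if (row == 0 || col == 0 || row == n - 1 || col == n - 1) {
        out[idx] = in[idx];
        return;
    }
    out[idx] = in[idx] + alpha * (in[idx - 1] + in[idx + 1] +
                                  in[idx - n] + in[idx + n] -
                                  4.0f * in[idx]);
}

// Runs `steps` timesteps on an n x n row-major grid and writes the
// result back into `grid`. Returns false if a CUDA call failed.
bool evolve(std::vector<float>& grid, int n, int steps, float alpha) {
    size_t bytes = grid.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return false; }
    cudaMemcpy(d_a, grid.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 1);                            // 32 threads per block
    dim3 blocks((n + block.x - 1) / block.x, n);  // one block row per grid row
    for (int s = 0; s < steps; ++s) {
        evolveTimestepKernel<<<blocks, block>>>(d_a, d_b, n, alpha);
        std::swap(d_a, d_b);                      // newest values are in d_a
    }
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(grid.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Two buffers are used so that every thread reads only values from the previous timestep. Updating the grid in place would let some threads see neighbours that had already been advanced.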
=== Assignment 3 ===
The first optimization I was able to make was coalescing the threads' memory accesses. This led to a moderate per-step speedup, as seen in this graph.
 
[[Image:ColinCampbellGPU610A3G1.png|600px| ]]
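
To illustrate the general idea of coalesced access, here is a small sketch on a row-major matrix. It is not the assignment's kernel: <code>addOneStrided</code>, <code>addOneCoalesced</code>, and the helper <code>addOne</code> are hypothetical names. Both kernels do the same work, and the only difference is which matrix index <code>threadIdx.x</code> is mapped to.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Uncoalesced: threads adjacent in x handle adjacent rows, so their
// addresses are n floats apart in a row-major matrix.
__global__ void addOneStrided(float* a, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) a[row * n + col] += 1.0f;
}

// Coalesced: threads adjacent in x handle adjacent columns, so a warp
// reads and writes one contiguous run of floats.
__global__ void addOneCoalesced(float* a, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) a[row * n + col] += 1.0f;
}

// Adds 1 to every element of an n x n row-major matrix using the chosen
// kernel. Returns false if a CUDA call failed.
bool addOne(std::vector<float>& m, int n, bool coalesced) {
    size_t bytes = m.size() * sizeof(float);
    float* d_a = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_a, m.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);  // 256 threads per block
    dim3 blocks((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    if (coalesced)
        addOneCoalesced<<<blocks, block>>>(d_a, n);
    else
        addOneStrided<<<blocks, block>>>(d_a, n);

    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(m.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return ok;
}
```

Both variants produce identical results. Any difference shows up only in runtime, which can be checked by profiling each kernel on a large matrix.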
 
I then attempted to modify the code to use shared memory. Unfortunately, the way the algorithm accesses rows and columns out of order made this not viable. I tried to restructure the problem to use tiling to get around this, but was not able to make it work correctly. As a result, I was not able to implement any further optimizations, since most of them depend on using shared memory efficiently.
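
For reference, the general tiling idea can be sketched on a simpler problem than the one in this assignment. The example below is a tiled matrix transpose, chosen only because it also mixes row and column accesses. It is not the approach that was attempted here, and the names <code>transposeTiled</code>, <code>transpose</code>, and <code>TILE</code> are hypothetical. Each block stages a tile in shared memory so that both the global read and the global write stay coalesced.

```cuda
#include <vector>
#include <cuda_runtime.h>

const int TILE = 32;  // tile width; blocks are TILE x TILE threads

// Transposes an n x n row-major matrix one tile at a time.
__global__ void transposeTiled(const float* in, float* out, int n) {
    // The extra column of padding avoids shared-memory bank conflicts
    // when the tile is read back column-wise.
    __shared__ float tile[TILE][TILE + 1];

    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;
    if (row < n && col < n)
        tile[threadIdx.y][threadIdx.x] = in[row * n + col];  // coalesced read

    __syncthreads();  // the whole tile must be loaded before it is reused

    // Swap the block indices so the write is also contiguous in x.
    col = blockIdx.y * TILE + threadIdx.x;
    row = blockIdx.x * TILE + threadIdx.y;
    if (row < n && col < n)
        out[row * n + col] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Returns the transpose of an n x n row-major matrix, or an empty
// vector if a CUDA call failed.
std::vector<float> transpose(const std::vector<float>& in, int n) {
    size_t bytes = in.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return {}; }
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 blocks((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTiled<<<blocks, block>>>(d_in, d_out, n);

    std::vector<float> out(in.size());
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok) out.clear();
    return out;
}
```

Whether a similar staging step fits an algorithm depends on whether its out-of-order accesses stay within a bounded tile. If they range over the whole matrix, this pattern does not apply directly.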
