Changes


GPU610/Turing

449 bytes added, 12:58, 13 December 2015

→‎Assignment 3

= Team Turing =

== Team Members ==

# [mailto:cjcampbell2@myseneca.ca?subject=gpu610 Colin Campbell], Team Leader
# [mailto:jyshin3@myseneca.ca?subject=gpu610 James Shin]
# [mailto:cbailey8@myseneca.ca?subject=gpu610 Chaddwick Bailey]

[mailto:cjcampbell2@myseneca.ca;jyshin3@myseneca.ca;cbailey8@myseneca.ca?subject=dps901-gpu610 Email All]

== Progress ==

I used 32 threads per block in my parallelization of the nested for loop found in the Evolvetimestep function. The results were very good.

=== Assignment 3 ===

The first optimization I was able to make was coalescing the threads' memory accesses. This led to a moderate per-step speedup, as seen in this graph.

[[Image:ColinCampbellGPU610A3G1.png|600px| ]]

I then attempted to modify the code to use shared memory. Unfortunately, the algorithm accesses rows and columns out of order, which made this not viable. I tried to convert the problem to use tiling to get around this but was not able to make it work correctly. As a result, I was not able to implement any further optimizations, as most of them depend on using shared memory efficiently.
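
The launch configuration and the coalesced access pattern described above can be sketched roughly as follows. This is only an illustration, not the team's actual code: the kernel name <code>evolveStepSketch</code>, the helper <code>runStepSketch</code>, the square row-major grid of size <code>n</code>, and the placeholder update are all assumptions, since the body of Evolvetimestep is not shown on this page. It also reads "32 threads per block" as a one-dimensional block of 32 threads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the nested loop in Evolvetimestep.
// Each thread handles one (row, col) cell of an n x n row-major grid.
__global__ void evolveStepSketch(const float* in, float* out, int n) {
    // threadIdx.x drives the column, which is the fastest-varying index in
    // row-major storage, so neighbouring threads touch neighbouring
    // addresses and the loads/stores can be coalesced.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y;  // one grid row per block in y (assumes n <= 65535)
    if (row < n && col < n) {
        int i = row * n + col;
        out[i] = 0.5f * in[i];  // placeholder update, not the real physics
    }
}

// Hypothetical host helper: copies the grid to the device, runs one step
// with 32 threads per block, and copies the result back.
std::vector<float> runStepSketch(const std::vector<float>& h_in, int n) {
    const int ntpb = 32;  // threads per block, as stated in the text
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(ntpb, 1);
    dim3 grid((n + ntpb - 1) / ntpb, n);  // enough blocks to cover each row
    evolveStepSketch<<<grid, block>>>(d_in, d_out, n);

    std::vector<float> h_out(static_cast<size_t>(n) * n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

If the roles of <code>row</code> and <code>col</code> were swapped in the index calculation, consecutive threads would access addresses <code>n</code> elements apart, which is the uncoalesced pattern this optimization avoids.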

Colin Campbell


CDOT Wiki β
