Changes

GPU610/Turing

361 bytes added, 11:05, 13 December 2015


I used a block size of 32 threads when parallelizing the nested for loop in the Evolvetimestep function. The results were very good.
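The Evolvetimestep code itself is not shown on this page, so the following is only a minimal sketch of the general technique. It assumes a hypothetical row-major `nx` × `ny` grid and a hypothetical four-neighbour averaging update standing in for the real loop body. Each iteration (i, j) of the serial nested loop becomes one thread, and the kernel is launched with 32 threads per block. The names `updateCell`, `evolveCpu`, `evolveKernel`, `evolveGpu` and `gpuMatchesCpu` are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Hypothetical per-cell update standing in for the real Evolvetimestep body:
// interior cells become the average of their four neighbours, edge cells are copied.
__host__ __device__ inline float updateCell(const float* in, int i, int j, int nx, int ny) {
    if (i == 0 || j == 0 || i == ny - 1 || j == nx - 1)
        return in[i * nx + j];
    return 0.25f * (in[(i - 1) * nx + j] + in[(i + 1) * nx + j] +
                    in[i * nx + j - 1] + in[i * nx + j + 1]);
}

// Serial version: the nested loop being parallelized.
void evolveCpu(const float* in, float* out, int nx, int ny) {
    for (int i = 0; i < ny; ++i)
        for (int j = 0; j < nx; ++j)
            out[i * nx + j] = updateCell(in, i, j, nx, ny);
}

// Parallel version: one thread per (i, j) iteration of the nested loop.
__global__ void evolveKernel(const float* in, float* out, int nx, int ny) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < nx * ny) {                 // the last block may be only partially used
        int i = idx / nx;
        int j = idx % nx;
        out[idx] = updateCell(in, i, j, nx, ny);
    }
}

// Copies the grid to the device, runs one time step with 32 threads per block,
// and copies the result back. Returns false if any CUDA call fails.
bool evolveGpu(const float* h_in, float* h_out, int nx, int ny) {
    const int threadsPerBlock = 32;
    const int n = nx * ny;
    const size_t bytes = n * sizeof(float);
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        evolveKernel<<<blocks, threadsPerBlock>>>(d_in, d_out, nx, ny);
        ok = cudaGetLastError() == cudaSuccess;   // catches launch errors
    }
    // The device-to-host copy waits for the kernel to finish.
    ok = ok && cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

// Runs both versions on a deterministic test grid and compares the results.
bool gpuMatchesCpu(int nx, int ny) {
    std::vector<float> in(nx * ny), cpu(nx * ny), gpu(nx * ny);
    for (int k = 0; k < nx * ny; ++k)
        in[k] = static_cast<float>((k * 7) % 13);
    evolveCpu(in.data(), cpu.data(), nx, ny);
    if (!evolveGpu(in.data(), gpu.data(), nx, ny)) return false;
    for (int k = 0; k < nx * ny; ++k)
        if (std::fabs(cpu[k] - gpu[k]) > 1e-5f) return false;
    return true;
}
```

For example, `gpuMatchesCpu(100, 37)` covers 3700 cells and launches ceil(3700 / 32) = 116 blocks; 116 × 32 = 3712, so the bounds check in `evolveKernel` discards the 12 surplus threads in the last block. A block of 32 threads is exactly one warp; whether that is the best size for the real kernel would depend on measurements not reported here.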

=== Assignment 3 ===

The first optimization I made was coalescing global memory accesses, so that adjacent threads access adjacent memory locations. This led to a moderate per-step speedup, as seen in this graph.

[[Image:ColinCampbellGPU610A3G1.png|600px| ]]
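The actual kernel changes behind this speedup are not shown, so here is a hedged illustration of what coalescing means. It assumes a hypothetical row-major grid and a trivial hypothetical per-element update (scaling by a constant). In `scaleStrided`, consecutive threads in a warp (consecutive `threadIdx.x`) walk down a column, so their addresses are `nx` floats apart. In `scaleCoalesced`, consecutive threads walk along a row and touch consecutive addresses, which the hardware can combine into far fewer memory transactions. The kernel names and the `sameResult` helper are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Uncoalesced: threadIdx.x selects the row, so neighbouring threads in a warp
// access addresses nx floats apart in the row-major array.
__global__ void scaleStrided(const float* in, float* out, int nx, int ny, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // row
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // column
    if (i < ny && j < nx)
        out[i * nx + j] = s * in[i * nx + j];
}

// Coalesced: threadIdx.x selects the column, so neighbouring threads in a warp
// access consecutive floats and their loads/stores can be combined.
__global__ void scaleCoalesced(const float* in, float* out, int nx, int ny, float s) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i < ny && j < nx)
        out[i * nx + j] = s * in[i * nx + j];
}

// Runs both kernels on the same input and checks they produce identical output;
// only the memory access pattern (and hence the speed) differs.
bool sameResult(int nx, int ny) {
    const size_t bytes = static_cast<size_t>(nx) * ny * sizeof(float);
    std::vector<float> h_in(nx * ny), h_a(nx * ny), h_b(nx * ny);
    for (int k = 0; k < nx * ny; ++k) h_in[k] = static_cast<float>(k % 101);

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(32, 8);  // 32 threads along x: each y-row of the block is one warp
        dim3 gridStrided((ny + block.x - 1) / block.x, (nx + block.y - 1) / block.y);
        dim3 gridCoalesced((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
        scaleStrided<<<gridStrided, block>>>(d_in, d_a, nx, ny, 2.0f);
        scaleCoalesced<<<gridCoalesced, block>>>(d_in, d_b, nx, ny, 2.0f);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);  // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(d_a);
    cudaFree(d_b);
    if (!ok) return false;

    for (int k = 0; k < nx * ny; ++k)
        if (h_a[k] != h_b[k] || h_a[k] != 2.0f * h_in[k]) return false;
    return true;
}
```

Both kernels compute the same values by construction, so any difference would only show up when timing them on a large grid (for example with CUDA events).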

I then attempted to modify the code to use shared memory. Unfortunately, because the algorithm accesses rows and columns out of order, this approach was not viable.

Colin Campbell


CDOT Wiki β
