Changes

Hu3Team

477 bytes added, 18:06, 5 December 2014

→ CUDA Coding

}

</pre>

We used shared memory to speed up the kernel's memory accesses, and arranged those accesses so that they are coalesced. We were already performing a simple reduction to find the largest difference, but these two optimizations alone gave us a speedup of almost 50% over our first, unoptimized CUDA solution.
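The original kernel is not reproduced in this section, so the following is only a minimal sketch of a shared-memory max reduction of the kind described above. It assumes the "difference" is the absolute change between the old and new temperature of each cell; the names `maxDiffKernel` and `maxAbsDiff` and the block size of 256 are hypothetical. Consecutive threads read consecutive elements, so the global loads are coalesced, and each block writes one partial maximum that the host then combines.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <vector>

// Hypothetical block size; must be a power of two for the tree reduction.
#define BLOCK 256

// Each block reduces BLOCK consecutive elements to one partial maximum.
// Thread i reads element blockIdx.x * BLOCK + i, so a warp touches
// consecutive addresses and the global loads are coalesced.
__global__ void maxDiffKernel(const float* oldT, const float* newT,
                              float* blockMax, int n) {
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;

    // Out-of-range threads contribute 0, which is neutral for a max of |x|.
    s[tid] = (i < n) ? fabsf(newT[i] - oldT[i]) : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] = fmaxf(s[tid], s[tid + stride]);
        __syncthreads();
    }
    if (tid == 0) blockMax[blockIdx.x] = s[0];
}

// Host helper (hypothetical): launches the kernel and finishes the
// reduction over the per-block maxima on the CPU. Assumes a.size() > 0
// and a.size() == b.size(); error checking is omitted for brevity.
float maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b) {
    int n = (int)a.size();
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *dA, *dB, *dMax;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dMax, blocks * sizeof(float));
    cudaMemcpy(dA, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    maxDiffKernel<<<blocks, BLOCK>>>(dA, dB, dMax, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dMax, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dMax);

    float m = 0.0f;
    for (float p : partial) m = fmaxf(m, p);
    return m;
}
```

Finishing the last step on the host keeps the sketch short; with few blocks the per-block array is small, so a second kernel launch would gain little.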

Because each cell's calculation reads its neighboring elements in the matrix, we had to add more thorough bounds checks to the heat-calculation step to avoid illegal memory accesses.
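One way to combine a shared-memory tile with these bounds checks is sketched below. It assumes a row-major `rows` x `cols` grid, fixed boundary cells, and a four-neighbor averaging update; the team's actual heat formula is not shown in this section, and the names `heatStep`, `heatStepHost` and the 16 x 16 tile are hypothetical. Each thread loads its own cell into the tile, threads on the tile edges also load a one-cell halo, and every global read is guarded so that no thread reads outside the matrix.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile size: each block updates a TILE x TILE patch of the grid.
#define TILE 16

// One Jacobi-style step of a 4-neighbor heat update (assumed formula).
// Boundary cells are copied unchanged; interior cells get the average of
// their up, down, left and right neighbors.
__global__ void heatStep(const float* in, float* out, int rows, int cols) {
    // Tile plus a one-cell halo on every side.
    __shared__ float s[TILE + 2][TILE + 2];

    // threadIdx.x walks along a row, so a warp reads consecutive
    // addresses (coalesced global loads).
    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;
    int lx = threadIdx.x + 1;  // position inside the tile, shifted past the halo
    int ly = threadIdx.y + 1;
    bool inside = (row < rows && col < cols);

    if (inside) {
        s[ly][lx] = in[row * cols + col];
        // Edge threads also fetch the halo, but only if that neighbor exists.
        if (threadIdx.x == 0 && col > 0)
            s[ly][0] = in[row * cols + col - 1];
        if (threadIdx.x == TILE - 1 && col + 1 < cols)
            s[ly][TILE + 1] = in[row * cols + col + 1];
        if (threadIdx.y == 0 && row > 0)
            s[0][lx] = in[(row - 1) * cols + col];
        if (threadIdx.y == TILE - 1 && row + 1 < rows)
            s[TILE + 1][lx] = in[(row + 1) * cols + col];
    }
    // Every thread in the block reaches this barrier before any returns.
    __syncthreads();

    if (!inside) return;
    int idx = row * cols + col;
    if (row == 0 || col == 0 || row == rows - 1 || col == cols - 1) {
        out[idx] = s[ly][lx];  // fixed boundary: copy unchanged
    } else {
        out[idx] = 0.25f * (s[ly - 1][lx] + s[ly + 1][lx] +
                            s[ly][lx - 1] + s[ly][lx + 1]);
    }
}

// Host helper (hypothetical): copies the grid to the device, runs one
// step, and copies the result back. Error checking is omitted for brevity.
void heatStepHost(const std::vector<float>& in, std::vector<float>& out,
                  int rows, int cols) {
    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    heatStep<<<grid, block>>>(dIn, dOut, rows, cols);

    out.resize((size_t)rows * cols);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

Threads outside the matrix return only after `__syncthreads()`, so every thread in a partial tile still reaches the barrier; returning earlier would be undefined behavior.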

====Comparing the results====

Bruno Di Giuseppe Cardoso De Carvalh


CDOT Wiki β
