=== Assignment 2 ===
We parallelized the original code by moving the Jacobi calculations into a kernel. This initial parallel version uses only 1D threading: each thread handles one index along one dimension of the grid and runs a for loop over the other dimension.
The iters loop launches one kernel per iteration and uses double buffering: instead of swapping the array pointers as the serial code does, each launch passes the two device buffers in alternating order (d_a, d_b on one iteration, then d_b, d_a on the next).
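A minimal CUDA sketch of this first version follows. It assumes a row-major grid of `nx` rows by `ny` columns with fixed boundary values and a plain 5-point Jacobi average. The names `jacobiRows`, `runJacobi1D` and `check`, as well as the grid layout and the stencil, are hypothetical stand-ins rather than the team's actual code.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Assumed layout: nx rows by ny columns, row-major, boundary cells fixed.
// 1D threading: each thread owns one interior row i and loops over the columns j.
__global__ void jacobiRows(const float* in, float* out, int nx, int ny) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 1 || i >= nx - 1) return;  // skip boundary rows and surplus threads
    for (int j = 1; j < ny - 1; ++j) {
        out[i * ny + j] = 0.25f * (in[(i - 1) * ny + j] + in[(i + 1) * ny + j] +
                                   in[i * ny + (j - 1)] + in[i * ny + (j + 1)]);
    }
}

// Runs `iters` Jacobi sweeps on grid `h` (nx * ny values) and copies the result back.
void runJacobi1D(std::vector<float>& h, int nx, int ny, int iters) {
    assert(nx >= 3 && ny >= 3 && h.size() == static_cast<size_t>(nx) * ny);
    const size_t bytes = h.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    // Copy the grid into both buffers so each one holds the fixed boundary values.
    check(cudaMemcpy(d_a, h.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, h.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (nx + threads - 1) / threads;
    for (int k = 0; k < iters; ++k) {
        // Double buffering: even sweeps read d_a and write d_b, odd sweeps the reverse.
        if (k % 2 == 0) {
            jacobiRows<<<blocks, threads>>>(d_a, d_b, nx, ny);
        } else {
            jacobiRows<<<blocks, threads>>>(d_b, d_a, nx, ny);
        }
        check(cudaGetLastError());
    }
    // After an odd number of sweeps the newest values are in d_b, otherwise in d_a.
    const float* result = (iters % 2 == 1) ? d_b : d_a;
    // cudaMemcpy on the default stream also waits for the queued kernels to finish.
    check(cudaMemcpy(h.data(), result, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_a));
    check(cudaFree(d_b));
}
```

Because the kernel only writes interior cells, both buffers have to start out holding the boundary values, and which buffer holds the final grid depends on whether `iters` is odd or even.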
<source>
=== Assignment 3 ===
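Optimization techniques used:
* Removed the for loop from the kernel and switched to 2D threading within blocks, so each thread computes a single grid cell
* Moved the Jacobi calculation constants into GPU constant memory
* Applied the ghost cell pattern to shared memory: each block loads its tile of the grid plus a one-cell border of neighbouring values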
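The three techniques above could combine roughly as in the sketch below. It keeps the same hypothetical assumptions (row-major `nx` by `ny` grid, fixed boundaries, 5-point stencil). The 16x16 tile size, the two-coefficient form of the Jacobi constants stored in `c_coef`, and the names `jacobiTiled`, `runJacobiTiled` and `check` are illustrative choices, not taken from the team's code. With `centre = 0` and `neighbour = 0.25` the update reduces to the plain Jacobi average.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // hypothetical block edge: each block is TILE x TILE threads

// Hypothetical Jacobi constants in constant memory, read by every thread:
// c_coef[0] weights the centre cell, c_coef[1] weights each of the four neighbours.
__constant__ float c_coef[2];

static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

__global__ void jacobiTiled(const float* in, float* out, int nx, int ny) {
    // Shared tile with a one-cell ghost border on each side.
    __shared__ float tile[TILE + 2][TILE + 2];
    const int i = blockIdx.y * TILE + threadIdx.y;  // global row (2D threading, no loop)
    const int j = blockIdx.x * TILE + threadIdx.x;  // global column
    const int ti = threadIdx.y + 1;                 // row inside the tile
    const int tj = threadIdx.x + 1;                 // column inside the tile
    const bool inGrid = (i < nx && j < ny);

    if (inGrid) {
        tile[ti][tj] = in[i * ny + j];
        // Threads on the block's edges also load the ghost cells just outside it.
        if (threadIdx.y == 0 && i > 0)             tile[0][tj] = in[(i - 1) * ny + j];
        if (threadIdx.y == TILE - 1 && i < nx - 1) tile[TILE + 1][tj] = in[(i + 1) * ny + j];
        if (threadIdx.x == 0 && j > 0)             tile[ti][0] = in[i * ny + (j - 1)];
        if (threadIdx.x == TILE - 1 && j < ny - 1) tile[ti][TILE + 1] = in[i * ny + (j + 1)];
    }
    __syncthreads();  // every thread reaches this barrier, so the tile is complete afterwards

    // Update interior cells only; boundary cells keep their fixed values.
    if (inGrid && i > 0 && i < nx - 1 && j > 0 && j < ny - 1) {
        out[i * ny + j] = c_coef[0] * tile[ti][tj] +
                          c_coef[1] * (tile[ti - 1][tj] + tile[ti + 1][tj] +
                                       tile[ti][tj - 1] + tile[ti][tj + 1]);
    }
}

// Runs `iters` sweeps of: new = centre * cell + neighbour * (sum of 4 neighbours).
void runJacobiTiled(std::vector<float>& h, int nx, int ny, int iters,
                    float centre, float neighbour) {
    assert(nx >= 3 && ny >= 3 && h.size() == static_cast<size_t>(nx) * ny);
    const float coef[2] = {centre, neighbour};
    check(cudaMemcpyToSymbol(c_coef, coef, sizeof(coef)));  // upload the constants once

    const size_t bytes = h.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMemcpy(d_a, h.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, h.data(), bytes, cudaMemcpyHostToDevice));

    const dim3 block(TILE, TILE);
    const dim3 grid((ny + TILE - 1) / TILE, (nx + TILE - 1) / TILE);  // x: columns, y: rows
    for (int k = 0; k < iters; ++k) {
        // Same double buffering as before: alternate the input and output buffers.
        if (k % 2 == 0) {
            jacobiTiled<<<grid, block>>>(d_a, d_b, nx, ny);
        } else {
            jacobiTiled<<<grid, block>>>(d_b, d_a, nx, ny);
        }
        check(cudaGetLastError());
    }
    const float* result = (iters % 2 == 1) ? d_b : d_a;
    check(cudaMemcpy(h.data(), result, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_a));
    check(cudaFree(d_b));
}
```

Staging the tile in shared memory lets the five reads of each stencil come from on-chip memory, so each input value is fetched from global memory roughly once per block rather than up to five times. Constant memory suits the coefficients because every thread in a warp reads the same address, which the constant cache can broadcast.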