Optimization
At this point, I am noticing a delay caused by constantly transferring data between host RAM and video RAM. Splitting the array into multiple sections also requires constantly checking the left and right boundary columns of each section. I will therefore refactor the entire code to use a single CPU thread and remove MPI.
====== Optimization ======
After switching the kernel to use shared memory and a few constant values for its operations, my GPU no longer crashes on extreme runs involving millions of steps. It also outperforms my CPU running the MPI version of this application on 4 threads.
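The actual kernel is not shown here, so the following is only a minimal sketch of the general pattern, assuming a simple 1D three-point stencil. The names `stencilStep`, `runStencil`, `d_weight`, and `BLOCK` are hypothetical and do not come from the project. It illustrates three ideas: the data is copied to the device once and stays there for all steps, each block stages its section plus the left and right neighbour cells in shared memory, and a read-only coefficient lives in constant memory. Error checking is omitted for brevity.

```cuda
#include <cstddef>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block (assumed value)

// Hypothetical read-only coefficient, stored in constant memory.
__constant__ float d_weight;

// One step of a 3-point stencil:
//   out[i] = in[i] + w * (in[i-1] - 2*in[i] + in[i+1])
// The first and last cells are copied through unchanged.
// Must be launched with blockDim.x == BLOCK.
__global__ void stencilStep(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2];  // block section plus one halo cell per side
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index inside the tile

    if (g < n) {
        tile[l] = in[g];
        // The first thread of the block fetches the left halo cell.
        if (threadIdx.x == 0 && g > 0) tile[0] = in[g - 1];
        // The last thread of the block fetches the right halo cell.
        if (threadIdx.x == blockDim.x - 1 && g < n - 1) tile[l + 1] = in[g + 1];
    }
    __syncthreads();

    if (g >= n) return;
    if (g == 0 || g == n - 1) {  // fixed boundary cells
        out[g] = tile[l];
        return;
    }
    out[g] = tile[l] + d_weight * (tile[l - 1] - 2.0f * tile[l] + tile[l + 1]);
}

// Runs `steps` iterations entirely on the device: one upload, one download.
std::vector<float> runStencil(std::vector<float> host, float weight, int steps) {
    int n = static_cast<int>(host.size());
    if (n == 0) return host;
    std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    float* a = nullptr;
    float* b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemcpy(a, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_weight, &weight, sizeof(float));

    int blocks = (n + BLOCK - 1) / BLOCK;
    for (int s = 0; s < steps; ++s) {
        stencilStep<<<blocks, BLOCK>>>(a, b, n);
        std::swap(a, b);  // the output of this step is the input of the next
    }

    // This blocking copy also waits for the queued kernels to finish.
    cudaMemcpy(host.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    return host;
}
```

Swapping the two device pointers between steps avoids any host round trip inside the loop, which is the transfer cost described earlier. The halo loads replace the per-section boundary checks that the MPI version needed between processes.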