</pre>
As you can see, the clearest optimization targets in the method are the two nested for loops that convert the 2D array into a 1D array (and back again). Unfortunately, we couldn't get the CUDA kernel to handle very large 2D arrays the way we wanted. We tried very hard to make this work, but simply could not :(
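For readers who want to see what this conversion looks like, here is a minimal C++ sketch of the kind of 2D-to-1D copy (and back) described above. It is not the assignment's actual code: the function names `flatten` and `unflatten`, the `float` element type, and the use of `std::vector` are all assumptions made for illustration. The only thing it is meant to show is the row-major mapping `index = row * cols + col`, which is the usual way a 1D buffer stands in for a 2D array when data is handed to a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a rows x cols 2D array into one row-major 1D buffer.
// Element (r, c) ends up at index r * cols + c.
std::vector<float> flatten(const std::vector<std::vector<float>>& grid) {
    const std::size_t rows = grid.size();
    const std::size_t cols = rows ? grid[0].size() : 0;
    std::vector<float> flat(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        assert(grid[r].size() == cols);  // every row must have the same length
        for (std::size_t c = 0; c < cols; ++c) {
            flat[r * cols + c] = grid[r][c];
        }
    }
    return flat;
}

// Hypothetical helper: rebuild the 2D array from the row-major 1D buffer.
std::vector<std::vector<float>> unflatten(const std::vector<float>& flat,
                                          std::size_t rows, std::size_t cols) {
    assert(flat.size() == rows * cols);  // buffer must match the requested shape
    std::vector<std::vector<float>> grid(rows, std::vector<float>(cols));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            grid[r][c] = flat[r * cols + c];
        }
    }
    return grid;
}
```

Both helpers touch every element once, so each costs O(rows × cols) on the host. One possible way to avoid the two copies, again only as a suggestion, would be to keep the data in a single row-major buffer from the start and index it with `r * cols + c` everywhere.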