Assignment 3
On further inspection of the function and kernel, I realized that the array of pixels taken from oldImage was never used inside the kernel, so I removed it entirely. This included removing its memory allocation and the copy of the array from host to device, which further reduces the run time of the function.
[[File:hArrayRemove.jpg]]
Furthermore, I had previously put the "check for bounds" calculation and the "fill in empty pixels" calculation in two separate nested for-loops. I have combined them into one, which removes an entire nested loop and should improve performance considerably.
[[File:NestedCombined.jpg]]
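As a rough illustration of the combined loop, here is a minimal CPU-side C++ sketch; it is not the actual rotate() kernel. The function name rotateFused, the row-major grayscale layout, the nearest-neighbour lookup and the fill value of 0 are all assumptions made for the example. The point is only that the bounds check and the empty-pixel fill can share a single pass over the destination image.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not from the original code): rotates a row-major
// grayscale image about its centre using nearest-neighbour lookup.
std::vector<int> rotateFused(const std::vector<int>& src,
                             int rows, int cols, double radians) {
    std::vector<int> dst(src.size());
    const double r0 = (rows - 1) / 2.0;   // centre row
    const double c0 = (cols - 1) / 2.0;   // centre column
    const double cs = std::cos(radians);
    const double sn = std::sin(radians);

    // One nested loop does both jobs: for every destination pixel,
    // find the source pixel it maps back to, then either copy it
    // (in bounds) or write the fill value (out of bounds).
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const double dr = r - r0;
            const double dc = c - c0;
            const long sr = std::lround(r0 + dr * cs - dc * sn);
            const long sc = std::lround(c0 + dr * sn + dc * cs);
            const bool inside = sr >= 0 && sr < rows && sc >= 0 && sc < cols;
            dst[r * cols + c] = inside ? src[sr * cols + sc] : 0;  // 0 = assumed "empty" value
        }
    }
    return dst;
}
```

With two separate loops, every pixel index is visited twice: once to copy in-bounds pixels and once to fill the rest. Folding the fill into the same branch as the bounds check means each pixel is visited and written exactly once.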
Overall, this is what the optimized rotateImage() function and the rotate() kernel look like:
Profiling with the same images gives the following result.
[[File:OptimizedChart.jpg]]