[[File:Cuda-profilerun.png]]
===== Device Usage % =====
;Tiny-Shay.jpg: 0.01%
;Medium-Shay.jpg: 0.39%
;Large-Shay.jpg: 0.93%
;Huge-Shay.jpg: 1.26%
(36 kernel launches per run)
===== Timeline Results =====
For each run, I examined the four operations that took the most time. For the tiny image, allocating memory on the device took the longest, though it still took well under half a second. cudaMalloc took the same short amount of time in every case, however. Initializing the CImg variable from the .jpg file quickly became the biggest issue as the images grew larger. This operation is CPU bound and depends on the logic of ImageMagick. Copying the rotated image back to the host (cudaMemcpy) also starts to become a hot spot: there is a noticeable increase between the large and huge images.

[[File:Summary-2.png]]
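The comparison above can also be reproduced without the profiler by timing each phase on the host. The following is only a minimal sketch, not the code used for these results: <code>time_phase_ms</code>, <code>PhaseTiming</code>, and <code>slowest_phase</code> are hypothetical names introduced for illustration, and each phase is passed in as a callable that would wrap the real call (the CImg load, cudaMalloc, or cudaMemcpy).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: runs one phase and returns its wall-clock time in ms.
// Assumes the phase blocks the host until it is finished.
double time_phase_ms(const std::function<void()>& phase) {
    auto start = std::chrono::steady_clock::now();
    phase();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical record pairing a phase label with its measured time.
struct PhaseTiming {
    std::string name;
    double ms;
};

// Returns the index of the phase with the largest measured time.
std::size_t slowest_phase(const std::vector<PhaseTiming>& timings) {
    assert(!timings.empty());
    std::size_t worst = 0;
    for (std::size_t i = 1; i < timings.size(); ++i) {
        if (timings[i].ms > timings[worst].ms) {
            worst = i;
        }
    }
    return worst;
}
```

In use, each entry would be filled with something like <code>{"cudaMemcpy", time_phase_ms([&]{ /* copy back to host */ })}</code>. Kernel launches return before the kernel finishes, so a phase containing a launch would need a <code>cudaDeviceSynchronize()</code> before it returns for the host timer to be meaningful.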
=== Assignment 3 ===