Changes

Savy Cat

357 bytes added, 03:41, 3 April 2018
Profiling With Nsight
;Device Usage %
Tiny-Shay.jpg: 0.01%
 
Medium-Shay.jpg: 0.39%
 
Large-Shay.jpg: 0.93%
 
Huge-Shay.jpg: 1.26%
;Timeline Results
For each run, I listed the four operations that took the most time. For the tiny image, allocating memory on the device took the longest, but it still finished in well under half a second. cudaMalloc took the same short amount of time in every case, however. Initializing the CImg variable from the .jpg file quickly became the biggest cost as the images grew. This operation is CPU bound and depends on the logic of ImageMagick. Copying the rotated image back to the host (cudaMemcpy) also starts to become a hot spot: there is a noticeable increase between the large and huge images.
[[File:Summary-2.png]]
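The timings above came from Nsight. As a rough host-side cross-check, the same kind of "top four operations" list could be built by hand. The sketch below is only an illustration: time_stage_ms and slowest_stages are hypothetical helper names, not part of the project code. It measures wall-clock time on the host, so a stage that launches a kernel would need to call cudaDeviceSynchronize() inside the stage, because kernel launches return before the kernel finishes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: run one stage (e.g. image load, cudaMalloc,
// cudaMemcpy) and return the elapsed wall-clock time in milliseconds.
double time_stage_ms(const std::function<void()>& stage) {
    assert(stage && "stage must be callable");
    auto start = std::chrono::steady_clock::now();
    stage();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// A stage name paired with its measured time in milliseconds.
using StageTime = std::pair<std::string, double>;

// Hypothetical helper: return the n slowest stages, slowest first.
std::vector<StageTime> slowest_stages(std::vector<StageTime> times, std::size_t n) {
    std::sort(times.begin(), times.end(),
              [](const StageTime& a, const StageTime& b) {
                  return a.second > b.second;
              });
    if (times.size() > n) {
        times.resize(n);  // keep only the n largest entries
    }
    return times;
}
```

Each stage would be wrapped in a lambda and passed to time_stage_ms, the results collected into a vector, and slowest_stages(times, 4) would then give the four most expensive operations for that image size.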
 
Comparing total run times of the CPU and CUDA versions shows a clear winner as the .jpg files increase in size. Rotating Large-Shay.jpg (3264 x 2448) was '''3x''' faster with CUDA, and Huge-Shay.jpg was '''4.95x''' faster. Tiny-Shay.jpg and Medium-Shay.jpg actually took longer with the CUDA version, though both still finished in under half a second.
 
[[File:Summary-3.png]]
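The "x faster" figures are presumably the ratio of the CPU run time to the CUDA run time. That definition is an assumption, since the formula is not stated here, and the function name below is hypothetical. A minimal sketch:

```cpp
#include <cassert>

// Assumed definition: speedup = CPU total run time / CUDA total run time.
// A result above 1 means the CUDA version was faster; below 1 means it
// was slower (as with the tiny and medium images).
double speedup(double cpu_ms, double cuda_ms) {
    assert(cpu_ms > 0.0 && cuda_ms > 0.0);
    return cpu_ms / cuda_ms;
}
```

Under this definition, a CPU run that takes three times as long as the CUDA run gives a speedup of 3, matching how the Large-Shay.jpg result is reported.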
=== Assignment 3 ===