==== Unsigned Char vs. Float ====
The first real improvement came from changing PX_TYPE from float back to unsigned char, as used in the serial version. An unsigned char can hold every .jpg colour value (0 to 255). GPUs are designed to perform operations on floating point numbers, but we are not performing any calculations beyond the indexing, and the performance of the kernel was the same for float and unsigned char. What does change is the size of the data: we copy the source image to the device once and copy the result back to the host 12 times, so the size of each pixel value matters.
{| class="wikitable"
! File !! Unsigned char !! Float
|-
| Tiny_Shay.jpg || 1.93 KB || 7.73 KB
|-
| Medium_Shay.jpg || 5.71 MB || 22.8 MB
|-
| Large_Shay.jpg || 22.8 MB || 91.4 MB
|-
| Huge_Shay.jpg || 91.4 MB || 365 MB
|}
Switching to unsigned char saves almost one second of latency for the largest file, bringing the time spent in cudaMemcpy down to about the same as the kernel execution time:
[[File:Summary-5.png]]
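
The factor of four between the two size columns of the table follows directly from the element sizes. The following is a minimal host-side sketch of that arithmetic, not the project's actual code: the helper names <code>bufferBytes</code> and <code>transferBytes</code> and the example value count are hypothetical, and it assumes a 4-byte float, as on the platforms CUDA targets.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of one image buffer holding
// nValues colour values of type PxType.
template <typename PxType>
constexpr std::size_t bufferBytes(std::size_t nValues) {
    return nValues * sizeof(PxType);
}

// Hypothetical helper: total bytes moved by one host-to-device copy of
// the source image plus copiesBack device-to-host copies of the result.
template <typename PxType>
constexpr std::size_t transferBytes(std::size_t nValues, std::size_t copiesBack) {
    return bufferBytes<PxType>(nValues) * (1 + copiesBack);
}

// Assumption: float occupies 4 bytes, so every buffer is 4x larger
// than its unsigned char equivalent.
static_assert(sizeof(float) == 4 * sizeof(unsigned char),
              "this sketch assumes a 4-byte float");

// Illustrative value count only (not taken from the test images):
// a 1000 x 1000 image with 3 colour channels.
constexpr std::size_t exampleValues = 1000u * 1000u * 3u;

static_assert(transferBytes<float>(exampleValues, 12) ==
                  4 * transferBytes<unsigned char>(exampleValues, 12),
              "float moves four times as many bytes over the bus");
```

Because each of the 13 copies scales with <code>sizeof(PX_TYPE)</code>, the element type affects the total cudaMemcpy time even though the kernel itself runs equally fast with either type.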