=== Assignment 3 ===
The first optimization we made was to precompute the product rows * cols outside of the kernel itself. That makes sense for our code, because we're effectively running through 1000 image files of size 18 MB. That way, the multiplication is done once on the host instead of inside the kernel, which takes some strain off the GPU.
When we ran that, performance improved from 22 milliseconds to 21 milliseconds. It sounds small, but over the course of many more images being processed, it adds up to quite an improvement.
[[Image:Result1.png]]
[[Image:improvement1.png]]
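The precomputation described above can be sketched roughly as follows. This is only a minimal illustration, not our actual kernel: the names scalePixels and scaleImage, the brightness-scaling operation, and the block size of 256 are all assumptions made for the example. The point is simply that size is computed once on the host and passed in as a kernel argument.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (not the assignment's real one): the element count
// arrives precomputed, so no thread evaluates rows * cols itself.
__global__ void scalePixels(unsigned char* pixels, int size, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < size) {
        float v = pixels[idx] * factor;
        // Clamp to the valid 8-bit range before storing.
        pixels[idx] = v > 255.0f ? 255 : static_cast<unsigned char>(v);
    }
}

// Hypothetical host-side wrapper: rows * cols is computed once here.
cudaError_t scaleImage(unsigned char* d_pixels, int rows, int cols, float factor)
{
    const int size = rows * cols;                        // precomputed on the host
    const int threads = 256;                             // assumed block size
    const int blocks = (size + threads - 1) / threads;   // round up

    scalePixels<<<blocks, threads>>>(d_pixels, size, factor);

    cudaError_t err = cudaGetLastError();                // catch launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();                      // wait for completion
}
```

When many images share the same dimensions, the same size value can be reused across every launch.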
We also tried to use shared memory; however, our array was simply too large to fit.
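To illustrate the size problem: shared memory per block is typically on the order of tens of kilobytes, while an array on the scale of an 18 MB image is far larger. The sketch below shows one way to check this at runtime, assuming the array would have to fit in a single block's shared memory. The helper name fitsInSharedMemory is hypothetical; cudaGetDeviceProperties and the sharedMemPerBlock field are part of the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: reports whether a buffer of `bytes` bytes would fit
// in the shared memory available to one block on the given device.
bool fitsInSharedMemory(size_t bytes, int device = 0)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;  // treat a failed query as "does not fit"
    }
    return bytes <= prop.sharedMemPerBlock;
}
```

A common workaround in this situation is to process the array in tiles small enough to fit, although we did not do that here.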