=== Assignment 2 ===
We chose to parallelize the image processing program. Image processing is easy to parallelize, but this project was a challenge because of the number of functions the program contains.
==== Benchmarking ====
* CPU: Intel Core i7 2600K (standard clock)
* GPU: NVIDIA GeForce GTX 750 Ti
Two images were used to test each image function, a large one and a small one.
The small sample test image:
[[File:GPU610_2014_1_Team_Eh_sample.jpg]]
The sample after being processed by the Canny filter:
[[File:GPU610_2014_1_Team_Eh_canny.jpg]]
==== Results ====
[[File:GPU610_2014_1_Team_Eh_chart.png]]
This chart compares the run times of the original and parallelized image processing functions. We saw dramatic improvements in image filtering performance. For the images we tested, the run time of each parallelized function stays roughly constant with respect to image dimensions, whereas the original versions are O(n^2) for an n-by-n image.
==== Problems Encountered ====
Many of the image operations were composed of several kernels and other steps. To avoid repeated copies to and from the device, we wrapped each kernel in a function that takes device pointers. That way, an image can be loaded onto the device once and passed through multiple filters without being returned to the host in between.
Several of the operations scan a pixel's neighbors to determine the pixel's value. This creates a problem when a pixel is near the edge of an image. To solve this problem, we interpreted the image not as a flat surface but as a torus. Any time a thread would access an off-image pixel, it wraps around and uses a pixel from the opposite side of the image.
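The wrap-around indexing can be illustrated with a minimal host-side C++ sketch. It is not the project's actual code: the names <code>wrapCoord</code>, <code>pixelAt</code> and <code>neighborSum3x3</code> are hypothetical, and a row-major grayscale image stored as integers is assumed. The same index arithmetic could be used inside a kernel, with each thread calling the equivalent of <code>pixelAt</code> for its neighbors.

```cpp
#include <cstddef>
#include <vector>

// Map a possibly out-of-range coordinate onto [0, size), treating the axis
// as circular. The second modulo keeps the result non-negative when v < 0,
// because % in C++ truncates toward zero.
inline int wrapCoord(int v, int size) {
    return ((v % size) + size) % size;
}

// Read a pixel from a row-major image, wrapping on both axes (a torus).
// Hypothetical helper; assumes img.size() == width * height.
inline int pixelAt(const std::vector<int>& img, int width, int height,
                   int x, int y) {
    const std::size_t row = static_cast<std::size_t>(wrapCoord(y, height));
    const std::size_t col = static_cast<std::size_t>(wrapCoord(x, width));
    return img[row * static_cast<std::size_t>(width) + col];
}

// Sum of the 3x3 neighborhood around (x, y), used here as a stand-in for a
// real filter. Edge pixels need no special case: off-image reads wrap around.
inline int neighborSum3x3(const std::vector<int>& img, int width, int height,
                          int x, int y) {
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            sum += pixelAt(img, width, height, x + dx, y + dy);
        }
    }
    return sum;
}
```

With this approach the neighborhood loop is identical for interior and edge pixels, which avoids divergent edge-handling branches at the cost of blending in pixels from the opposite side of the image.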
=== Assignment 3 ===