Changes

Installation Wizards

336 bytes added, 15:16, 3 April 2017

→‎Parallel Image Processor

The optimized version of the source code with the kernels we created can be found [https://pastebin.com/R0xEfN9W here].

For reference, we have attached the sample input and output images we have been using to show what the image processor actually does. These are the results of running the image processor without specifying an enlarge scale, which negates the image and reflects it vertically.
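To make the default transformation concrete, here is a minimal CPU-side C++ sketch of a negation followed by a vertical reflection. It is not taken from the linked source: the `GrayImage` struct and the function names are hypothetical, the image is assumed to be 8-bit grayscale stored row by row with every pixel no larger than `maxVal`, and "reflect vertically" is assumed to mean flipping the rows top to bottom.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for an 8-bit grayscale image stored row by row.
struct GrayImage {
    std::size_t width = 0;
    std::size_t height = 0;
    std::uint8_t maxVal = 255;         // largest gray value, as in a PGM header
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Negate: each pixel p becomes maxVal - p (assumes p <= maxVal).
void negate(GrayImage& img) {
    for (auto& p : img.pixels) {
        p = static_cast<std::uint8_t>(img.maxVal - p);
    }
}

// Reflect vertically (assumed meaning): swap row r with row height - 1 - r.
void reflectVertically(GrayImage& img) {
    const auto w = static_cast<std::ptrdiff_t>(img.width);
    const auto h = static_cast<std::ptrdiff_t>(img.height);
    for (std::ptrdiff_t r = 0; r < h / 2; ++r) {
        auto top = img.pixels.begin() + r * w;
        auto bottom = img.pixels.begin() + (h - 1 - r) * w;
        std::swap_ranges(top, top + w, bottom);
    }
}
```

Both functions do only a small, fixed amount of work per pixel, so every pixel can be processed independently of the others.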

[[File:input.jpg]] [[File:outputPGM.jpg]]

Due to device limitations, we were only able to profile our program up to an enlarge scale of 8, but our results still showed a performance increase once the enlarge scale reached 4.

[[File:GpuA2Spreadsheet.png]]

As seen from the results above, our parallel implementation of the image processor shows a significant performance increase as the enlarge scale gets larger. We also noticed that the C++ implementation ran faster when there was no enlarge and when the enlarge scale was only 2. We believe this could be due to the cost of cudaMemcpy(): since the operations we perform on the pixels are not that intensive, the time taken by cudaMemcpy() could easily exceed the time of the transformations themselves.

To continue our optimizations, we think that we could get more of a performance increase by minimizing the amount of data copied to and from the GPU. We are going to look into storing the image on the GPU until all transformations have been done and then copying the data back from the GPU.

=== Assignment 3 ===

Kramsamujh

37

edits
