1,520 bytes added, 19:39, 10 December 2015
= Sudo =
== Team Members ==
# [mailto:kmpaiva@senecacollege.ca?subject=gpu610 kmpaiva], Kevin Paiva
# [mailto:mlucic3@senecacollege.ca?subject=gpu610 mlucic3], Mateya Lucic
[mailto:mlucic3@senecacollege.ca;mailto:kmpaiva@senecacollege.ca?subject=gpu610 Sudo Email All]
== Progress ==
Most of the run time is spent in the addFaces method, and a large share of that time goes to looking up the index of each edge in the mesh.
 
<u>'''Parallelizable?'''</u>
 
The code could be reorganized to first determine whether each vertex or edge already exists in the mesh, then separate those that exist from those that do not, and finally perform the necessary operations on the two groups separately and in parallel. This could improve run time.
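
The separation step described above can be sketched in plain C++. This is only an illustration under stated assumptions: the <code>Edge</code> type, the <code>std::set</code> used for the mesh lookup, and the function name <code>splitExisting</code> are hypothetical and are not taken from the profiled application.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical edge type: a pair of vertex indices.
using Edge = std::pair<int, int>;

// Reorders `incoming` so that edges already present in `mesh` come first.
// Returns the position of the first edge that is not yet in the mesh,
// so [0, result) are existing edges and [result, size) are new ones.
std::size_t splitExisting(std::vector<Edge>& incoming, const std::set<Edge>& mesh) {
    auto firstNew = std::partition(
        incoming.begin(), incoming.end(),
        [&mesh](const Edge& e) { return mesh.count(e) != 0; });
    return static_cast<std::size_t>(firstNew - incoming.begin());
}
```

Once the two ranges are separated, each one can be handed to its own worker, since neither range needs the lookup result of the other.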
=== Assignment 2 ===
 
 
 
The application I profiled in Assignment 1 turned out not to be a good candidate for parallelization because of its design. I had to look for a new application, and I found a small steganography application. Profiling it made it apparent that performing the encoding on the GPU would improve run time. To compare the serial application fairly with the parallelized version, I rewrote the serial code slightly so that both perform many of the same operations, with the only difference being the encoding algorithm.
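
To illustrate why an encoding step of this kind suits the GPU, here is a minimal serial sketch. It assumes least-significant-bit (LSB) embedding, which this page does not confirm as the method the application uses; the function names <code>encodeLsb</code> and <code>decodeLsb</code> are hypothetical. Each loop iteration touches exactly one carrier byte, so every iteration could be assigned to its own thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical LSB encoder (an assumption, not the application's code):
// hides bit i of the message in the lowest bit of carrier byte i.
std::vector<std::uint8_t> encodeLsb(std::vector<std::uint8_t> carrier,
                                    const std::string& message) {
    const std::size_t bits = message.size() * 8;
    if (bits > carrier.size()) {
        throw std::invalid_argument("carrier too small for message");
    }
    for (std::size_t i = 0; i < bits; ++i) {
        // Take the bits of each character from most to least significant.
        const unsigned ch = static_cast<std::uint8_t>(message[i / 8]);
        const unsigned bit = (ch >> (7 - i % 8)) & 1u;
        // Clear the lowest bit of the carrier byte, then store the message bit.
        carrier[i] = static_cast<std::uint8_t>((carrier[i] & 0xFEu) | bit);
    }
    return carrier;
}

// Recovers `length` characters by reading the lowest bit of each carrier byte.
std::string decodeLsb(const std::vector<std::uint8_t>& carrier, std::size_t length) {
    if (length * 8 > carrier.size()) {
        throw std::invalid_argument("carrier too small for requested length");
    }
    std::string message(length, '\0');
    for (std::size_t i = 0; i < length * 8; ++i) {
        const unsigned bit = carrier[i] & 1u;
        const unsigned ch = static_cast<std::uint8_t>(message[i / 8]) | (bit << (7 - i % 8));
        message[i / 8] = static_cast<char>(ch);
    }
    return message;
}
```

Because no iteration reads a value written by another, a kernel version could map the loop index directly to a global thread index.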
 
 
 
 
The following graph shows the run time (in microseconds) of encoding text files of various sizes (in kilobytes).
 
[[File:CudaSteganographyRuntime.png]]
 
 
 
 
The following is the serial code:
 
[[File:CudaSteganographySerial.png]]
 
 
 
 
 
The following is the kernel I wrote:
 
[[File:CudaSteganographyKernel.png]]
 
=== Assignment 3 ===
 
 
I improved on my previous application by cleaning up some of the logic and adding optimizations. I found that, even on a card with compute capability 3.0, a launch was rejected when the grid's x dimension exceeded 65535, so I rewrote my code to stay within that limit. (65535 is the documented x-dimension limit for compute capability 2.x and earlier; devices of compute capability 3.0 and later allow up to 2<sup>31</sup> - 1, so the architecture the code was compiled for may have been a factor.) Within the kernel itself, there was an opportunity to pre-fetch values into registers to reduce latency when operating on them. Shared memory was not needed because the threads do not share any data.
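
One common way to stay within a 65535-per-dimension grid limit is to spread the blocks over a two-dimensional grid. The host-side sketch below shows the arithmetic. It is not necessarily the approach used in the kernel shown on this page; the <code>Grid</code> struct (a stand-in for CUDA's <code>dim3</code>) and the names <code>chooseGrid</code> and <code>globalIndex</code> are hypothetical.

```cpp
#include <cstddef>

// Per-dimension grid limit discussed above.
constexpr std::size_t kMaxGridDim = 65535;

// Stand-in for CUDA's dim3, kept to two dimensions for this sketch.
struct Grid {
    unsigned x;
    unsigned y;
};

// Picks a 2D grid with at least enough blocks to cover n elements while
// keeping x within the limit. Assumes the block count does not exceed
// kMaxGridDim * kMaxGridDim, so y also stays within the limit.
Grid chooseGrid(std::size_t n, unsigned threadsPerBlock) {
    std::size_t blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks == 0) blocks = 1;  // a launch needs at least one block
    const std::size_t y = (blocks + kMaxGridDim - 1) / kMaxGridDim;
    const std::size_t x = (blocks + y - 1) / y;
    return Grid{static_cast<unsigned>(x), static_cast<unsigned>(y)};
}

// Linear element index a thread would compute from its 2D block position;
// in a kernel the arguments would be blockIdx.x, blockIdx.y, gridDim.x,
// threadIdx.x and blockDim.x.
std::size_t globalIndex(unsigned blockX, unsigned blockY, unsigned gridX,
                        unsigned threadX, unsigned threadsPerBlock) {
    return (static_cast<std::size_t>(blockY) * gridX + blockX) * threadsPerBlock + threadX;
}
```

The grid may contain a few more blocks than strictly needed, so a kernel using this layout has to return early for any thread whose index is not less than n.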
 
 
The following is the new kernel:
 
 
[[File:CudaSteganographyKernelA3.png]]
 
 
 
The following is a run-time comparison between my old kernel and my new kernel:
 
 
 
[[File:CudaSteganographyRuntimeA3.png]]