Avengers
Profiling showed that calculateDimensions() accounts for almost all of the program's run time: the flat profile attributes 97.67% of the execution time to this function.
==== Identifying Parallelizable Code ====
calculateDimensions() has three nested for loops, each of which sets the length of one side of the triangle. Inside the innermost loop, the two shorter sides are squared and the squares are added together. A condition then checks whether this sum equals the square of the hypotenuse, and the three side lengths are printed when it does.
[[File:NestedLoops.PNG]]
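The serial search described above can be sketched in C++ as follows. The function name findTriplesSerial, the Triple struct, the upper bound n, and the loop bounds (1 ≤ a ≤ b < c ≤ n, so each triple is found only once) are assumptions for illustration; the original prints each triple, whereas this sketch collects them in a vector so the result can be checked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for one result (the original prints instead).
struct Triple {
    long long a, b, c;  // a, b: shorter sides; c: hypotenuse
};

// Serial search mirroring the three nested loops: one loop per side.
// Assumed bounds: 1 <= a <= b < c <= n, so each triple is found once.
std::vector<Triple> findTriplesSerial(long long n) {
    assert(n > 0);
    std::vector<Triple> found;
    for (long long a = 1; a <= n; ++a) {
        for (long long b = a; b <= n; ++b) {
            for (long long c = b + 1; c <= n; ++c) {
                // Square the two shorter sides, add them, and compare
                // the sum with the square of the hypotenuse.
                if (a * a + b * b == c * c) {
                    found.push_back({a, b, c});
                }
            }
        }
    }
    return found;
}
```

Under these bounds the innermost comparison runs roughly n³/6 times, which is consistent with this one function dominating the flat profile.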
The nested for loops are the serial way of calculating the dimensions.
==== Offloading Process ====
To parallelize the code described above, we did the following:
1. Use the CUDA device properties to choose the grid and block dimensions.
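As a sketch of how device properties could drive the launch shape: in CUDA host code, cudaGetDeviceProperties(&prop, device) fills a cudaDeviceProp whose maxThreadsPerBlock field gives the per-block thread limit, and the chosen sizes are passed as dim3 values in the <<<grid, block>>> launch. The version below is plain C++ so it runs without a GPU. Dim2, LaunchConfig, makeLaunchConfig, the 2D layout over (a, b) pairs, and the square power-of-two block shape are all assumptions, not necessarily what the original code does.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 (only x and y are used here).
struct Dim2 {
    int x, y;
};

struct LaunchConfig {
    Dim2 grid;   // number of blocks in each dimension
    Dim2 block;  // number of threads per block in each dimension
};

// Sizes a 2D launch covering an n x n space of (a, b) pairs.
// maxThreadsPerBlock would come from cudaDeviceProp::maxThreadsPerBlock.
LaunchConfig makeLaunchConfig(int n, int maxThreadsPerBlock) {
    assert(n > 0 && maxThreadsPerBlock > 0);
    // Largest power-of-two square block whose thread count fits the limit.
    int side = 1;
    while ((side * 2) * (side * 2) <= maxThreadsPerBlock) {
        side *= 2;
    }
    // Ceiling division so the grid covers all n values in each dimension.
    int blocks = (n + side - 1) / side;
    return {{blocks, blocks}, {side, side}};
}
```

For example, with a limit of 1024 threads per block and n = 1000, this gives 32 x 32 blocks of 32 x 32 threads; threads whose index falls past n simply do no work.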
[[File:kernel.PNG]]
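The kernel itself is only shown as an image. One plausible mapping, assuming a 2D grid in which each thread takes its (a, b) pair from its block and thread indices and then runs the serial innermost loop over c, is emulated on the CPU below. threadBody corresponds to what a __global__ kernel would execute, with blockIndex, threadIndex, and blockSize standing in for CUDA's built-in blockIdx, threadIdx, and blockDim. On a real GPU the results would be reported with device-side printf or written to an output array guarded by atomicAdd, not with std::vector::push_back. All names here are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Triple {
    long long a, b, c;
};

// Hypothetical stand-in for CUDA's dim3 (only x and y are used here).
struct Dim2 {
    int x, y;
};

// Work done by one thread: (a, b) come from its global 2D index,
// and the thread runs the serial innermost loop over c.
void threadBody(Dim2 blockIndex, Dim2 threadIndex, Dim2 blockSize,
                long long n, std::vector<Triple>& out) {
    long long a = static_cast<long long>(blockIndex.x) * blockSize.x + threadIndex.x + 1;
    long long b = static_cast<long long>(blockIndex.y) * blockSize.y + threadIndex.y + 1;
    // Threads outside the n x n space, or with b < a (a duplicate pair), exit early.
    if (a > n || b > n || b < a) {
        return;
    }
    for (long long c = b + 1; c <= n; ++c) {
        if (a * a + b * b == c * c) {
            out.push_back({a, b, c});
        }
    }
}

// Visits every (block, thread) pair of the launch one after another.
// On a GPU these threads run concurrently, in no guaranteed order.
std::vector<Triple> emulateLaunch(Dim2 grid, Dim2 block, long long n) {
    assert(grid.x > 0 && grid.y > 0 && block.x > 0 && block.y > 0);
    std::vector<Triple> out;
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    threadBody({bx, by}, {tx, ty}, block, n, out);
    return out;
}
```

This design removes the two outer loops entirely: they become the thread grid, so each thread does at most n comparisons instead of one thread doing all of them.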
==== Time Logging ====
To compare the serial and parallel versions, we split the original file into two functions, calculateCUDA() and calculateSerial(), and timed the execution of each to see which was faster.
calculateCUDA() contains the parallelized version of the application: it sets the grid and block dimensions, launches a kernel that finds the Pythagorean triples, and prints the time taken once execution finishes.
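A minimal host-side timing wrapper is sketched below, assuming calculateSerial() and calculateCUDA() take no arguments. It uses std::chrono::steady_clock, which is monotonic and therefore suited to measuring intervals. Because kernel launches are asynchronous, the timed GPU call should end with cudaDeviceSynchronize() (or be measured with CUDA events via cudaEventRecord and cudaEventElapsedTime) so that the kernel's run time is actually included. timeMs and speedup are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup of the parallel version relative to the serial version.
double speedup(double serialMs, double parallelMs) {
    assert(parallelMs > 0.0);
    return serialMs / parallelMs;
}
```

Usage would look like `double serialMs = timeMs(calculateSerial);` and `double cudaMs = timeMs(calculateCUDA);`, followed by `speedup(serialMs, cudaMs)`.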
==== Results ====
The graph below shows the execution time of the serial approach and of the parallel approach.