Changes

Avengers

137 bytes added, 15:36, 30 March 2019

→‎Progress

Profiling showed that the calculateDimensions() function takes up most of the run time: the flat profile attributes 97.67% of the execution time to it.

==== Identifying Parallelizable Code ====

calculateDimensions() contains three nested for loops, each of which sets the value of one side of the triangle. The inner-most loop squares the two shorter sides, adds the results together, and then checks whether this sum equals the square of the hypotenuse. When the condition holds, the three sides form a Pythagorean triple and are printed.

[[File:NestedLoops.PNG]]
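For reference, here is a minimal serial sketch of the loop structure described above. The original loops are only shown as a screenshot, so the upper bound <code>maxSide</code>, the ordering a < b < c (which reports each triple once), and the name <code>findTriplesSerial</code> are all hypothetical.

```cuda
#include <cstdio>

// Hypothetical serial version of the three nested loops.
// Assumption: every side ranges over 1..maxSide and a < b < c,
// so each triple is reported exactly once.
int findTriplesSerial(int maxSide, bool print) {
    int found = 0;
    for (int a = 1; a <= maxSide; ++a) {              // first shorter side
        for (int b = a + 1; b <= maxSide; ++b) {      // second shorter side
            for (int c = b + 1; c <= maxSide; ++c) {  // hypotenuse
                // Square the two shorter sides, add them, and compare
                // the sum with the square of the hypotenuse.
                if (a * a + b * b == c * c) {
                    if (print) std::printf("%d %d %d\n", a, b, c);
                    ++found;
                }
            }
        }
    }
    return found;
}
```

The innermost comparison runs roughly maxSide³/6 times, which is consistent with this function dominating the flat profile.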

The nested for loops represent the serial way of calculating the dimensions.

==== Offloading Process ====

To parallelize the code mentioned above, we did the following:

1. Use the CUDA device properties to choose the grid and block dimensions.

[[File:kernel.PNG]]
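The kernel itself is shown only as an image. The CUDA code below is a hedged sketch of one way the loops could be offloaded. It assumes a 2D grid in which each thread owns one (a, b) pair of shorter sides and loops over the hypotenuse c itself, and it derives a square block size from <code>cudaDeviceProp::maxThreadsPerBlock</code>. Several details are assumptions rather than the team's code: the names <code>findTriplesKernel</code>, <code>findTriplesCUDA</code> and <code>maxSide</code>, the a < b < c ordering, and counting triples with <code>atomicAdd</code> instead of printing them.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: the 2D thread grid replaces the two outer loops.
// Each thread handles one (a, b) pair and runs the inner loop over c.
__global__ void findTriplesKernel(int maxSide, int* count) {
    int a = blockIdx.x * blockDim.x + threadIdx.x + 1;
    int b = blockIdx.y * blockDim.y + threadIdx.y + 1;
    // Skip threads outside the range, and keep a < b so triples are not repeated.
    if (a > maxSide || b > maxSide || a >= b) return;
    for (int c = b + 1; c <= maxSide; ++c) {
        if (a * a + b * b == c * c) {
            atomicAdd(count, 1);  // many threads may find triples concurrently
        }
    }
}

// Hypothetical host launcher: sizes blocks from the device properties and
// launches enough blocks to cover all maxSide x maxSide (a, b) pairs.
// Returns the number of triples found, or -1 on a CUDA error.
int findTriplesCUDA(int maxSide) {
    if (maxSide < 1) return 0;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    // Largest power-of-two square block within maxThreadsPerBlock
    // (32 x 32 = 1024 threads on most current GPUs).
    int side = 1;
    while ((side * 2) * (side * 2) <= prop.maxThreadsPerBlock) side *= 2;
    dim3 block(side, side);
    dim3 grid((maxSide + side - 1) / side, (maxSide + side - 1) / side);

    int* dCount = nullptr;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dCount, 0, sizeof(int));
    findTriplesKernel<<<grid, block>>>(maxSide, dCount);

    int hCount = 0;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCount);
    return err == cudaSuccess ? hCount : -1;
}
```

This mapping is simple but leaves the threads with a ≥ b idle, roughly half the grid. A triangular index mapping would avoid that at the cost of more complex index arithmetic.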

==== Time Logging ====

To compare the serial and parallel versions, we split the original file into two functions, calculateCUDA() and calculateSerial(), and timed the execution of each to see which was faster.

calculateCUDA() contains the parallelized version of the application. It sets the grid and block dimensions, launches a kernel to find the Pythagorean triples, and prints the time taken once the search has finished.
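Kernel launches are asynchronous, so a host timer read immediately after a launch would miss most of the GPU work. One hedged way to time both versions is to use CUDA events for the GPU side and <code>std::chrono::steady_clock</code> for the serial side. The helper names <code>timeGpuMs</code> and <code>timeCpuMs</code> below are hypothetical and are not taken from calculateCUDA() or calculateSerial().

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical helper: times GPU work launched inside `launch` with CUDA events.
// Both events are recorded on the default stream, so the elapsed time covers
// everything launched between them.
template <typename Launch>
float timeGpuMs(Launch launch) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    launch();                    // e.g. a kernel launch <<<grid, block>>>
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block the host until the GPU reaches `stop`
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical helper: times host-side work (e.g. the serial loops).
template <typename Work>
double timeCpuMs(Work work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A call such as <code>timeGpuMs([&] { myKernel<<<grid, block>>>(args); })</code>, where <code>myKernel</code> stands in for the real kernel, returns the elapsed GPU time in milliseconds.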

==== Results ====

The graph below shows the execution times of the serial and parallel approaches.

Jsidhu26

