Avengers

From CDOT Wiki
Revision as of 22:08, 15 February 2019 by Jsidhu26 (talk | contribs) (Jaideep - Array Processing)
Jump to: navigation, search


GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary

Team Avengers

Team Members

  1. Bruno Alexander Cremonese de Morais, Pythagorean Triples
  2. Jaideep Sidhu, Array Processing

Progress

Assignment 1

Bruno - Pythagorean Triples

Pythagorean Triples by Violet_82

"A Pythagorean triple is a triple of positive integers a, b, and c such that a right triangle exists with legs a,b and hypotenuse c. By the Pythagorean theorem, this is equivalent to finding positive integers a, b, and c satisfying:

a^2+b^2=c^2. 	

The smallest and best-known Pythagorean triple is (a,b,c)=(3,4,5). The right triangle having these side lengths is sometimes called the 3, 4, 5 triangle. [...]

In addition, one side of every Pythagorean triple is divisible by 3, another by 4, and another by 5. One side may have two of these divisors, as in (8, 15, 17), (7, 24, 25), and (20, 21, 29), or even all three, as in (11, 60, 61)."

Weisstein, Eric W. "Pythagorean Triple." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/PythagoreanTriple.html

This algorithm (supposedly) calculates all Pythagorean triple (not primitive Pythagorean triples) values from initial values of the opposed side of a triangle. The algorithm needs some logical improvement to display only values from the starting range, these improvements should not be hard to implement but it still serves the purpose to calculate all Pythagorean triples from the simplest up to the maximum hypotenuse number given by the user. The code has been slightly modified to receive command line parameters, facilitating its execution on UNIX enviroments.

The flat profile generated by the program is as follows:

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls  Ts/call  Ts/call  name    
97.67      5.98     5.98                             Triangle::calculateDimensions(double, double, double, int)
 0.00      5.98     0.00     2842     0.00     0.00  Triangle::printDimensions(double, double, double)
 0.00      5.98     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN8TriangleC2Edddi

We can see that the function calculateDimensions, which is called only once is responsible for almost all of the execution time. For this profiling the program calculated all triples from the most primitive (3, 4, 5) up to all the triples that satisfy the hypotenuse value of 1500. This function has O(n^3) with its nested for loops to calculate all possible values for each triple, demanding a lot of computational power to generate all of them.

Since every loop executes the same instructions I believe the parallelization of this code should be straightforward and would make it a lot faster. Calculations could be divided in tasks or the offload could be done in order since one calculation does not depend on the other explicitly.

For further utility, the results could be sorted for later display, since the values will be printed in any order from the GPU results.

Code and Execution Instructions on GitHub

Jaideep - Array Processing

In Blaise Barney's Notes on Array Processing, an example of Array Processing is discussed. The example "demonstrates calculations on 2-dimensional array element; a function is evaluated on each array element."

I used the pseudo-code provided to create a program that creates a 2-dimensional array. The purpose of the program is to create and populate a 2-dimensional array of size n (provided by the user) with random numbers. The code is available in the link below:

! Insert link here !

After using gprof to profile my program, a call graph is generated with this content:

Array Processing - Call Graph.jpeg

From the call graph file, it is evident that the generateRandom() function is an obvious hotspot. It is hogging 100% of the execution time. The function consists of 2 for loops, one nested in the other, which makes the function have a Big-O notation of O(n^2).

The computations involved with each element in the array is independent from the rest of the elements, and therefore this function is a deserving candidate for parallelization. Additionally, the array elements can be evenly distributed into subarrays and a process can be assigned to each subarray.

Assignment 2

Assignment 3