=== Tests ran with no optimization on Linux ===
By running the command cat /proc/cpuinfo we can find the CPU specifications of the VM the Linux tests are run on.
For this test we have:

{| class="wikitable sortable" border="1" cellpadding="5"
|+ System Specifications
! CPU !! Clock Speed
|-
| Dual-Core AMD Opteron || 2792 MHz
|}
== Application 2: Calculating Pi ==

This application is pretty straightforward: it calculates Pi to the number of decimal places given by the user. For example, an input of 10 will calculate Pi to the 10th decimal place, while an input of 100,000 will calculate it to the 100,000th.
 
=== Problem ===
 
Inside the calculate() function we have:
 
 void calculate(std::vector<int>& r, int n)
 {
     int i, k;
     int b, d;
     int c = 0;
 
     // fill the working array
     for (i = 0; i < n; i++) {
         r[i] = 2000;
     }
 
     // spigot loop: each pass of the outer loop produces the next four digits of Pi
     // (see the commented-out printf)
     for (k = n; k > 0; k -= 14) {
         d = 0;
 
         i = k;
         for (;;) {
             d += r[i] * 10000;
             b = 2 * i - 1;
 
             r[i] = d % b;
             d /= b;
             i--;
             if (i == 0) break;
             d *= i;
         }
         //printf("%.4d", c + d / 10000);
         c = d % 10000;
     }
 }
 
I believe the two nested for loops are responsible for most of the program's execution time.
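
For context on how the timings below might be gathered, here is a minimal driver sketch of my own (not taken from the original program) that calls and times calculate(); the problem size and the vector sizing are assumptions:

 #include <chrono>
 #include <iostream>
 #include <vector>
 
 void calculate(std::vector<int>& r, int n);   // the function shown above
 
 int main()
 {
     int n = 1000;                  // assumed problem size (user input in the real program)
     std::vector<int> r(n + 1);     // sized n + 1 because the inner loop starts at index k = n
 
     auto start = std::chrono::steady_clock::now();
     calculate(r, n);
     auto end = std::chrono::steady_clock::now();
 
     std::cout << "n = " << n << " took "
               << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
               << " ms" << std::endl;
     return 0;
 }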
 
=== Tests ran with no optimization on Linux ===

For this test the Linux VM has a Dual-Core AMD Opteron at 2792 MHz.
 
 
{| class="wikitable sortable" border="1" cellpadding="5"
|+ Naiver Equation
! n !! Time in Milliseconds
|-
||1000 ||2||
|-
||10000 ||266||
|-
||100000 ||26616||
|-
||200000 ||106607||
|-
||500000 ||671163||
|}
 
 
=== gprof ===
 
As with the other application that was profiled, the gprof results can be a bit hard to read. Basically, the program spends 87.39% of its time in the calculate() method, and with a problem size of 500,000 it spent a cumulative 354 seconds there. Hopefully we can get this number down.

(For comparison, in the previous application main() accounted for 89.19% of the run time and took 97 seconds.)
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds     calls  s/call  s/call  name
  87.39    354.08   354.08         1  354.08  395.84  calculate(std::vector<int, std::allocator<int> >&, int)
  10.31    395.84    41.76 678273676    0.00    0.00  std::vector<int, std::allocator<int> >::operator[](unsigned int)
 
=== Potential Speed Increase with Amdahl's Law ===
 
Using Amdahl's Law ----> Sn = 1 / ( 1 - P + P/n )

we can estimate how much faster our program is capable of running.

P = the portion of the program we want to optimize, which from above is 87.39%
n = the number of processors we will use. One GPU card (Quadro K2000) has 384 processors or CUDA cores, and the other GPU (GeForce GTX 960) has 1024 processors or CUDA cores.

Applying the formula gives us:
 
Amdahl's Law for the GPU with 384 cores ----> Sn = 1 / ( 1 - 0.8739 + 0.8739/384 )
Sn = 7.789631

Amdahl's Law for the GPU with 1024 cores ----> Sn = 1 / ( 1 - 0.8739 + 0.8739/1024 )
Sn = 7.876904
 
'''Therefore, according to Amdahl's Law we can expect a 7.7x to 7.9x increase in speed.'''

354 seconds to execute calculate() / 7.8 (Amdahl's Law speedup) ≈ 45.4 seconds to execute after using the GPU.

Interestingly, the last application had P = 89% (a 9x speedup) while this application has P = 87% (a 7.8x speedup); that 2% made quite a difference.
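
As a quick sanity check on the arithmetic, here is a small sketch (mine, not part of the original write-up) that evaluates Amdahl's Law for both core counts:

 #include <cstdio>
 
 // Amdahl's Law: Sn = 1 / ((1 - P) + P / n)
 double amdahl(double p, double n)
 {
     return 1.0 / ((1.0 - p) + p / n);
 }
 
 int main()
 {
     double p = 0.8739;  // fraction of run time spent in calculate(), from gprof
     std::printf("384 cores:  %f\n", amdahl(p, 384));    // ~7.79
     std::printf("1024 cores: %f\n", amdahl(p, 1024));   // ~7.88
     return 0;
 }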
 
----
 
=== Potential Speed Increase with Gustafson's Law ===

Gustafson's Law ----> S(n) = n - ( 1 - P ) ∙ ( n - 1 )

(Quadro K2000 GPU) S = 384 - ( 1 - 0.8739 ) * ( 384 - 1 ) = 335.7037

(GeForce GTX960 GPU) S = 1024 - ( 1 - 0.8739 ) * ( 1024 - 1 ) = 894.9997


Using Gustafson's Law we see a drastic change in the predicted speed increase; this time the additional cores make a big difference. Applying these speedups we get:

(Quadro K2000 GPU) 354 seconds to execute / 335.7037 = 1.0545 seconds

(GeForce GTX960 GPU) 354 seconds to execute / 894.9997 = 0.3955 seconds
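
Similarly, a small sketch (again mine, not from the original page) that evaluates Gustafson's Law for both GPUs:

 #include <cstdio>
 
 // Gustafson's Law: S(n) = n - (1 - P) * (n - 1)
 double gustafson(double p, double n)
 {
     return n - (1.0 - p) * (n - 1.0);
 }
 
 int main()
 {
     double p = 0.8739;  // parallel fraction from gprof
     std::printf("Quadro K2000 (384 cores):     %f\n", gustafson(p, 384));    // ~335.70
     std::printf("GeForce GTX 960 (1024 cores): %f\n", gustafson(p, 1024));   // ~895.00
     return 0;
 }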
== Conclusions with Profile Assessment ==
The problem in both applications is quadratic (a nested for loop). The time spent processing the main problem was 89.19% and 87.39% respectively, and the programs spent 97 and 354 seconds on that problem. Based on this, I believe it is feasible to optimize one of these applications with CUDA to improve performance. I will attempt to optimize the Navier-Stokes flow velocity program, as that application is more interesting to me.