Changes

Jump to: navigation, search

DPS915/CodeCookers

131 bytes added, 02:00, 4 December 2014
Assignment 1
My application that I selected is a PI approximation function. PI can be approximated in a number of ways however I chose to use the dartboard algorithm. Although not the fastest, the algorithm is very feasible to parallelize. The main idea behind it can be compared to a dartboard – you throw a random number (n) of darts at the board and note down the darts that have landed within it and those that have not. The image below demonstrates this concept:
[[File:dart.gif]]
During the execution, the program takes the number of iterations through the command line. It generates two random decimal numbers between 0 and 1 and determines if the randomly generated coordinates are inside in the circle. Then it calculates the size of PI. As we increase the number of iteration we are getting a more realistic value of PI.
[[File:output.jpg]]
I created a screenshot from the first execution, the rest of the execution is summarized on the table and the chart below.
[[File:table.jpg]]
  [[File:Pichart.jpg]]
As we can see the hotspot of the program is the pi () function, which takes 100% of the execution time. This function is containing a single for loop, which is calculating the size of the PI. This can be executed independently, because it has no data dependency, therefor it is possible to parallelize with CUDA to speed up the processing time.
'''Conclusion'''
While coding the algorithm for this problem, we had to go through many iterations, first off we had to create it to run serially on the CPU. These results, while stable, were quite slow to process when it came to larger numbers. Next came the GPU port. This section was the bulk of the work over the three assignments as it required us to completely redesign how the program would function. We can say with confidence that based on the results of the port, it would not be worthwhile to even bother implementing the cuda code as the improvement was marginal at best. The third iteration however, when we optimized the code, showed a dramatic improvement in performance. Eight times faster than the GPU port from the second assignment, the optimization blew us away as to the effect that optimization can have on a program. The actual calculations on the arrays alone only ended up taking 35nsec. All in all, from the results collected, we can conclude that there is substantial evidence that parallelization of any form of Monte Carlo or repetitive program involving millions of small calculations would highly benefit from using the GPU.

Navigation menu