DPS915/CodeKirin

1,367 bytes added, 21:20, 5 December 2014
Calculations of Pi
== Progress ==
=== Assignment 1 ===
'''Brief Overview'''
'''(Note)'''
For some reason the code crashes my graphics driver past 8,000,000 (8 million) dots, and even at 8 million it crashes most of the time, although the value is still correct. The Nvidia Visual Profiler doesn't work either, because it gets stuck on the random number generating timeline, so I used clock_t in the code instead to calculate execution time. I don't think this is 100% accurate, though.
'''Approach'''
Instead of doing everything within main, I created a separate function. All the random number generation is done within the kernel via cuRAND. The kernel is also responsible for all the calculations and uses shared memory for all the threads within the block to obtain a partial sum. Here are some snippets of the code.
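For readers who cannot view the screenshots, here is a minimal sketch of the approach described above. It is not the code from the screenshots. The names countInside, estimatePi and THREADS, the block size of 256, one dot per thread, and the seed value are all assumptions made for illustration, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

constexpr int THREADS = 256;  // threads per block (assumed value)

// Each thread tests one random dot; each block writes one partial count.
__global__ void countInside(unsigned int* blockSums, unsigned int nDots,
                            unsigned long long seed) {
    __shared__ unsigned int temp[THREADS];
    unsigned int tid = threadIdx.x;
    unsigned int gid = blockIdx.x * blockDim.x + tid;

    temp[tid] = 0;
    if (gid < nDots) {
        curandState state;
        curand_init(seed, gid, 0, &state);  // separate sequence per thread
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) temp[tid] = 1;  // dot is inside the circle
    }
    __syncthreads();

    // Thread 0 adds up the flags of its block (plain serial sum).
    if (tid == 0) {
        unsigned int sum = 0;
        for (unsigned int i = 0; i < blockDim.x; ++i) sum += temp[i];
        blockSums[blockIdx.x] = sum;
    }
}

// Host wrapper: launches the kernel, adds the partial sums, returns the estimate.
double estimatePi(unsigned int nDots, unsigned long long seed) {
    assert(nDots > 0);
    unsigned int blocks = (nDots + THREADS - 1) / THREADS;

    unsigned int* dSums = nullptr;
    cudaMalloc(&dSums, blocks * sizeof(unsigned int));
    countInside<<<blocks, THREADS>>>(dSums, nDots, seed);

    unsigned int* hSums = new unsigned int[blocks];
    cudaMemcpy(hSums, dSums, blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    unsigned long long inside = 0;
    for (unsigned int i = 0; i < blocks; ++i) inside += hSums[i];

    delete[] hSums;
    cudaFree(dSums);
    return 4.0 * static_cast<double>(inside) / nDots;
}

int main() {
    printf("pi ~ %f\n", estimatePi(1000000u, 1234ULL));
    return 0;
}
```

The factor of 4 comes from the quarter circle of radius 1 covering pi/4 of the unit square, so the fraction of dots inside it approximates pi/4.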
'''Some Code Snippets'''
If the dot is within the circle (when the total is <= 1.0), the kernel sets the tid (threadIdx.x) index of the temp array in shared memory to 1 and syncs the threads. It then sums up all the 1s in the temp array for that specific block and passes the result out into another array.
[[File:Code1.JPG]]
After copying from the device to the host, a for loop runs through all the indexes and adds the values together to obtain the total sum of the results from all blocks. This total sum is then used to calculate the value of pi.
[[File:Code2.JPG]]
'''Value of 1 Million'''
[[File:MillionMonteCarlo.JPG]]
'''Value of 5 Million'''
[[File:5MillionMonteCarlo.JPG]]
'''Value of 8 Million'''
[[File:8MillionMonteCarlo.JPG]]
'''Execution Times for Values of 1, 5 and 8 Million'''
[[File:reportTime.JPG]]
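Because kernel launches return to the host before the kernel finishes, a host timer such as clock_t only captures the kernel if it is read after a blocking call (for example a cudaMemcpy). As an alternative that does not depend on the Visual Profiler, here is a minimal sketch of timing a launch with CUDA events. busyKernel is a hypothetical stand-in for the Monte Carlo kernel, timeLaunchMs is an illustrative name, and error checking is omitted.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the Monte Carlo kernel.
__global__ void busyKernel(float* out, unsigned int n) {
    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n) out[gid] = gid * 0.5f;
}

// Returns the GPU time of one launch in milliseconds, measured with events.
float timeLaunchMs(unsigned int n) {
    assert(n > 0);
    float* dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                      // marker before the launch
    busyKernel<<<(n + 255) / 256, 256>>>(dOut, n);
    cudaEventRecord(stop);                       // marker after the launch
    cudaEventSynchronize(stop);                  // wait for the kernel to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dOut);
    return ms;
}

int main() {
    printf("kernel time: %.3f ms\n", timeLaunchMs(1u << 20));
    return 0;
}
```

The events are recorded on the GPU's own timeline, so the result covers only the work between the two markers and not host-side overhead.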
'''Comparison Chart'''
'''Issues'''
 
The main issue for me was figuring out how to use the kernel for this approach. At first I tried to have each thread pass out a value of 1 or 0, depending on whether its dot landed within the circle, into an array individually. Later on Chris gave me the idea of getting a partial sum for all the threads within each block and passing that out instead, which is a much better approach.
Another big issue was the crashing of the graphics driver. If the program takes more than 3 seconds to execute, the driver crashes. Even when I changed the registry to allow 15 seconds before crashing, it still crashed at 3.
For optimization, I tried using reduction; however, it didn't seem to speed up the program.
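For reference, a shared-memory tree reduction of the per-thread flags could look roughly like the sketch below. This is an assumption about what "reduction" refers to here, not the code that was actually tried. The names blockSum, sumFlags and THREADS are illustrative, the flags are passed in from the host instead of being generated with cuRAND so the result is easy to check, and error checking is omitted.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // must be a power of two for the halving loop

// Sums one flag per thread into a single per-block total (tree reduction).
__global__ void blockSum(const unsigned int* flags, unsigned int* blockSums,
                         unsigned int n) {
    __shared__ unsigned int temp[THREADS];
    unsigned int tid = threadIdx.x;
    unsigned int gid = blockIdx.x * blockDim.x + tid;

    temp[tid] = (gid < n) ? flags[gid] : 0;
    __syncthreads();

    // Halve the number of active threads each step: 128, 64, ..., 1.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) temp[tid] += temp[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = temp[0];
}

// Hypothetical host helper: reduces n flags on the device, returns the total.
unsigned long long sumFlags(const unsigned int* hFlags, unsigned int n) {
    assert(n > 0);
    unsigned int blocks = (n + THREADS - 1) / THREADS;

    unsigned int* dFlags = nullptr;
    unsigned int* dSums = nullptr;
    cudaMalloc(&dFlags, n * sizeof(unsigned int));
    cudaMalloc(&dSums, blocks * sizeof(unsigned int));
    cudaMemcpy(dFlags, hFlags, n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    blockSum<<<blocks, THREADS>>>(dFlags, dSums, n);

    unsigned int* hSums = new unsigned int[blocks];
    cudaMemcpy(hSums, dSums, blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (unsigned int i = 0; i < blocks; ++i) total += hSums[i];

    delete[] hSums;
    cudaFree(dFlags);
    cudaFree(dSums);
    return total;
}

int main() {
    const unsigned int n = 1000;
    unsigned int flags[n];
    for (unsigned int i = 0; i < n; ++i) flags[i] = (i % 4 != 0) ? 1 : 0;
    printf("flags set: %llu\n", sumFlags(flags, n));  // 750 of 1000 are set
    return 0;
}
```

The reduction replaces a 256-step serial loop in thread 0 with 8 parallel steps. If the summation is only a small part of each block's work, it is plausible that the gain is too small to notice in the total run time.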
'''Different Approach'''
Another approach is to use a different algorithm, like the one I used at first. However, that program only goes up to 9 significant digits, since anything over that goes above the maximum value of a float. It shows an execution time of 0.05 seconds for all values entered by the user, but would require the BigNumber library or something similar in order to show more significant digits.