* '''System Specifications'''
** OS: Windows 7 (64-bit)
** CPU: Intel Core i3-2350M @ 2.30GHz
** GPU: GeForce GT 520MX (48 CUDA cores)
* '''How To Execute On Linux?'''
* '''Stage 1 - Big-O:'''
The (predicted) hotspot spans lines 35 to 44. Although there are two nested for loops, the outer loop executes ''n'' / ''stride'' times while the inner loop executes ''stride'' times, so the loop body runs only ''n'' times in total and the hotspot is O(''n'') (see the sketch below).
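The loop structure can be illustrated with a short sketch. The workshop source itself is not reproduced on this page, so the names (''n'', ''stride''), the block size, and the use of the Leibniz series to approximate pi are assumptions; the only point of the sketch is that the two nested loops still perform ''n'' iterations of the body in total.
<syntaxhighlight lang="cpp">
// Hypothetical sketch only: variable names, the stride value, and the series
// used are assumptions, not the actual workshop code.
#include <iostream>
#include <cstdlib>

int main(int argc, char* argv[]) {
    int n = (argc > 1) ? std::atoi(argv[1]) : 10000; // total number of iterations
    int stride = 100;                                // block size (assumed; n taken as a multiple of stride)
    double sum = 0.0;

    // Predicted hotspot: two nested loops, but the body still executes n times,
    // because (n / stride) outer iterations * stride inner iterations = n -> O(n).
    for (int block = 0; block < n / stride; ++block) {          // n / stride iterations
        for (int k = 0; k < stride; ++k) {                      // stride iterations
            int i = block * stride + k;                         // global term index
            sum += (i % 2 == 0 ? 1.0 : -1.0) / (2.0 * i + 1.0); // Leibniz term (assumed)
        }
    }
    std::cout << "pi ~= " << 4.0 * sum << std::endl;
    return 0;
}
</syntaxhighlight>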
* '''Stage 2 - Potential Speedup:'''
Using Amdahl's Law:
P = 10883ms / 10903ms (using the last sample of the third test data from below...)
P = 0.99817
n = 48 (processors reported by deviceQuery.exe)
Sn = 1 / ( 1 - P + P/n )
S48 = 1 / ( 1 - 0.99817 + 0.99817 / 48 )
S48 = 44.19015
The maximum speedup on the test system is approximately 44 times.
At around 10K iterations, the first decimal place of the computed result is stable.
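As a quick check of the arithmetic, Amdahl's Law can be evaluated directly. The short program below is only an illustrative sketch that plugs in the measured times and processor count used above; it is not part of the workshop code.
<syntaxhighlight lang="cpp">
// Sketch: evaluate Amdahl's Law S(n) = 1 / ((1 - P) + P/n) for the values above.
// P is taken from the profiled times (10883 ms parallelizable out of 10903 ms total)
// and n = 48 is the CUDA core count reported by deviceQuery; illustrative only.
#include <iostream>

double amdahl(double P, double n) {
    return 1.0 / ((1.0 - P) + P / n);
}

int main() {
    double P = 10883.0 / 10903.0;   // parallelizable fraction of the run time
    double n = 48.0;                // processors (CUDA cores) on the test GPU
    std::cout << "P   = " << P << "\n";             // ~0.99817
    std::cout << "S48 = " << amdahl(P, n) << "\n";  // ~44.19
    return 0;
}
</syntaxhighlight>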