}
}
==== '''Program Execution Plan''' ====
The ''pi_cuda'' tests are run with sample counts starting at 100,000 and increasing by a factor of 10 per test, up to the maximum supported sample count of 134,217,728 (2<sup>27</sup>, a memory constraint on the Nvidia GTX 460). The block and thread counts are fixed at 128 and 128, respectively, for all tests.
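The sweep described above can be sketched in host-side C++ as follows. This is only an illustration of the plan, not the actual ''pi_cuda'' source: the helper names <code>sample_counts</code> and <code>samples_per_thread</code> are hypothetical, and two things are assumptions: that the final run uses the stated maximum itself (since the next power-of-ten step would exceed it), and that samples are split evenly across all 128 × 128 threads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values taken from the execution plan.
constexpr std::uint64_t kFirstCount = 100000;     // first test: 100 thousand samples
constexpr std::uint64_t kMultiplier = 10;         // each test uses 10x more samples
constexpr std::uint64_t kMaxCount   = 134217728;  // 2^27, stated maximum sample count
constexpr unsigned kBlocks  = 128;                // blocks per grid (fixed)
constexpr unsigned kThreads = 128;                // threads per block (fixed)

// Hypothetical helper: list the sample counts for one sweep.
// Assumption: the last run is capped at kMaxCount instead of 10^9.
std::vector<std::uint64_t> sample_counts() {
    std::vector<std::uint64_t> counts;
    for (std::uint64_t n = kFirstCount; n < kMaxCount; n *= kMultiplier) {
        counts.push_back(n);
    }
    counts.push_back(kMaxCount);
    return counts;
}

// Hypothetical helper: samples handled by each thread, rounded up.
// Assumption: work is divided evenly over all kBlocks * kThreads threads.
std::uint64_t samples_per_thread(std::uint64_t n) {
    const std::uint64_t total_threads = std::uint64_t{kBlocks} * kThreads;  // 16384
    return (n + total_threads - 1) / total_threads;
}
```

Under these assumptions the sweep has five runs (10<sup>5</sup>, 10<sup>6</sup>, 10<sup>7</sup>, 10<sup>8</sup>, and 134,217,728), and at the maximum count each of the 16,384 threads would process 8,192 samples.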
==== '''Compilation and Running''' ====
==== '''Conclusion''' ====
Parallelizing the original serial code with CUDA yields an enormous increase in performance (lower execution time) when calculating π, as high as 1372%. The next (final) phase will investigate whether shared memory, optimal memory allocation, reduced memory access time, and other optimizations can further lower the execution time of ''pi_cuda''.
=== '''Assignment 3''' ===