CDOT Wiki β

Sirius

2,139 bytes added, 22:23, 6 April 2018
=== Algorithms (Joseph Pildush)===
My topic is the CPU cost of running common algorithms on large arrays. On small arrays, most algorithms finish in well under a second. On larger arrays, my results show execution time rising steadily with the number of elements: going from 1,000,000 to 100,000,000 elements increased each algorithm's measured time by a factor of roughly 60 to 130. When these algorithms are called repeatedly on large arrays, that cost accumulates, and the program as a whole can become slow or appear to hang.
For this reason, when developing applications that make many algorithm calls on very large arrays, it appears beneficial to use CUDA to run the work in parallel on the GPU.<br><br>
Algorithms Used: std::sort, saxpy, prefix-sum
<br>
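For reference, a minimal serial sketch of two of the listed algorithms follows. The function names <code>saxpySerial</code> and <code>inclusivePrefixSum</code>, their signatures, and the choice of the inclusive form of prefix-sum are assumptions made for illustration; they are not taken from the code that was profiled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// saxpy: y[i] = a * x[i] + y[i] for every element.
// Hypothetical signature; x and y must have the same length.
void saxpySerial(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Inclusive prefix sum: out[i] = in[0] + in[1] + ... + in[i].
// A long long running total is an assumption here, chosen so that
// sums over very large arrays are less likely to overflow.
std::vector<long long> inclusivePrefixSum(const std::vector<int>& in) {
    std::vector<long long> out;
    out.reserve(in.size());  // one allocation up front
    long long running = 0;
    for (int v : in) {
        running += v;
        out.push_back(running);
    }
    return out;
}
```

Both functions make a single pass over the data. Each saxpy iteration is independent of the others, while each prefix-sum iteration depends on the previous running total, so a parallel version of prefix-sum needs a different formulation (a parallel scan) rather than a direct translation of this loop.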
==== Results ====
<source>
--------------------------------------------------
| # of Elements | std::sort | saxpy | prefix-sum |
--------------------------------------------------
|     1,000,000 |        66 |     4 |         75 |
|     5,000,000 |       360 |    18 |        293 |
|    10,000,000 |       742 |    36 |        584 |
|    50,000,000 |      3983 |   189 |       2343 |
|   100,000,000 |      8393 |   378 |       4649 |
--------------------------------------------------
</source>
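Timings like those in the table could be collected with something along the lines of the sketch below. The flat profile indicates that <code>std::chrono::steady_clock</code> was used, but the helpers <code>elapsedMs</code>, <code>makeRandomVector</code> and <code>timeSortMs</code> are hypothetical names, and reporting in milliseconds is an assumption, since the unit of the table is not stated.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds (the unit is an assumption).
template <typename F>
long long elapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Hypothetical helper: builds a vector of n pseudo-random ints.
// A fixed seed keeps repeated runs comparable.
std::vector<int> makeRandomVector(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (int& e : v) {
        e = dist(gen);
    }
    return v;
}

// Times std::sort alone; building the input is kept outside the timed region.
long long timeSortMs(std::size_t n) {
    std::vector<int> data = makeRandomVector(n);
    return elapsedMs([&] { std::sort(data.begin(), data.end()); });
}
```

Calling <code>timeSortMs</code> for each element count in the table would produce one column of measurements; the absolute values depend on the machine and compiler flags, so they are not expected to match the figures reported here.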
==== Graph ====
[[File:AlgorithmGraph.png]]
 
==== Flat Profile ====
<source>
Flat profile:
 
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 47.03      8.24     8.24        1     8.24    11.15  prefixSum(std::vector<int, std::allocator<int> >&, int, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >)
 29.39     13.39     5.15 499999999     0.00     0.00  _ZNSt6vectorIiSaIiEE12emplace_backIJiEEERiDpOT_
 17.41     16.44     3.05        4     0.76     1.83  setRandArr(std::vector<int, std::allocator<int> >&, int)
  2.63     16.90     0.46                             saxpyAlg(int, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >)
  2.34     17.31     0.41                             stdSort(std::vector<int, std::allocator<int> >&, int, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> > >)
  1.20     17.52     0.21      112     0.00     0.00  _ZNSt6vectorIiSaIiEE17_M_realloc_insertIJRKiEEEvN9__gnu_cxx17__normal_iteratorIPiS1_EEDpOT_
  0.00     17.52     0.00        5     0.00     0.00  printTiming(char const*, std::chrono::duration<long long, std::ratio<1ll, 1000000000ll> >)
  0.00     17.52     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z10setRandArrRSt6vectorIiSaIiEEi
  0.00     17.52     0.00        1     0.00     0.00  void std::__insertion_sort<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_less_iter)
</source>
=== Assignment 2 ===