Group 6
Array Processing
In the following profile example, n = 1000.
<pre>
Flat profile:
Each sample counts as 0.01 seconds.
0.68 1.49 0.01 init(float**, int)
0.00 1.49 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z4initPPfi
</pre>
<pre>
Call graph
Index by function name
[10] _GLOBAL__sub_I__Z4initPPfi (arrayProcessing.cpp) [2] init(float**, int) [1] multiply(float**, float**, float**, int)
</pre>
From the profile, multiply() accounts for more than 99% of the runtime because it contains three nested for-loops, so its running time T(n) is O(n^3). init() is the second most time-consuming function (0.68% in the flat profile); it runs in O(n^2).
Because each element is calculated independently of the others, the problem is embarrassingly parallel. The array elements are distributed evenly so that each process owns a portion of the array (a subarray). With multiple compute resources, the work can be completed in less time than with a single compute resource.
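To make the loop structure behind these complexities concrete, here is a minimal sketch of the two functions. Only the signatures come from the profile output; the bodies are assumptions. In particular, the fill values in init() are placeholders, and multiply() is assumed to be a plain matrix product.

```cpp
#include <cassert>

// Assumed body: fill an n x n array. Two nested loops give O(n^2).
void init(float** a, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            a[i][j] = static_cast<float>(i + j);  // placeholder values
        }
    }
}

// Assumed body: c = a * b for n x n arrays. Three nested loops give O(n^3).
void multiply(float** a, float** b, float** c, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;  // each c[i][j] depends only on inputs a and b
        }
    }
}
```

With n = 1000, init() performs about 10^6 assignments while multiply() performs about 10^9 multiply-add steps, which is consistent with multiply() dominating the profile.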
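One possible way to split the work evenly is a block distribution of rows, sketched below. The helper names rowRange() and multiplyRows() are hypothetical, and the sketch assumes multiply() is a matrix product. It leaves out the communication layer (for example, message passing between processes) and shows only which rows each worker would own and compute.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: half-open row range [begin, end) owned by worker
// `rank` out of `numWorkers`. The first (n % numWorkers) workers get one
// extra row, so the load differs by at most one row.
std::pair<int, int> rowRange(int rank, int numWorkers, int n) {
    assert(numWorkers > 0 && rank >= 0 && rank < numWorkers && n >= 0);
    int base = n / numWorkers;
    int extra = n % numWorkers;
    int begin = rank * base + (rank < extra ? rank : extra);
    int end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}

// Hypothetical helper: compute only rows [rowBegin, rowEnd) of c = a * b.
// Workers write to disjoint rows of c, so no synchronization is needed
// while computing.
void multiplyRows(float** a, float** b, float** c, int n,
                  int rowBegin, int rowEnd) {
    assert(0 <= rowBegin && rowBegin <= rowEnd && rowEnd <= n);
    for (int i = rowBegin; i < rowEnd; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}
```

Each worker would call rowRange() with its own rank and pass the result to multiplyRows(). With p workers, each one handles roughly n/p rows, so the per-worker cost drops to about O(n^3 / p), not counting the cost of distributing the inputs and gathering the results.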