GPU621/Distributed Workload

910 bytes added, 01:55, 3 December 2018
Algorithms
* parallel_reduce(range, body [, partitioner]);
These functions operate on the <code>blocked_range</code> range class in '''TBB''' to perform operations in parallel as described by the <code>body</code> object, typically by overloading <code>operator()</code>. The following code snippet demonstrates a simple <code>parallel_reduce</code> implementation.
<pre>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
 
using namespace tbb;
 
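struct Sum {
    float value;
    Sum() : value(0) {}
    // Splitting constructor: the new body starts from the identity of the sum.
    Sum( Sum& s, split ) {value = 0;}
    // Accumulates one subrange on top of the value this body already holds.
    void operator()( const blocked_range<float*>& r ) {
        float temp = value;
        for( float* a=r.begin(); a!=r.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    // Merges the partial result of a body that was split off earlier.
    void join( Sum& rhs ) {value += rhs.value;}
};
 
float ParallelSum( float array[], size_t n ) {
    Sum total;
    parallel_reduce( blocked_range<float*>( array, array+n ), total );
    return total.value;
}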
</pre>
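A few things to note about this code. All of the accumulation is done in the overloaded <code>operator()</code>, which adds the elements of one subrange to the body's running <code>value</code>; it must not reset <code>value</code>, because the same body may be applied to more than one subrange. When '''TBB''' splits the <code>blocked_range</code> to run part of it on another thread, it uses the splitting constructor <code>Sum(Sum& s, split)</code> to create a fresh body for the new subrange. Once both subranges are finished, <code>join()</code> merges the partial result of the split-off body back into the original.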
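The order in which the body is split, applied and joined can be illustrated without '''TBB''' at all. The following is a minimal serial sketch that uses only the standard library; it is not how <code>parallel_reduce</code> is implemented. The names <code>SplitTag</code>, <code>SketchSum</code>, <code>sketch_reduce</code> and <code>SketchParallelSum</code>, and the grain size of 2, are hypothetical and chosen for illustration only. Unlike this sketch, the real scheduler splits a body only when another thread actually takes over part of the range.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for tbb::split; not part of TBB.
struct SplitTag {};

// Same shape as the Sum body above, but over a plain pointer pair.
struct SketchSum {
    float value;
    SketchSum() : value(0) {}
    // Splitting constructor: the new body starts from the identity (0).
    SketchSum(SketchSum&, SplitTag) : value(0) {}
    // Accumulate one subrange on top of what this body already holds.
    void operator()(const float* first, const float* last) {
        float temp = value;
        for (const float* a = first; a != last; ++a) {
            temp += *a;
        }
        value = temp;
    }
    // Merge the partial result of a split-off body.
    void join(SketchSum& rhs) { value += rhs.value; }
};

// Hypothetical serial driver: split the range, process both halves, join.
void sketch_reduce(const float* first, const float* last,
                   SketchSum& body, std::size_t grain) {
    assert(grain > 0);
    std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= grain) {            // small enough: apply the body directly
        body(first, last);
        return;
    }
    const float* mid = first + n / 2;
    SketchSum right(body, SplitTag{});       // fresh body for the right half
    sketch_reduce(first, mid, body, grain);  // left half keeps the original body
    sketch_reduce(mid, last, right, grain);
    body.join(right);                        // fold the right half back in
}

float SketchParallelSum(const float* array, std::size_t n) {
    SketchSum total;
    sketch_reduce(array, array + n, total, 2);  // grain size 2 is arbitrary
    return total.value;
}
```

Because every split-off body starts at zero and <code>join()</code> only adds, the final value is the same as a plain loop over the array for inputs where floating-point rounding does not depend on the order of addition.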