Changes

GPU621/Distributed Workload

910 bytes added, 01:55, 3 December 2018

→‎Algorithms

* parallel_reduce(range, body [, partitioner]);

These functions operate on the <code>blocked_range</code> container class in '''TBB''' to perform operations in parallel as described by the <code>body</code> object, typically by overloading <code>operator()</code>. The following code snippet demonstrates a simple <code>parallel_reduce</code> implementation.

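<pre>
#include <cstddef>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

using namespace tbb;

// Body object for parallel_reduce: accumulates the sum of a subrange.
struct Sum {
    float value;

    Sum() : value(0) {}

    // Splitting constructor: the new body starts with an empty accumulator.
    Sum( Sum& s, split ) {value = 0;}

    // Add the elements of subrange r to the running total. The same body
    // may be applied to several subranges, so start from the current value.
    void operator()( const blocked_range<float*>& r ) {
        float temp = value;
        for( float* a=r.begin(); a!=r.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }

    // Merge the partial result of a body that was split off from this one.
    void join( Sum& rhs ) {value += rhs.value;}
};

float ParallelSum( float array[], size_t n ) {
    Sum total;
    parallel_reduce( blocked_range<float*>( array, array+n ), total );
    return total.value;
}
</pre>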

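A few points are worth noting about this code. All of the reduction work happens in the overloaded <code>operator()</code>, which adds the elements of its subrange to the body's running <code>value</code>; it starts from the current <code>value</code> rather than zero because the same body may be applied to more than one subrange. The splitting constructor <code>Sum(Sum& s, split)</code> creates a second body with an empty accumulator when '''TBB''' splits the <code>blocked_range</code> so that the two halves can run in parallel, and <code>join()</code> then folds the second body's partial result back into the first.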
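To make the order of the split, <code>operator()</code> and <code>join()</code> calls easier to follow, here is a minimal sequential sketch that imitates the protocol without using '''TBB''' at all. It is not how the '''TBB''' scheduler is implemented: <code>SplitTag</code>, <code>Range</code>, <code>simulate_reduce</code> and <code>SimulatedSum</code> are hypothetical stand-ins invented for this illustration, the range is always halved down to a fixed grain size, and both halves run one after the other on the calling thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for tbb::split and tbb::blocked_range<float*>.
struct SplitTag {};

struct Range {
    const float* first;
    const float* last;
    const float* begin() const { return first; }
    const float* end() const { return last; }
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// Same shape as the Sum body above, written against the stand-in types.
struct Sum {
    float value;

    Sum() : value(0) {}

    // Splitting constructor: fresh accumulator for the other half.
    Sum(Sum&, SplitTag) : value(0) {}

    // Continue from whatever this body has already accumulated.
    void operator()(const Range& r) {
        float temp = value;
        for (const float* a = r.begin(); a != r.end(); ++a) {
            temp += *a;
        }
        value = temp;
    }

    // Fold the other body's partial result into this one.
    void join(Sum& rhs) { value += rhs.value; }
};

// Recursively halve the range until it holds at most `grain` elements,
// apply the body to each leaf, then join the partial results.
template <typename Body>
void simulate_reduce(const Range& r, Body& body, std::size_t grain) {
    assert(grain > 0);
    if (r.size() <= grain) {
        body(r);                                  // leaf: do the actual work
        return;
    }
    const float* mid = r.first + r.size() / 2;
    Body right(body, SplitTag{});                 // split constructor
    simulate_reduce(Range{r.first, mid}, body, grain);   // left half reuses `body`
    simulate_reduce(Range{mid, r.last}, right, grain);   // right half uses the new body
    body.join(right);                             // merge right into left
}

float SimulatedSum(const float array[], std::size_t n) {
    Sum total;
    simulate_reduce(Range{array, array + n}, total, 2);  // grain of 2 is arbitrary
    return total.value;
}
```

In this sketch the left half always reuses the original body, which is why <code>operator()</code> has to add to the existing <code>value</code> instead of overwriting it. In '''TBB''' itself the decision to split is made at run time by the partitioner, so the number of split and join calls can differ between runs.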

Mjwolfe

24

edits

CDOT Wiki β
