Difference between revisions of "GPU621/Distributed Workload"

Revision as of 01:55, 3 December 2018

Overview

TBB

Threading Building Blocks (TBB) is a C++ template library developed by Intel to facilitate parallel programming. It divides a computation into tasks that can be scheduled to run in parallel on the threads of a multi-core processor.
TBB includes algorithms, concurrent containers, locks and memory allocation tools.
TBB is designed to work with any C++ compiler.

#include <tbb/tbb.h>

// a and b are assumed to be arrays holding at least 40 elements
tbb::blocked_range<int> range(0, 40);
for (auto i = range.begin(); i != range.end(); i++) {
	 b[i] = 2 * a[i] + b[i];
}
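The loop above walks the range serially. To give a rough idea of what "dividing a computation into tasks" means, here is a minimal sketch that uses only the standard library, not TBB itself. The function name scaled_add_chunked and the num_chunks parameter are hypothetical, and one thread per chunk is a simplification: it is not how TBB's scheduler actually assigns work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB function): applies b[i] = 2 * a[i] + b[i]
// over the whole index range, giving each worker thread one contiguous chunk.
void scaled_add_chunked(const std::vector<int>& a, std::vector<int>& b,
                        std::size_t num_chunks) {
    assert(a.size() == b.size() && num_chunks > 0);
    const std::size_t n = b.size();
    const std::size_t chunk = (n + num_chunks - 1) / num_chunks;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Chunks do not overlap, so no two threads write the same element.
        workers.emplace_back([&a, &b, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                b[i] = 2 * a[i] + b[i];
            }
        });
    }
    // Wait for every chunk to finish before returning.
    for (std::thread& w : workers) {
        w.join();
    }
}
```

Calling scaled_add_chunked(a, b, 4) on two 40-element vectors would process indices 0-9, 10-19, 20-29 and 30-39 on four separate threads.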

STL

The Standard Template Library (STL) also provides useful functionality, including generic data structures, containers, iterators and algorithms that can be used to write clean, efficient code.
Alexander Stepanov first became interested in the ideas of generic programming in 1979. His later work, including time at AT&T Bell Laboratories, eventually led to a proposal to the ANSI/ISO committee to standardize STL as part of the C++ standard.

#include <iostream>
#include <vector>
int main () {
  std::vector<int> myvector;
  for (int i=0; i < 6; i++) myvector.push_back(i);

  for (std::vector<int>::iterator it = myvector.begin(); it != myvector.end(); ++it)
    std::cout << ' ' << *it;
  std::cout << '\n';
}

Comparison

Both libraries use C++ templates to provide generic programming structures. Their functionality overlaps, but STL is designed for general use while TBB specializes in parallel programming with threads.

Iterators

Both libraries use random access iterators to ease navigation of containers. TBB follows the conventions set by STL and the ISO C++ standard, and extends them so that containers such as tbb::concurrent_vector<T> can be used safely from parallel threads.
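As a small illustration of what "random access" buys you, the sketch below jumps straight to the middle of a std::vector with iterator arithmetic instead of stepping one element at a time. The function name middle_element is hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the element at index size / 2 of a non-empty vector.
int middle_element(const std::vector<int>& v) {
    assert(!v.empty());
    auto first = v.begin();
    // Random access iterators support subtraction and addition in constant time.
    auto mid = first + (v.end() - first) / 2;
    return *mid;
}
```

The same arithmetic is what lets a range of iterators or indices be cut into halves cheaply, which is the basic operation behind splitting work between threads.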

Containers

STL implements the following common containers:

  • vector
  • list
  • queue
  • stack
  • map

TBB does not implement as many containers; however, it includes some that are useful in parallel programming and extends their functionality:

  • concurrent_hash_map<T>
  • concurrent_vector<T>
  • concurrent_queue<T>
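To see why concurrent containers are useful, consider what it takes to let several threads append to a plain std::vector. The sketch below uses only the standard library; the function name collect_from_threads is hypothetical. Every push_back has to be wrapped in a lock, which is the kind of bookkeeping a container such as concurrent_vector is meant to take off your hands.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each append per_thread distinct
// values to one shared std::vector.
std::vector<int> collect_from_threads(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    std::vector<int> shared;
    std::mutex guard;  // std::vector::push_back is not safe to call concurrently
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&shared, &guard, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);  // one writer at a time
                shared.push_back(t * per_thread + i);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return shared;  // element order depends on thread scheduling
}
```

The returned vector always holds every value exactly once, but the order varies from run to run because the threads interleave.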

Algorithms

STL provides serial algorithms for tasks such as searching and sorting. These functions, for example std::merge() and std::sort(), typically operate on containers through iterators.
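A short sketch of the two STL algorithms named above, assuming nothing beyond the standard library. The function name sort_and_merge is hypothetical; it sorts two vectors independently and then merges them into a single sorted vector.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts both inputs, then merges them into one sorted vector.
std::vector<int> sort_and_merge(std::vector<int> left, std::vector<int> right) {
    std::sort(left.begin(), left.end());
    std::sort(right.begin(), right.end());
    std::vector<int> merged;
    merged.reserve(left.size() + right.size());
    // std::merge requires both input ranges to be sorted already.
    std::merge(left.begin(), left.end(), right.begin(), right.end(),
               std::back_inserter(merged));
    assert(std::is_sorted(merged.begin(), merged.end()));
    return merged;
}
```

Both calls run on a single thread; the two std::sort calls are independent of each other, which is exactly the kind of work a parallel library can hand to separate threads.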
The algorithms in TBB are much more central to the usefulness of the library. TBB provides templated functions such as:

  • parallel_for(range, body [, partitioner]);
  • parallel_scan(range, body [, partitioner]);
  • parallel_reduce(range, body [, partitioner]);

These functions take a range, typically a blocked_range, and perform the operations described by the body object in parallel; the body usually defines them by overloading operator(). The following code snippet demonstrates a simple parallel_reduce implementation.

#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

using namespace tbb;

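struct Sum {
    float value;
    Sum() : value(0) {}
    // Splitting constructor: a new body starts with a fresh partial sum
    Sum( Sum& s, split ) {value = 0;}
    // Accumulates the elements of subrange r into this body's partial sum
    void operator()( const blocked_range<float*>& r ) {
        float temp = value;
        for( float* a=r.begin(); a!=r.end(); ++a ) {
            temp += *a;
        }
        value = temp;
    }
    // Folds the partial sum of rhs into this body
    void join( Sum& rhs ) {value += rhs.value;}
};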

float ParallelSum( float array[], size_t n ) {
    Sum total;
    parallel_reduce( blocked_range<float*>( array, array+n ), total );
    return total.value;
}

A few things are worth noticing about this code. All of the reduction work is done in the overloaded operator(), which accumulates one subrange into value. When parallel_reduce splits the blocked_range, it uses the splitting constructor Sum(Sum& s, split) to create a separate body for the new subrange; the subranges are processed in parallel, and join() then combines their partial sums.
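To make the split, run and join sequence concrete, here is a hand-written sketch that performs exactly one split using only the standard library. It is not how TBB implements parallel_reduce, which decides for itself how often to split and where each piece runs. SplitTag is a stand-in for tbb::split, and TwoWaySum is a hypothetical name; the body takes a pointer pair instead of a blocked_range so the example does not depend on TBB.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Stand-in tag type playing the role of tbb::split in this sketch.
struct SplitTag {};

struct Sum {
    float value;
    Sum() : value(0) {}
    Sum(Sum&, SplitTag) : value(0) {}  // splitting constructor: fresh partial sum
    // Accumulates [begin, end) into this body's partial sum.
    void operator()(const float* begin, const float* end) {
        float temp = value;
        for (const float* a = begin; a != end; ++a) {
            temp += *a;
        }
        value = temp;
    }
    void join(const Sum& rhs) { value += rhs.value; }
};

// Hypothetical two-way reduction: one split, two halves, one join.
float TwoWaySum(const float* array, std::size_t n) {
    assert(array != nullptr || n == 0);
    Sum left;
    Sum right(left, SplitTag{});             // 1. split: new body for the right half
    const float* mid = array + n / 2;
    // 2. run: the right half on a second thread, the left half on this one
    std::thread worker([&right, mid, array, n] { right(mid, array + n); });
    left(array, mid);
    worker.join();
    left.join(right);                        // 3. join: combine the partial sums
    return left.value;
}
```

Each body only ever touches its own value while the halves run, so no locking is needed; the only point where the two bodies meet is the final join().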