Changes

DPS921/Game of Threads

3,574 bytes added, 15:06, 13 November 2017


==OpenMP and C++ Comparisons==

{| class="wikitable"

|

|OpenMP

|Hand Threaded C++

|-

|Code Portability

|

*Portable

|

*Not every system supports the thread library natively or completely (e.g., Android)

|-

|Codability

|

*Fewer modifications to the serial code

*Compiler directives can specify work distribution, for example using sections

*Easier to code through #pragma constructs, at the cost of less control over the parallelism of each thread

|

*Greater modifications to the serial code

*Work distribution must be directly coded

*Loop scheduling must be tuned by hand; there is no schedule clause to adjust

*Finer granularity of parallelism allows finer control of parallel regions. For example, barriers or critical sections may involve only some of the threads, whereas in OpenMP they apply to every thread in the team

|-

|Data sharing

|

*Facilitates making data structures thread-safe

*Clauses such as firstprivate and lastprivate control how data is shared or made private, and the critical directive serializes access to shared data

|

*A data structure holding the private information of each thread must be created by hand

*A separate copy of the shared data is created for each thread

*Synchronizing data is much harder and must be coded by hand

|-

|Speed

|

*Typically faster, because implementations generally reuse a pool of threads instead of creating and destroying a native C++ thread for each piece of work

|

*Slower than OpenMP unless a thread pool is created and used

*Many of the speed improvements OpenMP provides must be reimplemented by hand

|}
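
The data-sharing row can be illustrated with a small sketch. The function name <code>fill_with_offset</code> and its parameters are hypothetical, chosen only for this example. It assumes a non-empty vector and a compiler with OpenMP enabled (for example <code>-fopenmp</code> on GCC); without that flag the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical helper (not from the page above): writes out[i] = base + i
// and returns the value produced by the sequentially last iteration.
// Assumes out is not empty.
int fill_with_offset(std::vector<int>& out, int base) {
    int last = -1;  // overwritten by the final iteration via lastprivate
    const int n = static_cast<int>(out.size());

    // firstprivate(base): every thread starts with its own copy of base.
    // lastprivate(last):  after the loop, last holds the value assigned
    //                     in iteration i == n - 1, whichever thread ran it.
    #pragma omp parallel for firstprivate(base) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = base + i;
        out[i] = last;
    }
    return last;
}
```

Calling <code>fill_with_offset(v, 10)</code> on a five-element vector fills it with 10 through 14 and returns 14. A hand-threaded version would have to pass each thread its own copy of <code>base</code> and decide explicitly which thread publishes the final value.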

==Code examples==

===Example of loop decomposition===

===Classic threads===

<nowiki>size = last_index / nb_threads;
first = MY_TID * size;
// the last thread also takes the remainder of the division
last = (MY_TID != nb_threads - 1) ? first + size : last_index;
for (i = first; i < last; i++)
    array[i] = … ;</nowiki>
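
A runnable sketch of the same decomposition with <code>std::thread</code> is given here. The function name <code>fill_squares</code> and the choice to store the square of each index are assumptions made for illustration; the fragment above leaves the loop body unspecified. The sketch assumes <code>nb_threads</code> is at least 1.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: stores i * i in array[i], splitting the index range
// into nb_threads contiguous chunks. Assumes nb_threads >= 1.
void fill_squares(std::vector<long>& array, std::size_t nb_threads) {
    const std::size_t last_index = array.size();
    const std::size_t size = last_index / nb_threads;  // chunk length
    std::vector<std::thread> workers;

    for (std::size_t tid = 0; tid < nb_threads; ++tid) {
        const std::size_t first = tid * size;
        // The last thread also takes the remainder of the division.
        const std::size_t last =
            (tid != nb_threads - 1) ? first + size : last_index;
        // Chunks are disjoint, so no locking is needed for the writes.
        workers.emplace_back([&array, first, last] {
            for (std::size_t i = first; i < last; ++i)
                array[i] = static_cast<long>(i) * static_cast<long>(i);
        });
    }
    for (auto& w : workers)
        w.join();
}
```

With ten elements and three threads, the chunks are [0, 3), [3, 6) and [6, 10), so the last thread handles one extra element.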

===OpenMP===

<nowiki>#pragma omp parallel for
for (i = 0; i < last_index; i++)
    array[i] = …;</nowiki>
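
As a runnable sketch of the OpenMP form, the function below fills an array with the square of each index. The name <code>fill_squares_omp</code>, the loop body and the chunk size of 4 are assumptions for illustration only. The <code>schedule</code> clause is the tuning knob mentioned in the comparison table: changing it redistributes iterations without touching the loop itself.

```cpp
#include <vector>

// Hypothetical helper: stores i * i in array[i].
// Compile with OpenMP enabled (e.g. -fopenmp on GCC); without it the
// pragma is ignored and the loop runs serially with the same result.
void fill_squares_omp(std::vector<long>& array) {
    const long last_index = static_cast<long>(array.size());

    // schedule(dynamic, 4): threads grab chunks of 4 iterations on demand.
    // Swapping in schedule(static) gives each thread a fixed share instead.
    #pragma omp parallel for schedule(dynamic, 4)
    for (long i = 0; i < last_index; i++)
        array[i] = i * i;
}
```

Each iteration writes a distinct element, so the loop needs no synchronization under either schedule.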

===Sample program in OpenMP and C++ threads===

===OpenMP===

<nowiki>#include <chrono>
#include <cstdio>
#include <iostream>
#include <omp.h>
using namespace std::chrono;

static const int num_threads = omp_get_max_threads();
size_t sum = 0;

int main() {
    omp_set_num_threads(num_threads);
    size_t n = 50000;
    const int last = static_cast<int>(n); // signed bound for the loop index
    steady_clock::time_point ts, te;

    ts = steady_clock::now();
    // each thread accumulates a private copy of sum; the copies are added
    // together when the loop ends
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i <= last; i++)
        sum += i;
    te = steady_clock::now();

    std::cout << "sum: " << sum << std::endl;
    auto ms = duration_cast<milliseconds>(te - ts);
    std::cout << std::endl << "Took - " <<
        ms.count() << " milliseconds" << std::endl;
    std::getchar();
    return 0;
}</nowiki>
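
The <code>reduction(+:sum)</code> clause can be checked in isolation. The sketch below wraps the same loop in a hypothetical function, <code>sum_to</code>, so that its result can be compared with the closed form n(n+1)/2. The name and signature are assumptions, and no OpenMP runtime calls are used, so it also compiles (serially) without OpenMP enabled.

```cpp
#include <cstddef>

// Hypothetical helper: returns 0 + 1 + ... + n.
std::size_t sum_to(std::size_t n) {
    std::size_t sum = 0;
    const long long last = static_cast<long long>(n);

    // Each thread works on a private copy of sum initialised to 0;
    // the private copies are added into the shared sum after the loop.
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i <= last; i++)
        sum += static_cast<std::size_t>(i);

    return sum;
}
```

For n = 50000 the closed form gives 50000 × 50001 / 2 = 1250025000, which is what <code>sum_to(50000)</code> returns regardless of the number of threads.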

===C++ Threads===

<nowiki>#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>
#include <cstdio>
using namespace std::chrono;

static const int num_threads = 10;
size_t sum = 0;
std::mutex m; // guards sum and std::cout

int main() {
    std::thread t[num_threads];
    size_t n = 10000000;
    steady_clock::time_point ts, te;

    // Launch a group of threads, each summing its own range [start, end)
    ts = steady_clock::now();
    for (int i = 0; i < num_threads; ++i) {
        size_t start = i * n / num_threads;
        // the last thread also includes n itself, so the total covers 0..n
        // and no boundary value is counted twice
        size_t end = (i + 1 == num_threads) ? n + 1 : (i + 1) * n / num_threads;
        t[i] = std::thread([start, end]() {
            size_t local = 0; // private partial sum, no lock needed in the loop
            for (size_t j = start; j < end; j++)
                local += j;
            // hold the lock only for the log line and the shared update
            std::lock_guard<std::mutex> lock(m);
            std::cout << "start: " << start << " end: " << end - 1 << std::endl;
            sum += local;
        });
    }

    // Join the threads with the main thread, then stop the clock so the
    // measurement includes the work done by the threads
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }
    te = steady_clock::now();

    std::cout << sum << std::endl;
    auto ms = duration_cast<milliseconds>(te - ts);
    std::cout << std::endl << "Took - " <<
        ms.count() << " milliseconds" << std::endl;
    std::getchar();
    return 0;
}</nowiki>

Jlonghi
