DPS921/Game of Threads

3,574 bytes added, 15:06, 13 November 2017
==OpenMP and C++ Comparisons==
{| class="wikitable"
|
|OpenMP
|Hand Threaded C++
|-
|Code Portability
|
*Portable
|
*Not every platform supports the C++ thread library natively or completely (for example, Android)
|-
|Codability
|
*Fewer modifications to the serial code
*Compiler directives can specify work distribution, for example with sections
*Easier to code using #pragma constructs, in exchange for less control over the parallelism of each thread
|
 
*More modifications to the serial code
*Work distribution must be coded directly
*Loops must be tuned by hand; there is no schedule parameter to change
*Finer-grained control over parallel regions. For example, a barrier or critical section can involve only some of the threads, whereas in OpenMP it applies to every thread in the team.
|-
|Data sharing
|
*Facilitates making data structures thread-safe
*OpenMP clauses such as firstprivate and lastprivate, together with the critical directive, can be used to share data or to protect it from concurrent updates
|
*Data structures holding the private information of each thread must be created by hand
*A separate copy of shared data is created for each thread
*Synchronizing data is much harder and must be done by hand
|-
|Speed
|
*Typically faster, because threads are not created and destroyed for every parallel region the way hand-written C++ threads are; OpenMP runtimes generally reuse a pool of threads
|
*Slower than OpenMP unless a thread pool is created and used
*Many of the speed improvements that OpenMP provides must be re-implemented by hand
|}
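The data-sharing row of the table can be illustrated with a minimal sketch. The function <code>scale_and_sum</code> and the struct <code>ScaleResult</code> are hypothetical names invented for this illustration; they are not part of the original comparison. The sketch assumes a compiler with OpenMP enabled (without it the pragmas are ignored and the loop simply runs serially).

```cpp
#include <vector>

// Hypothetical result type for this illustration only.
struct ScaleResult {
    long total;      // sum of all scaled values
    int last_value;  // value computed by the final loop iteration
};

// Hypothetical helper: multiplies every element by `factor`, sums the
// results, and reports the value produced by the last iteration.
ScaleResult scale_and_sum(const std::vector<int>& data, int factor) {
    if (data.empty()) return {0, 0};  // avoid relying on lastprivate for an empty loop

    long total = 0;      // shared by default: every thread sees the same variable
    int last_value = 0;
    const int n = static_cast<int>(data.size());

    // firstprivate: each thread gets its own copy of `factor`, initialized
    //               from the value it had before the loop.
    // lastprivate:  after the loop, `last_value` holds the value assigned in
    //               the sequentially last iteration (i == n - 1).
    #pragma omp parallel for firstprivate(factor) lastprivate(last_value)
    for (int i = 0; i < n; i++) {
        last_value = data[i] * factor;
        // critical: only one thread at a time may update the shared total.
        #pragma omp critical
        total += last_value;
    }
    return {total, last_value};
}
```

For a sum like this a <code>reduction(+:total)</code> clause would normally be preferable to a critical section; the critical directive is used here only because the table names it.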
==Code examples==
===Example of loop decomposition===
===Classic threads===
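<nowiki>// Each thread works on its own contiguous block of the index range
size = last_index / nb_threads;
first = MY_TID * size;
// The last thread also takes any leftover iterations
last = (MY_TID != nb_threads - 1) ? first + size : last_index;
for (i = first; i < last; i++)
    array[i] = … ;</nowiki>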
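As a rough, runnable counterpart to the classic-threads pseudocode, the sketch below uses <code>std::thread</code>. The function name <code>fill_squares</code> and the loop body (storing <code>i * i</code>) are assumptions made for illustration, since the original leaves the loop body elided. It also assumes <code>nb_threads</code> is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: fills array[i] with i * i using nb_threads threads.
// The loop body is an assumption; the original pseudocode elides it.
void fill_squares(std::vector<long>& array, unsigned nb_threads) {
    assert(nb_threads > 0);
    const std::size_t last_index = array.size();
    const std::size_t size = last_index / nb_threads;  // block size per thread

    std::vector<std::thread> workers;
    for (unsigned tid = 0; tid < nb_threads; ++tid) {
        const std::size_t first = tid * size;
        // The last thread also takes the leftover iterations.
        const std::size_t last =
            (tid != nb_threads - 1) ? first + size : last_index;
        // Each thread writes only to its own block, so no locking is needed.
        workers.emplace_back([&array, first, last] {
            for (std::size_t i = first; i < last; ++i)
                array[i] = static_cast<long>(i) * static_cast<long>(i);
        });
    }
    for (auto& w : workers)
        w.join();  // wait for every block to finish
}
```

Calling <code>fill_squares(v, 3)</code> on a vector of ten elements gives the first two threads three elements each and the last thread the remaining four.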
===OpenMP===
<nowiki>#pragma omp parallel for
for (i = 0; i < last_index; i++)
    array[i] = …;</nowiki>
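The OpenMP fragment can be fleshed out in the same way. In this sketch the function name <code>fill_squares_omp</code> and the loop body are assumptions for illustration, and the <code>schedule(static)</code> clause is added only to show where the schedule parameter mentioned in the comparison table would go.

```cpp
#include <vector>

// Hypothetical example: fills array[i] with i * i.
// The loop body is an assumption; the original fragment elides it.
void fill_squares_omp(std::vector<long>& array) {
    const int last_index = static_cast<int>(array.size());
    // The runtime splits the iterations among the threads of the team.
    // schedule(static) asks for roughly equal contiguous blocks; changing it
    // to dynamic or guided retunes the loop without touching the loop body.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < last_index; i++)
        array[i] = static_cast<long>(i) * i;
}
```

Compared with a hand-threaded version, the block boundaries and the joins are implicit: the directive ends with a barrier, so every element is written by the time the function returns.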
===Sample program in OpenMP and C++ threads===
===OpenMP===
<nowiki>#include <chrono>
#include <cstdio>
#include <iostream>
#include <omp.h>
using namespace std::chrono;
static const int num_threads = omp_get_max_threads();
size_t sum = 0;

int main() {
    omp_set_num_threads(num_threads);

    const int n = 50000;
    steady_clock::time_point ts, te;

    ts = steady_clock::now();
    // reduction gives each thread a private copy of sum and
    // adds the copies together when the loop ends
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i <= n; i++)
        sum += i;
    te = steady_clock::now();

    std::cout << "sum: " << sum << std::endl;
    auto ms = duration_cast<milliseconds>(te - ts);
    std::cout << std::endl << "Took - " <<
        ms.count() << " milliseconds" << std::endl;
    std::getchar();  // keep the console window open
    return 0;
}</nowiki>
===C++ Threads===
<nowiki>#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>
#include <cstdio>

using namespace std::chrono;

static const int num_threads = 10;
size_t sum = 0;
std::mutex m;

int main() {
    std::thread t[num_threads];
    size_t n = 10000000;
    steady_clock::time_point ts, te;

    // Launch a group of threads; thread i sums the half-open range [start, end)
    ts = steady_clock::now();
    for (int i = 0; i < num_threads; ++i) {
        size_t start = i * n / num_threads;
        // The last thread also includes n itself, so the total covers 0..n
        size_t end = (i + 1 == num_threads) ? n + 1 : (i + 1) * n / num_threads;
        t[i] = std::thread([start, end]() {
            size_t local = 0;  // accumulate privately, outside the lock
            for (size_t j = start; j < end; j++)
                local += j;
            std::lock_guard<std::mutex> lock(m);  // protects std::cout and sum
            std::cout << "start: " << start << " end: " << end << std::endl;
            sum += local;
        });
    }

    // Join the threads with the main thread
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }
    te = steady_clock::now();  // stop the clock only after all threads finish

    std::cout << sum << std::endl;
    auto ms = duration_cast<milliseconds>(te - ts);
    std::cout << std::endl << "Took - " <<
        ms.count() << " milliseconds" << std::endl;

    std::getchar();  // keep the console window open
    return 0;
}</nowiki>