{{GPU621/DPS921 Index | 20187}}
<!-- How Threads Work -->
<h4>Implicit Barrier</h4>
<pre class="code">// OpenMP - Parallel Construct
// omp_parallel.cpp
<p>Output:</p>
<pre class="code">Hello
Hello
Hello
Hello
Hello
Hello
Fin
</pre>
<!-- C++11 Threads -->
<p>Unlike OpenMP, C++11 does <i>not</i> use parallel regions as barriers for its threading. When a thread is run using the C++11 thread library, we must consider the scope of the parent thread: if the std::thread object is destroyed while the child thread is still joinable (for example, because the parent would exit before the child can return), std::terminate is called and the program crashes.</p>
<p>When using the join function on the child thread, the parent thread will be blocked until the child thread returns.</p>
<pre class="code"> t2
____________________
/ \
__________/\___________________|/\__________
t1 t1 t2.join() | t1
</pre>
<h4>Creating a Thread</h4>
<p>The following is the template used for the overloaded thread constructor. The thread begins to run on initialization.<br>f is the function, functor, or lambda expression to be executed in the thread. args are the arguments to pass to f.</p>
<pre class="code">template<class Function, class... Args>
explicit thread(Function&& f, Args&&... args);
</pre>
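<p>For illustration, here is a minimal sketch of the constructor and join() in use; the function name greet and its argument are made up for this example. The child thread starts running as soon as it is constructed, and the parent blocks at join() until the child returns.</p>
<pre class="code">#include <iostream>
#include <thread>

void greet(int id) {
    std::cout << "Hello from child thread " << id << '\n';
}

int main() {
    std::thread t2(greet, 1);   // child thread starts running on construction
    std::cout << "Parent thread continues...\n";
    t2.join();                  // parent blocks here until the child returns
    std::cout << "Child thread has returned\n";
    return 0;
}
</pre>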
<!-- How Multithreading Works -->
<pre class= Threading in C++11 ="code">#include <iostream>#include <omp.h>
int main() {
#pragma omp parallel
{
int tid = omp_get_thread_num();
std::cout << "Hi from thread "<< tid << '\n';
}
return 0;
}
</pre>
<p>Essentially, what is happening in the code above is that the threads are intermingling, creating a jumbled output. All of the threads are trying to write to the std::cout stream at the same time, so while one thread is inserting into the stream another may interleave its own output.</p>
<h3>Threading with C++11</h3>
<p>Unlike OpenMP, where the compiler generates the threads from pragmas, C++11 threads are created explicitly by the programmer.</p>
<p>std::this_thread::get_id() is similar to OpenMP's omp_get_thread_num(), but instead of an int it returns a std::thread::id object identifying the calling thread.</p>
std::cout << "Creating threads...\n";
for (int i === Creating a Thread ===0; i < numThreads; i++) threads.push_back(std::thread(func1, i));
std::cout << "All threads have launched!\n";
std::cout << "Syncronizing...\n";
return 0;
}
</pre>
<p>The thread constructor can take a function, functor, or lambda expression as its first argument, followed by zero or more arguments that will be passed into the callable (see the sketch below). Since all of the threads are using the std::cout stream, the output can appear jumbled and out of order; a solution to this problem will be presented in the next section.</p>
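<p>As a brief illustration of the constructor accepting a lambda expression and forwarding arguments to it (the values used here are arbitrary):</p>
<pre class="code">#include <iostream>
#include <thread>

int main() {
    // The lambda is the callable; 2 and 3 are forwarded to its parameters a and b
    std::thread t([](int a, int b) {
        std::cout << "Sum: " << (a + b) << '\n';
    }, 2, 3);
    t.join();
    return 0;
}
</pre>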
<h2>How Synchronization Works</h2>
<h3>Synchronization With OpenMP</h3>
<p>OpenMP provides the critical construct, which allows only one thread at a time to execute the enclosed statement, so the output is no longer interleaved:</p>
<pre class="code">#include <iostream>
#include <omp.h>
int main()
{
#pragma omp parallel
{
int tid = omp_get_thread_num();
#pragma omp critical
std::cout << "Hi from thread "<< tid << '\n';
}
return 0;
}
</pre>
<pre class="code">Hi from thread 0
Hi from Thread 1
Hi from thread 2
Hi from thread 3
</pre>
<h3>Synchronization with C++11</h3>
<pre class="code">#include <iostream>
#include <vector>
#include <thread>
#include <mutex>

std::mutex mu;

void func1(int index) {
    std::lock_guard<std::mutex> lock(mu);
    // mu.lock();
    std::cout << "Index: " << index << " - ID: " << std::this_thread::get_id() << std::endl;
    // mu.unlock();
}

int main() {
    int numThreads = 10;
    std::vector<std::thread> threads;

    for (int i = 0; i < numThreads; i++)
        threads.push_back(std::thread(func1, i));

    for (auto& thread : threads)
        thread.join();

    return 0;
}
</pre>
<p>Using a mutex, we're able to place a lock around the section of code that accesses the shared data, giving us mutual exclusion. This is similar to OpenMP's critical in that it only allows one thread to execute a block of code at a time.</p>
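<p>The commented-out mu.lock() and mu.unlock() calls in the example above show the manual alternative. std::lock_guard is generally preferred because it releases the mutex automatically when it goes out of scope, even if the function returns early or throws. A minimal sketch (the function names here are illustrative):</p>
<pre class="code">#include <iostream>
#include <mutex>
#include <stdexcept>

std::mutex mu;

void raiiLock() {
    std::lock_guard<std::mutex> lock(mu);   // mutex acquired here
    throw std::runtime_error("early exit"); // mutex is still released when 'lock' is destroyed
}

void manualLock() {
    mu.lock();
    // if an exception were thrown here, the unlock below would never run
    mu.unlock();
}

int main() {
    try { raiiLock(); } catch (const std::exception&) {}
    manualLock();                           // succeeds because raiiLock released the mutex
    std::cout << "mutex was released correctly\n";
    return 0;
}
</pre>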
<!-- How Data Sharing Works -->
<!-- Data Sharing With OpenMP --> <h3>Data Sharing With OpenMP</h3> <p></p><p>In OpenMP by default all data is shared and passed by reference. Therefore, we must be careful how the data is handled within the parallel region if accessed by multiple threads at once.</p> <p>For Example:</p><pre class="code">#include <iostream>#include <omp.h> int main() { int i = 12; #pragma omp parallel { #pragma omp critical std::cout << "\ni = " << ++i; } std::cout << "\ni = " << i << std::endl; return 0;}</pre> <p>Output:</p><pre class="code">i = 13i = 14i = 15i = 16i = 16</pre> <p>What we can see using the output from the code above is that even after the parallel region is closed we can see that our variable i holds a different value than it did originally. This is due to the fact that the variable is shared inside and outside the parallel region. In order to pass this variable by value to each thread we must make this variable non-shared. This is done by using firstprivate() This is considered a clause, which comes after a construct. firstprivate(i) will take i and make it private to each thread.</p> <p>For example:</p><pre class="code">#include <iostream>#include <omp.h> int main() { int i = 12; #pragma omp parallel firstprivate(i) { #pragma omp critical std::cout << "\ni = " << ++i; } std::cout << "\ni = " << i << std::endl;}</pre> <p>New Output:</p><pre class="code">i = 13i = 13i = 13i = 13i = 12</pre> <p>What we can see here is that through each indiviual thread the value of i stays at 12 then gets incremented by the thread to 13. On the last line of the output we can see that i = 12 showing that the parallel region did not change the value of i outside the parallel region.</p> <!-- Data Sharing with C++11 --> <h3>Data Sharing with C++11</h3><p>The C++11 thread library requires the programmer to pass in the address of the data that should be shared by the threads.</p> <pre class="code">// cpp11.datasharing.cpp #include <iostream>#include <vector>#include <thread>#include <mutex> std::mutex mu; void func1(int value) { std::lock_guard<std::mutex> lock(mu); std::cout << "func1 start - value = " << value << std::endl; value = 0; std::cout << "func1 end - value = " << value << std::endl;} void func2(int& value) { std::lock_guard<std::mutex> lock(mu); std::cout << "func2 start - value = " << value << std::endl; value *= 2; std::cout << "func2 end - value = " << value << std::endl;} int main() { int numThreads = 5; int value = 1; std::vector<std::thread> threads; for (int i = 0; i < numThreads; i++) { if (i == 2) threads.push_back(std::thread(func1, value)); else threads.push_back(std::thread(func2, std::ref(value))); } for (auto& thread : threads) thread.join(); return 0;}</pre> <pre class="code">func2 start - value = 1func2 end - value = 2func2 start - value = 2func2 end - value = 4func1 start - value = 1func1 end - value = 0func2 start - value = 4func2 end - value = 8func2 start - value = 8func2 end - value = 16</pre> <!-- How Syncronization Works Continued --> <h2>How Syncronization Works Continued</h2> <!-- Syncronization Continued With OpenMP --> <h3>Syncronization Continued With OpenMP</h3> <h4>atomic</h4> <p>The atomic construct is a way of OpenMP's implementation to serialize a specific operation. The advantage of using the atomic construct in this example below is that it allows the increment operation with less overhead than critical. 
Atomic ensures that only the operation is being performed one thread at a time.</p> <pre class="code">int main() { int i = 0; #pragma omp parallel num_threads(10) { #pragma omp atomic i++; } std::cout << i << std::endl; return 0;}</pre> <pre class="code">10</pre> <!-- Syncronization Continued with C++11 --> <h3>Syncronization Continued with C++11</h3> <h4>atomic</h4><p>Another way to ensure syncronization of data between threads is to use the atomic library.</p> <pre class="code">// cpp11.atomic.cpp #include <iostream>#include <vector>#include <thread>#include <atomic> std::atomic<int> value(1); void add() { ++value;} void sub() { --value;} int main() { int numThreads = 5; std::vector<std::thread> threads; for (int i = 0; i < numThreads; i++) { if (i == 2) threads.push_back(std::thread(sub)); else threads.push_back(std::thread(add)); } for (auto& thread : threads) thread.join(); std::cout << value << std::endl; return 0;}</pre> <p>The atomic value can only be accessed by one thread at a time. This is a similar lock procedure as mutex except the lock is defined by the atomic wrapper instead of the programmer.</p> <pre class="code">4</pre> <!-- Thread Creation Test --> <h2>Thread Creation Test</h2> <pre class="code">#include <iostream>#include <string>#include <chrono>#include <vector>#include <thread>#include <omp.h> using namespace std::chrono; void reportTime(const char* msg, int size, steady_clock::duration span) { auto ms = duration_cast<milliseconds>(span); std::cout << msg << "- size : " << std::to_string(size) << " - took - " << ms.count() << " milliseconds" << std::endl;} void empty() {} void cpp(int size) { steady_clock::time_point ts, te; ts = steady_clock::now(); for (int i = 0; i < size; i++) { std::vector<std::thread> threads; for (int j = 0; j < 10; j++) threads.push_back(std::thread(empty)); for (auto& thread : threads) thread.join(); } te = steady_clock::now(); reportTime("C++11 Threads", size, te - ts);} void omp(int size) { steady_clock::time_point ts, te; ts = steady_clock::now(); for (int i = 0; i < size; i++) { #pragma omp parallel for num_threads(10) for (int i = 0; i < 10; i++) empty(); } te = steady_clock::now(); reportTime("OpenMP", size, te - ts);} int main() { // Test C++11 Threads cpp(1); cpp(10); cpp(100); cpp(1000); cpp(10000); cpp(100000); std::cout << std::endl; // Test OpenMP omp(1); omp(10); omp(100); omp(1000); omp(10000); omp(100000); return 0;}</pre> <pre class="code">C++11 Threads- size : 1 - took - 1 millisecondsC++11 Threads- size : 10 - took - 10 millisecondsC++11 Threads- size : 100 - took - 125 millisecondsC++11 Threads- size : 1000 - took - 1703 millisecondsC++11 Threads- size : 10000 - took - 20760 millisecondsC++11 Threads- size : 100000 - took - 168628 milliseconds OpenMP- size : 1 - took - 0 millisecondsOpenMP- size : 10 - took - 0 millisecondsOpenMP- size : 100 - took - 0 millisecondsOpenMP- size : 1000 - took - 6 millisecondsOpenMP- size : 10000 - took - 62 millisecondsOpenMP- size : 100000 - took - 616 milliseconds</pre> [[File:Cpp11threadgraph.png | 700px]][[File:CppmutexoutputOpenmpthreadgraph.png | 300px700px]]