GPU621/CamelCaseTeam

4,319 bytes added, 17:00, 3 August 2021
Data Sharing section added
</syntaxhighlight>
Compared with the serial and C++11 thread library equivalents of the code above, OpenMP shows a greater improvement than the other techniques. In addition, as the number of threads increases, its performance improves significantly more than that of the C++11 thread library version.

== Data Sharing ==

=== C++ Threading ===

==== mutex and std::lock_guard ====
There are several ways to make sure that shared data is used by only one thread at a time, and one of them was already presented in an example above. The mutex library gives the programmer a tool to lock a region of code so that no other thread can access it at the same moment. The .lock() and .unlock() methods of std::mutex can be called directly, as in [[#Parallel_Version|this example]]; however, the std::lock_guard wrapper is a safer, exception-safe alternative. Here is an example of a simple reduction program using C++ threads:
<syntaxhighlight lang="cpp">
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <functional>  // std::ref
#include "timer.h"     // course-provided Timer class

std::mutex guard;  // protects the shared accumulator

// Each thread sums its own slice of 1/(i+1) into a local buffer,
// then adds the partial sum to the shared accumulator under the lock.
void threadFunc(int ithread, double& accum) {
    double buffer = 0.0;  // start each partial sum at zero
    for (int i = ithread * 10000000; i < ithread * 10000000 + 10000000; i++) {
        buffer += 1.0 / (i + 1);
    }

    std::lock_guard<std::mutex> lock(guard);  // guard is locked here
    accum = accum + buffer;
}  // guard is unlocked when 'lock' goes out of scope

int main(int argc, char* argv[]) {
    Timer t;
    std::vector<std::thread> threads;
    double accum = 0;

    t.reset();
    t.start();
    for (int i = 0; i < 8; i++) {
        threads.push_back(std::thread(threadFunc, i, std::ref(accum)));
    }
    for (auto& thread : threads)
        thread.join();
    t.stop();

    std::cout << "std::lock_guard version - " << accum << " - " << t.currtime() << std::endl;
}
</syntaxhighlight>
The std::lock_guard wrapper locks the mutex when it is constructed and automatically unlocks it when it goes out of scope at the end of the function, even if an exception is thrown, so access to the shared data is always returned to the other threads.

==== std::atomic ====
Another important part of the C++ threading library is atomic [https://en.cppreference.com/w/cpp/atomic/atomic].
std::atomic is a wrapper type whose operations are free from data races, which allows different threads to access the same object simultaneously without causing undefined behaviour. Here is the reduction code above rewritten with the std::atomic wrapper:
<syntaxhighlight lang="cpp">
#include <iostream>
#include <thread>
#include <vector>
#include <atomic>
#include <functional>  // std::ref
#include "timer.h"     // course-provided Timer class

void threadFuncAtomic(int ithread, std::atomic<double>& accum) {
    double buffer = 0.0;  // start each partial sum at zero
    for (int i = ithread * 10000000; i < ithread * 10000000 + 10000000; i++) {
        buffer += 1.0 / (i + 1);
    }

    // "accum = accum + buffer" would be an atomic load followed by a separate
    // atomic store, so another thread's update could be lost in between.
    // A compare-exchange loop makes the read-modify-write one atomic step.
    double expected = accum.load();
    while (!accum.compare_exchange_weak(expected, expected + buffer)) {
        // on failure, expected now holds the current value; retry
    }
}

int main(int argc, char* argv[]) {
    Timer t;
    t.reset();
    std::vector<std::thread> threadsAtomic;
    std::atomic<double> accumAtomic{0.0};

    t.start();
    for (int i = 0; i < 8; i++) {
        threadsAtomic.push_back(std::thread(threadFuncAtomic, i, std::ref(accumAtomic)));
    }
    for (auto& thread : threadsAtomic)
        thread.join();
    t.stop();

    std::cout << "std::atomic version - " << accumAtomic.load() << " - " << t.currtime() << std::endl;
}
</syntaxhighlight>
On average, the two solutions showed no significant difference in performance. This is expected here, because each thread touches the shared accumulator only once, after 10,000,000 iterations of purely local work. More generally, locks depend on the operating system's implementation, while atomics depend on the processor's support for atomic operations, so the relative performance of the two approaches depends mostly on the hardware.

=== OpenMP ===
OpenMP provides an easy-to-use way to share data among threads. Both #pragma omp critical and #pragma omp atomic allow only one thread at a time to perform the protected update. However, atomic applies only to a single update statement of a simple form (such as x += expr), and in return it has much lower overhead, because it can use the hardware's atomic instructions where they are available.
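To make the difference between the two constructs concrete, here is a minimal sketch of the same kind of reduction written once with critical and once with atomic. The function names harmonicCritical and harmonicAtomic are hypothetical, and unlike the example on this page the loop is divided among threads with #pragma omp for rather than by manual indexing with omp_get_thread_num(). If OpenMP is not enabled at compile time (e.g. no -fopenmp flag), the pragmas are ignored and both functions simply run serially.

```cpp
// Sketch: the same partial-sum reduction protected two ways.
// Function names are hypothetical; compile with -fopenmp to run in parallel.

// Sums 1/(i+1) for i in [0, n), combining per-thread partial sums
// inside a critical section (any block of code is allowed there).
double harmonicCritical(long n) {
    double accum = 0.0;
#pragma omp parallel
    {
        double buffer = 0.0;  // declared inside the region, so private to each thread
#pragma omp for
        for (long i = 0; i < n; i++) {
            buffer += 1.0 / (i + 1);
        }
#pragma omp critical
        {
            accum += buffer;  // only one thread at a time executes this block
        }
    }
    return accum;
}

// Same computation, but the shared update uses atomic, which accepts only
// a single simple update statement and can map to a hardware atomic instruction.
double harmonicAtomic(long n) {
    double accum = 0.0;
#pragma omp parallel
    {
        double buffer = 0.0;  // private partial sum
#pragma omp for
        for (long i = 0; i < n; i++) {
            buffer += 1.0 / (i + 1);
        }
#pragma omp atomic
        accum += buffer;
    }
    return accum;
}
```

Because each thread performs only one protected update, the two versions should give the same result up to floating-point rounding from the different summation order; timing them with a large n is one way to observe the overhead of each construct on a given machine.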
Here is an OpenMP atomic alternative to the reduction code example:
<syntaxhighlight lang="cpp">
#include <iostream>
#include <omp.h>
#include "timer.h"  // course-provided Timer class

int main(int argc, char* argv[]) {
    Timer t;
    double accum = 0;

    t.reset();
    omp_set_num_threads(8);  // requests 8 threads; the manual partitioning below assumes exactly 8
    t.start();
#pragma omp parallel
    {
        int ithread = omp_get_thread_num();

        double buffer = 0.0;  // private partial sum for this thread
        for (int i = ithread * 10000000; i < ithread * 10000000 + 10000000; i++) {
            buffer += 1.0 / (i + 1);
        }

#pragma omp atomic
        accum += buffer;      // only the shared update is atomic
    }
    t.stop();

    std::cout << accum << " - " << t.currtime() << std::endl;
}
</syntaxhighlight>

=== Results ===
Performance tests on my machine showed the following results:
* C++ threads: std::lock_guard and std::atomic solutions - 65 ms on average.
* OpenMP: critical solution - 31 ms on average.

OpenMP delivered roughly twice the performance in these tests, and it offers much more convenient and straightforward tools to use.
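The examples above include a course-provided timer.h whose Timer class is not shown on this page. For readers who want to repeat the measurements without it, here is a minimal stand-in built on std::chrono::steady_clock. It only mimics the reset(), start(), stop() and currtime() calls used above; the assumption that currtime() reports milliseconds is not taken from the original header.

```cpp
#include <chrono>

// Hypothetical replacement for the course's Timer class (timer.h is not shown
// on this page). Only the member names used in the examples are mimicked;
// reporting milliseconds from currtime() is an assumption.
class Timer {
    using Clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
    Clock::time_point startTime{};
    Clock::duration elapsed{};                // value-initialized to zero

public:
    // Clears the accumulated time.
    void reset() { elapsed = Clock::duration::zero(); }

    // Marks the beginning of a measured interval.
    void start() { startTime = Clock::now(); }

    // Adds the time since the last start() to the accumulated total.
    void stop() { elapsed += Clock::now() - startTime; }

    // Returns the accumulated time in whole milliseconds.
    long long currtime() const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
    }
};
```

Replacing #include "timer.h" with this class should let the examples compile unchanged, although the numbers it prints are only comparable with the figures above if the original Timer also reports milliseconds.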
== References ==