45
edits
Changes
→Asynchronous Multi-Threading
===Creating and executing Threads===
====OpenMp====
Inside a declared OpenMp parallel region, if not specified via an environment variable OMP_NUM_THREADS or the library routine omp_get_thread_num() , OpenMp will automatically decide how many threads are needed to execute parallel code.
An issue with this approach is that OpenMp is unaware how many threads a CPU can support. A result of this can be OpenMp creating 4 threads for a single core processor which may result in a degradation of performance.
}
====C++ 11====
C++ 11 Threads on the contrary always required to specify the number of threads required for a parallel region. If not specified by user input or hard-coding, the number of threads supported by a CPU can also be accurately via the std::thread::hardware_concurrency(); function.
OpenMp automatically decides what order threads will execute. C++ 11 Threads require the developer to specify in what order threads will execute. This is typically done within a for loop block. Threads are created by initializing the std::thread class and specifying a function or any other callable object within the constructor.
=====C++ 11=====
The C++ 11 thread libraries provide the mutex class to support mutual exclusionand synchronization. <br>
The mutex class is a synchronization primitive that can be used to protect shared data from being accessed by multiple threads.
std::mutex is usually not accessed directly, instead std::unique_lock and std::lock_guard are used to manage locking.
}
Serial Implementation
A future is an object that can retrieve a value from some provider object (also known as a promise) or function. Simply put in the case of multithreading, a future object will wait until its associated thread has completed and then store its return value.
To retrieve or construct a future object, these functions may be used.
However, a future object can only be used if it is in a valid state. Default future objects constructed from the std::async template function are not valid and must be assigned a valid state during execution.
A std::future references a shared state that cannot be shared to other asynchronous return objects. If multiple threads need to wait for the same shared state, std::shared_future class template should be used.
Openmp unfortunately does not support asynchronous multi-threading as is designed for designed for parallelism, not concurrency.
===Programming Models=======SPMD==== An example of the SPMD programming model in STD Threads using an atomic barrier #include <iostream> #include <iomanip> #include <cstdlib> #include <chrono> #include <vector> #include <thread> #include <atomic> using namespace std::chrono; std::atomic<double> pi; void reportTime(const char* msg, steady_clock::duration span) { auto ms = duration_cast<milliseconds>(span); std::cout << msg << " - took - " << ms.count() << " milliseconds" << std::endl; } void run(int ID, double stepSize, int nthrds, int n) { double x; double sum = 0.0; for (int i = ID; i < n; i = i C+ nthrds){ x = (i + 0.5)*stepSize; sum += 4.0 / (1.0 + x*x); } sum = sum * stepSize; pi = pi + sum; } int main(int argc, char** argv) { if (argc != 3) { std::cerr << argv[0] << ": invalid number of arguments\n"; return 1; } int n = atoi(argv[1]); int numThreads = atoi(argv[2]); steady_clock::time_point ts, te; // calculate pi by integrating the area under 1/(1 + x^2) in n steps ts = steady_clock::now(); std::vector<std::thread> threads(numThreads); double stepSize = 1.0 / (double)n; for (int ID = 0; ID < numThreads; ID++) { int nthrds = std::thread::hardware_concurrency(); if (ID == 0) numThreads = nthrds; threads[ID] = std::thread(run, ID, stepSize, 8, n); } te = steady_clock::now(); for (int i = 0; i < numThreads; i++){ threads[i].join(); } std::cout << "n = " << n << std::fixed << std::setprecision(15) << "\n pi(exact) = " << 3.141592653589793 << "\n pi(calcd) = " << pi << std::endl; reportTime("Integration", te - ts); // terminate char c; std::cout << "Press Enter key to exit ... "; std::cin.get(c); } ====Question & Awnser=11 Threads and OpenMp compatibility===
Can one safely use C++11 multi-threading as well as OpenMP in one and the same program but without
interleaving them (i.e. no OpenMP statement in any code passed to C++11 concurrent features and no
C++11 concurrency in threads spawned by OpenMP)?
On some platforms efficient implementation could only be achieved if the OpenMP run-time is the
and x86 is usually considered an "experimental" platform (other vendors are usually much more conservative).
===Conclusion===