DPS921/Team team

6,541 bytes added, 18:57, 4 December 2020
== Multitasking in C++11 ==
C++11 introduced standard-library support for multitasking, a form of concurrency on shared memory that serves as an alternative to purely serial code. When tasks share a single core, this is “virtual parallelism” rather than true parallelism (see figure 1). The tasks are split into smaller pieces, and those pieces are scheduled in an order that decides when each one runs. This means that you can have two tasks in progress at the same time, but they are never executing at the same instant.
 
[[File:multitask_parallel.PNG |thumb|center|600px| ]]
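As a rough illustration of creating tasks with the C++11 standard library, the sketch below splits a sum across two tasks started with std::async. The helper name sum_in_two_tasks is hypothetical and not part of the original page. Whether the two tasks actually run at the same instant depends on the cores available and on the scheduler.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (an assumption for illustration): sums a vector
// by handing each half to its own task.
long long sum_in_two_tasks(const std::vector<int>& values) {
    const auto middle =
        values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);

    // std::launch::async requests that each task runs on its own thread.
    auto first_half = std::async(std::launch::async, [&values, middle] {
        return std::accumulate(values.begin(), middle, 0LL);
    });
    auto second_half = std::async(std::launch::async, [&values, middle] {
        return std::accumulate(middle, values.end(), 0LL);
    });

    // get() blocks until the corresponding task has finished.
    return first_half.get() + second_half.get();
}
```

Calling sum_in_two_tasks({1, 2, 3, 4, 5}) returns 15. The two tasks only read the shared vector, so no locking is needed here.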
== What is provided? ==
Mutexes are a locking mechanism. The purpose of a mutex is to make a region of shared memory accessible only to the thread that currently holds the lock. Once that thread has finished its changes, it unlocks the mutex and moves on, allowing the next waiting thread to make its changes.<br>
In C++, mutexes are provided by the std::mutex class, declared in the <mutex> header. A mutex has two main member functions, std::mutex::lock() and std::mutex::unlock().<br>
<br>
Mutex code example:
<pre>
// mutex example
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex
#include <vector>
 
// THIS IS THE MUTEX LOCK EXAMPLE
std::mutex mtx; // mutex for critical section
 
void print_block(int n, char c) {
// critical section (exclusive access to std::cout signaled by locking mtx):
 
mtx.lock();//comment out these lines to remove mutex
 
for (int i = 0; i < n; ++i) { std::cout << c; }
std::cout << '\n';
 
mtx.unlock();//comment out these lines to remove mutex
}
 
int main() {
std::thread th1(print_block, 50, '*');
std::thread th2(print_block, 50, '$');
 
th1.join();
th2.join();
return 0;
}
</pre>
Results with the mutex in place (which block prints first can vary between runs):
<pre>
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
**************************************************
</pre>
<b>If the locks are removed, the output will change on every run, but the results will look similar to:</b>
<pre>
$$$$$$$**************************************************$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
</pre>
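The explicit lock() and unlock() calls in the example above can also be written with std::lock_guard, which releases the mutex automatically when it goes out of scope. The following is a minimal sketch of that alternative, not the code used on this page; the helper name count_with_lock_guard and its parameters are assumptions made for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption for illustration): several threads
// increment one shared counter, each increment protected by a lock_guard.
int count_with_lock_guard(int thread_count, int increments_per_thread) {
    std::mutex counter_mutex;
    int counter = 0;

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Locks on construction and unlocks when `guard` leaves
                // scope, including on an early return or an exception.
                std::lock_guard<std::mutex> guard(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return counter;
}
```

With 4 threads and 1000 increments each, the function returns 4000 on every run, because only one thread at a time can execute the guarded increment.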
 
=== Atomicity ===
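The atomic operations library (<atomic>) offers an alternative to explicit locks for simple shared variables, allowing lockless concurrent programming. Each operation on a std::atomic object is indivisible, so threads that read and modify the same object concurrently do not race with one another, and the programmer does not have to lock and unlock a mutex around each access. Whether the implementation is truly lock-free depends on the type and the platform.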
<br>
<br>
Atomic code example:
<pre>
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex
#include <vector>
#include <atomic>
 
 
// THIS IS THE ATOMIC CODE
// Result is the same as if you were to use mutex locks. Wallet will contain 5000
class Wallet
{
std::atomic<int> mMoney;
public:
Wallet() :mMoney(0) {}
int getMoney() { return mMoney; }
void addMoney(int money)
{
for (int i = 0; i < money; ++i)
{
mMoney++;
}
}
};
 
 
int testMultithreadedWallet()
{
Wallet walletObject;
std::vector<std::thread> threads;
for (int i = 0; i < 5; ++i) {
threads.push_back(std::thread(&Wallet::addMoney, &walletObject, 1000));
}
for (int i = 0; i < threads.size(); i++)
{
threads.at(i).join();
}
return walletObject.getMoney();
}
 
int main()
{
 
int val = 0;
for (int k = 0; k < 1000; k++)
{
if ((val = testMultithreadedWallet()) != 5000)
{
std::cout << "Error at count = " << k << " Money in Wallet = " << val << std::endl;
//break;
}
}
 
std::cout << "Money in Wallet is: " << val << std::endl;
 
return 0;
}
</pre>
Results:
<pre>
Money in Wallet is: 5000
</pre>
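The mMoney++ in the Wallet example is an atomic read-modify-write. The sketch below shows the same idea with an explicit fetch_add call, plus a way to ask whether std::atomic<int> is lock-free on the current platform. Both helper names are hypothetical, and the relaxed memory order is an assumption that is sufficient here only because the counter is read after all threads have been joined.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption for illustration): several threads
// increment one atomic counter without any mutex.
int count_with_fetch_add(int thread_count, int increments_per_thread) {
    std::atomic<int> counter{0};

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Indivisible increment; relaxed ordering is enough for a
                // plain counter that is only read after join().
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return counter.load();
}

// Reports whether std::atomic<int> avoids internal locks on this platform.
bool atomic_int_is_lock_free() {
    std::atomic<int> probe{0};
    return probe.is_lock_free();
}
```

With 5 threads and 1000 increments each, count_with_fetch_add returns 5000, matching the Wallet result above. The value returned by atomic_int_is_lock_free is platform-dependent.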
== Intel Threading Building Blocks ==
== Case Studies between STL & TBB ==
=== <u>Comparison of TBB parallel sort, STL sort</u> ===
<pre>
#include <stddef.h>
const auto startTime = high_resolution_clock::now();
// same sort call as above, but with par_unseq:
sort(std::execution::par_unseq, sorted.begin(), sorted.end());
tbb::parallel_sort(sorted);
</pre>
Results on an Intel i7-3770K, average of 5 runs:
[[File:sortcomparison.PNG |thumb|center|600px| Results of STL vs TBB sort algorithms]]
The STL parallel sort performed similarly to the TBB parallel sort.<br>
<br>
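The figures above are reported as an average of 5 runs. One way such an average could be taken is sketched below; this is an assumption about the method, not the harness used for these measurements. The helper name average_ms is hypothetical, and std::chrono::steady_clock is used here in place of the high_resolution_clock seen in the listing.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (an assumption for illustration): runs `work`
// `runs` times and returns the mean wall-clock time in milliseconds.
double average_ms(int runs, const std::function<void()>& work) {
    if (runs <= 0) {
        return 0.0;
    }
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        total_ms +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

When timing a sort this way, each run needs its own unsorted copy of the input, since sorting already-sorted data does not measure the same work.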
=== <u>Comparison of TBB inclusive scan, STL inclusive scan</u> ===
<pre>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric>
#include <vector>
#include <random>
#include <chrono>
#include <execution>
#include <cstdio> // printf
#include <tbb/tbb.h>
 
const size_t testSize = 10'000'000;
const int iterationCount = 5;
 
 
void print_results(const char* const tag, const std::vector<int>& result,
std::chrono::high_resolution_clock::time_point startTime,
std::chrono::high_resolution_clock::time_point endTime) {
printf("%s: %fms\n", tag,
std::chrono::duration_cast<std::chrono::duration<double, std::milli>>(endTime - startTime).count());
}
int main()
{
std::uniform_int_distribution<int> dis(1, 10);
std::random_device rd;
std::vector<int> data(testSize);
 
//generating data
for (auto& d : data) {
d = dis(rd);
}
 
//Serial inclusive sum
for (int i = 0; i < iterationCount; ++i)
{
std::vector<int> result(data);
const auto startTime = std::chrono::high_resolution_clock::now();
std::inclusive_scan(std::execution::seq, data.begin(), data.end(), result.begin());
 
const auto endTime = std::chrono::high_resolution_clock::now();
print_results("STL Serial Inclusive sum", result, startTime, endTime);
std::cout << "Scan result: " << result[testSize - 1] << "\n";
}
 
//Inclusive sum parallel unseq
for (int i = 0; i < iterationCount; ++i)
{
std::vector<int> result(data);
const auto startTime = std::chrono::high_resolution_clock::now();
 
std::inclusive_scan(std::execution::par_unseq, data.begin(), data.end(), result.begin());
 
const auto endTime = std::chrono::high_resolution_clock::now();
print_results("STL Inclusive sum parallel unseq", result, startTime, endTime);
std::cout << "Scan result: " << result[testSize - 1] << "\n";
}
 
//Inclusive sum parallel
for (int i = 0; i < iterationCount; ++i)
{
std::vector<int> result(data);
const auto startTime = std::chrono::high_resolution_clock::now();
 
std::inclusive_scan(std::execution::par, data.begin(), data.end(), result.begin());
 
const auto endTime = std::chrono::high_resolution_clock::now();
print_results("STL Inclusive sum parallel", result, startTime, endTime);
std::cout << "Scan result: " << result[testSize - 1] << "\n";
}
 
//TBB inclusive sum parallel
for (int i = 0; i < iterationCount; ++i)
{
std::vector<int> result(data);
auto body = [&](const tbb::blocked_range<int>& r, int sum, bool is_final_scan)->int {
int temp = sum;
for (int i = r.begin(); i < r.end(); ++i) {
temp = temp + data[i];
if (is_final_scan)
result[i] = temp;
}
return temp;
};
 
const auto startTime = std::chrono::high_resolution_clock::now();
tbb::parallel_scan(tbb::blocked_range<int>(0, testSize),0, body, [](int left, int right) {
return left + right;
}
);
const auto endTime = std::chrono::high_resolution_clock::now();
print_results("TBB Inclusive sum parallel", result, startTime, endTime);
std::cout << "Scan result: " << result[testSize - 1] << "\n";
}
}
 
</pre>
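For readers unfamiliar with the operation being timed, an inclusive scan with addition writes the running total into each output position, so the last element holds the sum of the whole input (the value printed as "Scan result" in the listing above). A minimal serial sketch on a small input follows; the helper name inclusive_sum is hypothetical, and the code assumes a C++17 compiler.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper (an assumption for illustration): returns the
// running (inclusive) sum of `values`.
std::vector<int> inclusive_sum(const std::vector<int>& values) {
    std::vector<int> result(values.size());
    // Serial overload without an execution policy; element i receives
    // values[0] + ... + values[i].
    std::inclusive_scan(values.begin(), values.end(), result.begin());
    return result;
}
```

For the input {1, 2, 3, 4} the result is {1, 3, 6, 10}.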
Results on an Intel i7-3770K, average of 5 runs:
[[File:scancomparison.PNG |thumb|center|600px| Results of STL vs TBB scan algorithms]]
<br>
Raw Results:
<pre>
STL Serial Inclusive sum: 9.695900ms
Scan result: 55000150
STL Serial Inclusive sum: 13.188200ms
Scan result: 55000150
STL Serial Inclusive sum: 9.139700ms
Scan result: 55000150
STL Serial Inclusive sum: 10.686900ms
Scan result: 55000150
STL Serial Inclusive sum: 7.812900ms
Scan result: 55000150
STL Inclusive sum parallel unseq: 39.005100ms
Scan result: 55000150
STL Inclusive sum parallel unseq: 29.428300ms
Scan result: 55000150
STL Inclusive sum parallel unseq: 30.756500ms
Scan result: 55000150
STL Inclusive sum parallel unseq: 26.180600ms
Scan result: 55000150
STL Inclusive sum parallel unseq: 28.135300ms
Scan result: 55000150
STL Inclusive sum parallel: 28.015000ms
Scan result: 55000150
STL Inclusive sum parallel: 30.922700ms
Scan result: 55000150
STL Inclusive sum parallel: 38.238000ms
Scan result: 55000150
STL Inclusive sum parallel: 29.686100ms
Scan result: 55000150
STL Inclusive sum parallel: 28.986200ms
Scan result: 55000150
TBB Inclusive sum parallel: 59.180100ms
Scan result: 55000150
TBB Inclusive sum parallel: 13.341900ms
Scan result: 55000150
TBB Inclusive sum parallel: 13.508600ms
Scan result: 55000150
TBB Inclusive sum parallel: 10.201700ms
Scan result: 55000150
TBB Inclusive sum parallel: 9.710400ms
Scan result: 55000150
</pre>
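The is_final_scan flag in the TBB listing reflects that a parallel scan may visit a range twice: once to compute only its total, and once to write the final values. The sketch below shows that two-phase idea with plain std::thread. It is a simplified illustration and not TBB's implementation; the helper name chunked_inclusive_sum and the fixed chunking scheme are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption for illustration, not TBB's algorithm):
// a two-phase chunked inclusive sum.
std::vector<int> chunked_inclusive_sum(const std::vector<int>& data,
                                       std::size_t chunks) {
    std::vector<int> result(data.size());
    if (data.empty()) {
        return result;
    }
    chunks = std::max<std::size_t>(1, std::min(chunks, data.size()));
    const std::size_t chunk_size = (data.size() + chunks - 1) / chunks;

    // Phase 1 ("pre-scan"): each thread computes only the total of its chunk.
    std::vector<int> chunk_sum(chunks, 0);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < chunks; ++c) {
        workers.emplace_back([&, c] {
            const std::size_t begin = std::min(c * chunk_size, data.size());
            const std::size_t end = std::min(begin + chunk_size, data.size());
            int total = 0;
            for (std::size_t i = begin; i < end; ++i) {
                total += data[i];
            }
            chunk_sum[c] = total;
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    workers.clear();

    // Serial step: the starting offset of a chunk is the sum of all
    // chunks before it.
    std::vector<int> offset(chunks, 0);
    for (std::size_t c = 1; c < chunks; ++c) {
        offset[c] = offset[c - 1] + chunk_sum[c - 1];
    }

    // Phase 2 ("final scan"): each thread writes its results, starting
    // from its chunk's offset.
    for (std::size_t c = 0; c < chunks; ++c) {
        workers.emplace_back([&, c] {
            const std::size_t begin = std::min(c * chunk_size, data.size());
            const std::size_t end = std::min(begin + chunk_size, data.size());
            int running = offset[c];
            for (std::size_t i = begin; i < end; ++i) {
                running += data[i];
                result[i] = running;
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return result;
}
```

Each element is read twice in this scheme, which is one reason a parallel scan does more total work than the serial version.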