'''GPU621 Darth Vector: C++11 STL vs TBB Case Studies'''
''Join me, and together we can fork the problem as master and thread''
==Generic Programming==
Generic Programming is an objective when writing code: make algorithms reusable with the least amount of type-specific code. Intel describes generic programming as "''writing the best possible algorithms with the least constraints''". An example of generic code is STL's function templates, which provide generic code that can be used with many different types without requiring much type-specific coding (an addition template could be used for int, double, float, short, etc. without re-coding). A non-generic library requires types to be specified, meaning more type-specific code has to be created. [[File:gputemplates.PNG |thumb|center|600px| An example of generic coding]]
==TBB Background==
==STL Background==
After a long period of engineering and development, the STL obtained final approval in July 1994 to become part of the C++ language standard.
==A Comparison of STL Functions==
In general, most of STL is intended for use within a serial environment. This however changes with C++17's introduction of parallel algorithms.
<u>'''Algorithms'''</u>
STL supports various algorithms such as sorting, searching and accumulation. All can be found within the header "'''<algorithm>'''". Examples include the sort() and reverse() functions.
<u>'''STL iterators'''</u>
<u>'''Containers'''</u>
STL supports a variety of containers for data storage. Generally these containers support concurrent read actions, but they do not safely support writing to the container, with or without simultaneous reads. There are several header files that are included such as "'''<vector>'''", "'''<queue>'''", and "'''<deque>'''". '''Most STL containers do not support concurrent operations upon them.''' They are coded as:
<pre>
#include <vector>
#include <deque>
#include <queue>

int main() {
    std::vector<type> myVector;
    std::deque<type> cards;
    std::queue<type> SenecaYorkTimHortons;
}
</pre>
<u>'''Memory Allocator'''</u>

The use of a memory allocator allows more precise control over how dynamic memory is created and used. It is the default memory allocation method (i.e. not the "new" keyword) for all STL containers. Allocators allow memory to scale rather than using 'new' and delete to manage a container's memory. They are defined within the header file "'''<memory>'''" and are coded as:
<pre>
#include <memory>

std::allocator<type> myAllocator;
</pre>
==A Comparison of TBB Functions==
===Containers===
'''<u>concurrent_queue</u>''' : This is the concurrent version of the STL container queue. This container supports first-in-first-out data storage like its STL counterpart. Multiple threads may simultaneously push and pop elements from the queue. The queue does NOT support front() or back() in concurrent operations (the front could change while being accessed). It also supports iterators, but they are slow and are only intended for debugging a program. This is defined within the header "'''tbb/concurrent_queue.h'''" and is coded as: <pre>
#include <tbb/concurrent_queue.h>
//....//
tbb::concurrent_queue<typename> name; </pre>
'''<u>concurrent_vector</u>''' : This is a container class for vectors with concurrent (parallel) support. These vectors do not support insert or erase operations, but do support operations done by multiple threads such as push_back(). Note that once elements are inserted, they cannot be removed without calling the clear() member function, which removes every element in the container. The container does not guarantee that elements will be stored in consecutive addresses in memory. This is defined within the header "'''tbb/concurrent_vector.h'''" and is coded as: <pre>
#include <tbb/concurrent_vector.h>
//...//
tbb::concurrent_vector<typename> name; </pre>
'''<u>concurrent_hash_map</u>''' : A container class that supports hashing in parallel. The keys are not ordered, and there is at most one element for each key. Defined within "'''tbb/concurrent_hash_map.h'''".
===Algorithms===
<u>'''parallel_for:'''</u> Provides concurrent support for for loops. This allows data to be divided up into chunks (a blocked_range) that each thread can work on. The code is defined in "'''tbb/parallel_for.h'''" and takes the template of:
<pre>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
//...//
tbb::parallel_for(tbb::blocked_range<size_t>(0, n),
    [](const tbb::blocked_range<size_t>& range) {
        for (size_t i = range.begin(); i != range.end(); ++i) {
            // work on element i
        }
    });
</pre>
<u>'''parallel_invoke:'''</u> Provides support for calling the functions provided in its arguments in parallel. It is defined within the header "'''tbb/parallel_invoke.h'''" and is coded as: <pre>
#include <tbb/parallel_invoke.h>
//...//
tbb::parallel_invoke(functionA, functionB, functionC);
</pre>
===Allocators===
Allocators handle memory allocation for concurrent containers. In particular, they are used to help resolve issues that affect parallel programming. They are called '''scalable_allocator<type>''' and '''cache_aligned_allocator<type>''', and are defined in "'''tbb/scalable_allocator.h'''".
==Lock Convoying Problem==
===What is a Lock?===
A lock (mutex) restricts access to a shared resource so that only one thread may use it at a time. To use a lock, your program must be working in parallel (e.g. #include <thread>) and should be completing something in parallel. You can find C++11 locks in #include <mutex>.

Note that there can be problems with locks. If a thread takes a lock but never releases it, any other threads will be forced to wait, which may cause performance issues. Another problem is called "deadlock", where each thread waits for another to unlock (and vice versa) and the program is forced to wait indefinitely.
If we attempt to find data in parallel with other operations ongoing, one thread could search for the data while another updates the vector's size during that time. This causes problems for thread 1's search, as the memory location may change should the vector need to grow (it performs a deep copy to new memory).
Locks can solve the above issue, but they cause significant performance issues because the threads are forced to wait for each other before continuing. This performance hit is known as '''Lock Convoying'''.
[[File:DarthVector ThreadLock.PNG |thumb|center|600px| Performance issues inside STL]]
===Lock Convoying in TBB===
TBB attempts to mitigate the performance issue from parallel code when accessing or completing an operation on a container through its own containers such as concurrent_vector.
With '''concurrent_vector''', every time an element is pushed, the index of its location is returned. TBB promises that once an element is pushed, it will always remain at that location, even if the vector grows and its memory changes. With a standard vector, when the size of the vector changes, the data is copied over to new memory; if any threads are traversing the vector when this happens, their iterators may no longer be valid. This support goes further: multiple threads can iterate through the container while another thread is growing it. An interesting catch, though, is that an iterating thread may iterate over elements that are still being constructed, so construction and access must be synchronized. [[File:Gpuconcur.PNG |thumb|center|600px| concurrent_vector use with multiple threads]]
TBB also provides its own versions of the mutex such as ''spin_mutex'' for when mutual exclusion is still required.
You can find more information on convoying and containers here: https://software.intel.com/en-us/blogs/2008/10/20/tbb-containers-vs-stl-performance-in-the-multi-core-age
''If only you knew the power of the Building Blocks''
===Why was TBB slower for Int?===
==Business Point of View Comparison for STL and TBB==
===Which library is better depending on the use case?===
The real question is when you should parallelize your code, and when you should keep it serial. TBB helps to lower the cost of multi-threading for smaller performance benefits, since it requires less effort to implement compared to other multi-threading libraries, while STL suits single-threaded workloads. Keep in mind that the fastest known serial algorithm may be difficult or impossible to parallelize. Some aspects to look out for when parallelizing your code are:
*Overhead, whether it be in communication, idling, load imbalance, synchronization, or excess computation.
TBB is a good fit in these cases:
*If you are just trying to parallelize a section of your code with a simple map, scan, or reduce pattern, TBB has you covered without much thought.
*When working with large collections of data, TBB's blocked_range coupled with its algorithms makes it simpler to come up with solutions for the collection.
Resource: http://ppomorsk.sharcnet.ca/Lecture_2_d_performance.pdf
===STL===
====Worries and Responsibilities====
*Thread creation, termination, synchronization, partitioning, and management must all be handled by you. This increases the workload and the complexity; thread creation and overall resources for STL are managed by a combination of libraries.
*Building a solution from close to the hardware level allows you to be flexible about the solution you want to make. But the major downside is the requirement of implementing the foundations first to make your solution work, and it has the potential of making your program inefficient if not done correctly.
'''What you don't need to worry about'''
*Making sorting and searching algorithms.
*Partitioning data.
*Array algorithms, like copying, assigning, and checking data.
Note that all STL algorithms run in serial, and may not be thread safe.
===TBB===
====Worries and Responsibilities====
*Thread creation, termination, synchronization, partitioning, and management are handled by TBB. This means you need not worry about the heavy thread constructs that are present in the lower levels of programming.
*TBB supplies its own parallel algorithms: for a simple map, scan, pipeline, or reduce, TBB has you covered.
'''Downside'''
The downside of TBB is that, since much of the close-to-hardware management is done behind the scenes, you as a developer have less control over fine-tuning your program, unlike what STL with the threading library allows you to do.
===Licensing===
TBB is dual-licensed as of September 2016:
*COM license as part of suite products. Offers one year of technical support and product updates.
*Apache v2.0 license for Open source code. Allows the user of the software the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software, under the terms of the license, without concern for royalties.
===Companies and Products that use TBB===
*DreamWorks (DreamWorks Fur Shader)
*Blue Sky Studios (animation and simulation software)
*Pacific Northwest National Laboratory (Ultrasound products)
*More: https://software.intel.com/en-us/intel-tbb/reviews