
CDOT Wiki β

Changes

Team Darth Vector

1,699 bytes removed, 17:09, 17 December 2017
Business Point of View Comparison for STL and TBB
'''TEAM, use this for formatting. [https://en.wikipedia.org/wiki/Help:Cheatsheet Wiki Editing Cheat Sheet]
'''GPU621 Darth Vector: C++11 STL vs TBB Case Studies'''
''Join me, and together we can fork the problem as master and thread''
</pre>
 
<u>'''parallel_invoke:'''</u> Calls the functions provided as its arguments in parallel. It is defined in the header "'''tbb/parallel_invoke.h'''" and is coded as: <pre>
==TBB Memory Allocation & Fixing Issues from Parallel Programming==
===Allocators===
TBB allocators handle memory allocation for concurrent containers. In particular, they help resolve issues that affect parallel programming, such as contention on a single shared heap and false sharing between threads. TBB provides '''scalable_allocator<type>''', defined in "'''#include <tbb/scalable_allocator.h>'''", and '''cache_aligned_allocator<type>''', defined in "'''#include <tbb/cache_aligned_allocator.h>'''".
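Below is a minimal sketch of plugging the two allocators into a standard container. It assumes a TBB installation and a program linked against the TBB libraries; scalable_allocator lives in the separate tbbmalloc library. The helper names `fill_and_sum`, `sum_scalable`, and `sum_cache_aligned` are hypothetical. Only the two allocator class templates come from TBB.
<pre>
#include <tbb/scalable_allocator.h>
#include <tbb/cache_aligned_allocator.h>
#include <vector>

// Fills a std::vector that uses the allocator Alloc with 1..n and sums it.
// Only the vector's type depends on the allocator; the code does not.
template <typename Alloc>
long long fill_and_sum(int n) {
    std::vector<int, Alloc> v;
    for (int i = 1; i <= n; ++i) v.push_back(i);
    long long total = 0;
    for (int x : v) total += x;
    return total;
}

// Storage comes from TBB's scalable allocator, which is designed
// to avoid contention on a single shared heap.
long long sum_scalable(int n) {
    return fill_and_sum<tbb::scalable_allocator<int>>(n);
}

// Storage starts on a cache-line boundary, which helps avoid false sharing.
long long sum_cache_aligned(int n) {
    return fill_and_sum<tbb::cache_aligned_allocator<int>>(n);
}
</pre>
Both allocators meet the standard Allocator requirements. Swapping one in therefore changes only the container's type, not the code that uses the container.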
To use a lock, your program must be running in parallel (e.g. with threads from #include <thread>) and should be completing something in parallel. The C++11 locks are found in #include <mutex>.
Code example:<pre>
#include <iostream>
#include <thread>
#include <mutex>

// Some threads are spawned which call GuardTheWall()
class GameOfThronesClass {
    std::mutex NightsWatch;
    int DaysWithoutWhiteWalkerAttack = 0;
public:
    void GuardTheWall();
};

void GameOfThronesClass::GuardTheWall() {
    // Protect until unlock() is called. Only one thread at a time may run the code below; it is "locked"
    NightsWatch.lock();
    DaysWithoutWhiteWalkerAttack++;
    std::cout << "It has been " << DaysWithoutWhiteWalkerAttack << " days since the last attack at Castle Black!\n";
    // Allow the next thread to execute the code above
    NightsWatch.unlock();
}
</pre>
[[File:Gpulockwhat.PNG |thumb|center|700px| Mutex Example]]
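The example above does not show the threads being spawned. The self-contained sketch below fills that in, using free functions instead of the class; the names `guard_the_wall` and `count_with_threads` are hypothetical. Several std::thread objects run the same function, and the std::mutex makes each increment happen one thread at a time.
<pre>
#include <mutex>
#include <thread>
#include <vector>

std::mutex nights_watch;          // protects days_without_attack
int days_without_attack = 0;      // shared by every thread

// Each call increments the shared counter while holding the mutex.
void guard_the_wall(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        nights_watch.lock();      // only one thread past this point at a time
        ++days_without_attack;
        nights_watch.unlock();    // let the next waiting thread in
    }
}

// Spawns num_threads threads, waits for all of them, returns the total.
int count_with_threads(int num_threads, int rounds) {
    days_without_attack = 0;
    std::vector<std::thread> watchers;
    for (int t = 0; t < num_threads; ++t)
        watchers.emplace_back(guard_the_wall, rounds);
    for (auto& w : watchers) w.join();
    return days_without_attack;
}
</pre>
Without the lock()/unlock() pair, the increments from different threads could interleave and the final total could come out lower than num_threads * rounds.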
Note that locks come with their own problems. If a thread locks a mutex but never unlocks it, every other thread waiting for that mutex is forced to wait forever, which can stall the program. Another problem is called a '''deadlock''': each thread is waiting for another thread to unlock (and vice versa), so the program waits indefinitely.
Locks stop threads from modifying shared data at the same time, but they can cause significant performance issues because threads are forced to wait for each other before continuing. This performance hit is known as '''Lock Convoying'''.
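The sketch below shows one way to address both problems with C++11 facilities; the `Account` type and the function names are hypothetical. std::lock_guard releases the mutex automatically when it goes out of scope, so a forgotten unlock() cannot happen. std::lock acquires two mutexes together using a deadlock-avoidance algorithm, so two threads locking the same pair in opposite orders do not deadlock.
<pre>
#include <mutex>
#include <thread>

// Hypothetical example type: a balance guarded by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// lock_guard unlocks in its destructor, even on an early return or exception.
void deposit(Account& a, long amount) {
    std::lock_guard<std::mutex> guard(a.m);
    a.balance += amount;
}

// Locking from.m then to.m on one thread and to.m then from.m on another
// could deadlock. std::lock takes both safely; adopt_lock hands ownership
// of the already-locked mutexes to the guards.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions at the same time.
long run_opposite_transfers(int iterations) {
    Account a, b;
    deposit(a, 1000);
    deposit(b, 1000);
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;   // money is only moved, never created
}
</pre>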
[[File:DarthVector ThreadLock.PNG |thumb|center|600px| Performance issues inside STL]]
===Lock Convoying in TBB===
TBB attempts to mitigate this performance issue, when threads access or operate on a container, through its own concurrent containers such as concurrent_vector.
When an element is appended to a '''concurrent_vector''' (for example with push_back), the location of the new element is returned. TBB guarantees that once an element is pushed, it stays at the same location in memory, even when the vector grows. With a standard vector, when the vector outgrows its capacity, the data is copied to a new block of memory; any threads traversing the vector at that moment may be left holding invalid iterators. This support goes further: multiple threads can iterate through the container while another thread is growing it. The catch is that an iterating thread may reach elements that are still being constructed, so you must synchronize construction and access yourself.
 
[[File:Gpuconcur.PNG |thumb|center|600px| concurrent_vector use with multiple threads]]
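Below is a minimal sketch of several threads appending to one concurrent_vector without any user-written lock. It assumes TBB is installed and linked, and `fill_concurrently` is a hypothetical helper name. The order in which values land in the vector depends on thread scheduling, but every pushed value is present exactly once.
<pre>
#include <tbb/concurrent_vector.h>
#include <thread>
#include <vector>

// Each of num_threads threads appends per_thread distinct values to the
// same concurrent_vector. push_back is safe to call concurrently.
tbb::concurrent_vector<int> fill_concurrently(int num_threads, int per_thread) {
    tbb::concurrent_vector<int> v;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&v, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                v.push_back(t * per_thread + i);  // elements already in v never move
        });
    }
    for (auto& w : workers) w.join();  // wait for all writers before returning
    return v;
}
</pre>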
TBB also provides its own versions of the mutex such as ''spin_mutex'' for when mutual exclusion is still required.
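Here is the same counting idea as the std::mutex example, sketched with tbb::spin_mutex and its scoped_lock, which releases the mutex when the lock object is destroyed. A spin mutex busy-waits instead of putting the thread to sleep, so it suits very short critical sections like this single increment. `count_with_spin_mutex` is a hypothetical name, and the sketch assumes TBB is installed.
<pre>
#include <tbb/spin_mutex.h>
#include <thread>
#include <vector>

// Several threads increment one counter; tbb::spin_mutex protects it.
int count_with_spin_mutex(int num_threads, int rounds) {
    tbb::spin_mutex m;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < rounds; ++i) {
                // Acquires m here; releases it when 'lock' goes out of scope.
                tbb::spin_mutex::scoped_lock lock(m);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
</pre>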
==Business Point of View Comparison for STL and TBB==
{| class="wikitable collapsible collapsed" style="text-align: left;margin:0px;"
''I find your lack of grain disturbing''
===Which library is better depending on the use case?===
One major aspect to consider when parallelizing a piece of code is the cost-benefit. Is it worth the time and effort to parallelize part of your software only to get a small performance gain, or is it faster to just keep it serial? These questions must be answered when deciding whether to parallelize your code.
The real question is: when should you parallelize your code, and when should you just keep it serial? TBB lowers the cost of chasing smaller performance gains from multi-threading, because it requires less effort to implement than other multi-threading libraries; STL is geared toward single-threaded workloads.
*If you are just trying to parallelize a section of your code with a simple map, scan, or reduce pattern, TBB has you covered without much thought.
*When working with large collections of data, TBB's blocked_range coupled with its algorithms makes it simpler to come up with solutions for the collection.
TBB enables you to specify logical parallelism instead of threads. It eliminates the time needed to develop a threading backbone for your code. For quick and easy parallelization of your code, TBB is the way to go.

STL stands in its own right, with its well-designed serial features. When you want more control near the hardware level, use the STL library together with the threading libraries, though you then have to handle thread-safety issues yourself.

The fastest known serial algorithm may be difficult or impossible to parallelize. Some aspects to look out for when parallelizing your code are:
*'''Overhead''', whether in communication, idling, load imbalance, synchronization, or excess computation.
*'''Efficiency''', the measure of processor utilization in a parallel program.
*'''Scalability''': efficiency can be kept constant as the number of processing elements increases, provided that the problem size also increases.
*'''Correct problem size''': when testing for efficiency, a program may show poor efficiency if the problem size is too small. If the problem size is always small, you would want to use serial code instead. If you have a large problem size and good efficiency, then parallel is the way to go.
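To make efficiency concrete, the usual definitions are speedup S = T1 / Tp and efficiency E = S / p. Here T1 is the serial run time and Tp is the run time on p processing elements. These are the standard textbook formulas rather than something stated on this page, and the function names below are hypothetical.
<pre>
// Speedup: how many times faster the parallel run is than the serial run.
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

// Efficiency: speedup divided by the number of processing elements p.
// 1.0 corresponds to ideal (linear) speedup; lower values mean overhead.
double efficiency(double t_serial, double t_parallel, int p) {
    return speedup(t_serial, t_parallel) / p;
}
</pre>
Consider a job that takes 8 s serially. If it takes 2 s on 4 threads, its efficiency is 8 / 2 / 4 = 1.0. If it takes 4 s on 4 threads, its efficiency is 0.5, meaning half of the processing capacity went into overhead or idling.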
'''Conclusion'''
TBB only gives you parallel solutions, while STL gives you the foundations: many serial algorithms for sorting and searching, and a variety of containers.

Resource: http://ppomorsk.sharcnet.ca/Lecture_2_d_performance.pdf
===Implementation Safety for TBB and STL===
We are all human and we make mistakes. Fewer mistakes by developers means less wasted time.
*TBB deliberately does not support insert and erase operations on concurrent_vector. New items can only be appended, and none of its concurrency-safe operations shrink the vector.
**This keeps developers from writing bad code. If insert and erase were allowed on concurrent_vector, they could cause a big performance hit that would burden both iterating and growing operations. That would make TBB's concurrent containers useless and your program inefficient.
*As already stated, most STL containers are not thread safe. However, some operations on TBB containers are not thread safe either, such as reserve() and clear() on concurrent_vector.
*Thread creation, termination, and synchronization are managed by TBB. This creates a layer of safety on the programmer's end: since developers do not deal with the threads themselves, they are less prone to making mistakes in their code.
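As a sketch of what "not thread safe" means in practice for STL containers: calling std::vector::push_back from several threads at once is a data race, so each call has to be wrapped in a lock. `fill_std_vector` is a hypothetical helper. TBB's concurrent_vector, by contrast, allows concurrent push_back without a user-written lock.
<pre>
#include <mutex>
#include <thread>
#include <vector>

// std::vector::push_back may reallocate, so concurrent calls must be
// serialized. Every push_back here happens while holding the mutex m.
std::vector<int> fill_std_vector(int num_threads, int per_thread) {
    std::vector<int> v;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&v, &m, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                v.push_back(t * per_thread + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return v;
}
</pre>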
=== Identifying the worries and responsibilities ===
The increasing complexity of your code is a natural problem when working in parallel. Knowing your responsibilities, that is, what you must worry about as a developer, is key when deciding whether to quickly implement parallel regions in your code or to keep your code serial.
====STL====
'''If you are going to parallelize your code using STL coupled with the threading libraries, this is what you must worry about:'''
*Thread creation, termination, synchronization, partitioning, and management must be handled by you. This increases the workload and the complexity.
*Thread creation and overall resources for STL are managed by a combination of libraries.
*Dividing a collection of data is more of a problem when using the STL containers, which are also not safe for concurrent modification.
*C++11 does not have any parallel algorithms, so common parallel patterns such as map, scan, and reduce must be implemented by yourself or by another library. C++17, however, adds parallel versions of these patterns, such as std::transform, std::inclusive_scan, and std::reduce with execution policies.
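Since C++11 has no parallel algorithms, even a parallel sum means doing the partitioning and thread management by hand. The sketch below is one hand-rolled reduce, assuming an even split of the data into contiguous chunks; `parallel_sum` is a hypothetical name.
<pre>
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hand-written parallel reduce (sum) using only C++11 threads: we partition
// the data, create the threads, join them, and combine the partial results.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);  // each thread writes only its own slot
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) partial[t] += data[i];
        });
    }
    for (auto& w : workers) w.join();     // all partial sums are ready after this
    long long total = 0;
    for (long long p : partial) total += p;  // combine step, done serially
    return total;
}
</pre>
Giving each thread its own slot in `partial` avoids locking during the loop. The price is that the combine step and the chunking logic are now the developer's responsibility.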
The benefit of STL is that, because you manage the threads and resources yourself, you have more control over the code and over fine-tuning optimizations. Nonetheless, managing the threads yourself is a double-edged sword: with more control, the code takes longer to implement and its complexity increases.
'''What you don’t need to worry about'''
*Writing sorting and searching algorithms.
*Partitioning data.
*Array algorithms, such as copying, assigning, and checking data.
Note that all of these algorithms run serially and may not be thread safe.
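Here is a short sketch of the serial STL algorithms listed above. The helper names `sorted_contains`, `evens`, and `all_positive` are hypothetical. std::sort, std::binary_search, std::copy_if, and std::all_of are standard C++11 algorithms.
<pre>
#include <algorithm>
#include <iterator>
#include <vector>

// Sorting and searching: sort a copy, then binary-search it.
bool sorted_contains(std::vector<int> values, int target) {
    std::sort(values.begin(), values.end());
    return std::binary_search(values.begin(), values.end(), target);
}

// Copying: keep only the even numbers, in their original order.
std::vector<int> evens(const std::vector<int>& values) {
    std::vector<int> out;
    std::copy_if(values.begin(), values.end(), std::back_inserter(out),
                 [](int x) { return x % 2 == 0; });
    return out;
}

// Checking: true if every element is greater than zero.
bool all_positive(const std::vector<int>& values) {
    return std::all_of(values.begin(), values.end(), [](int x) { return x > 0; });
}
</pre>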
====TBB====
Building a solution close to the hardware level, as with STL and the threading libraries, gives you flexibility in the solution you want to make. The major downside is that you must first implement the foundations to make your solution work, and doing so can make your program inefficient if not done correctly.
'''Worries and responsibilities'''
*Thread creation, termination, synchronization, partitioning, and management are handled by TBB, so you do not need to worry about the heavy thread constructs that sit close to the hardware level.
*TBB has its own parallel algorithms: for a simple map, scan, pipeline, or reduce pattern, TBB has you covered.
*Dividing a collection of data is simpler, because blocked_range coupled with TBB's algorithms divides the data for you.
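Below is a minimal sketch of the map and reduce patterns with TBB, assuming TBB is installed and linked. tbb::parallel_for and tbb::parallel_reduce split the blocked_range into sub-ranges and run them on TBB's worker threads, so no thread is created or joined in user code. `square_all` and `sum_all` are hypothetical names.
<pre>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/parallel_reduce.h>
#include <cstddef>
#include <vector>

// Map: square every element in place. Each task gets one sub-range.
void square_all(std::vector<long long>& v) {
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()),
        [&v](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) v[i] *= v[i];
        });
}

// Reduce: functional form of parallel_reduce with an identity value (0),
// a body that sums one sub-range, and a combiner for partial sums.
long long sum_all(const std::vector<long long>& v) {
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()), 0LL,
        [&v](const tbb::blocked_range<std::size_t>& r, long long acc) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) acc += v[i];
            return acc;
        },
        [](long long a, long long b) { return a + b; });
}
</pre>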
'''Downside'''
The downside of TBB is that, since much of the close-to-hardware management is done behind the scenes, you as a developer have less control over fine-tuning your program than you have with STL and the threading library.
 
 
===Licensing===
TBB is dual-licensed as of September 2016
 
*COM (commercial) license as part of Intel's suite products. It offers one year of technical support and product updates.
 
*Apache v2.0 license for open-source code. It allows users to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of it under the terms of the license, without concern for royalties.
 
 
===Companies and Products that use TBB===
*DreamWorks (DreamWorks Fur Shader)
 
*Blue Sky Studios (animation and simulation software)
 
*Pacific Northwest National Laboratory (Ultrasound products)
 
*More: https://software.intel.com/en-us/intel-tbb/reviews