J&J

Revision as of 04:51, 11 December 2016 by Jvaraujo (talk | contribs) (Intel Threading building blocks)

Introduction to Intel Threading Building Blocks

Why Use It: Intel® Threading Building Blocks (Intel® TBB) lets you write parallel C++ programs that take advantage of multicore performance, are portable and composable, and scale as core counts grow.

What Is It: A widely used C++ template library for task parallelism.

Primary Features: Parallel algorithms and data structures, scalable memory allocation, and task scheduling.

Reason to Use: A rich feature set for general-purpose parallelism in C++ on Windows, Linux, OS X, and other operating systems.

Key Benefits of Using Intel TBB

Intel TBB differs from typical threading packages in the following ways:

Enables you to specify logical parallelism instead of threads.

Intel TBB has a runtime library that automatically maps logical parallelism onto threads in a way that makes efficient use of processor resources, which makes parallel code less tedious to write and more efficient to run.

Targets threading for performance.

Intel TBB focuses on the particular goal of parallelizing computationally intensive work, delivering higher-level, simpler solutions.

Compatible with other threading packages.

Intel TBB can coexist with other threading packages, so you can leave legacy code untouched and still use Intel TBB for new implementations.

Emphasizes scalable, data parallel programming.

Intel TBB emphasizes data-parallel programming, enabling multiple threads to work on different parts of a collection. Data-parallel programming scales well to larger numbers of processors by dividing the collection into smaller pieces. With data-parallel programming, program performance increases as you add processors.

Relies on generic programming.

Intel TBB uses generic programming. The essence of generic programming is writing the best possible algorithms with the fewest constraints. The C++ Standard Template Library (STL) is a good example of generic programming in which the interfaces are specified by requirements on types.



Intel® Threading Building Blocks (Intel® TBB) makes parallel performance and scalability easily accessible to software developers who are writing loop and task based applications. Developers can build robust applications that abstract platform details and threading mechanisms while achieving performance that scales with increasing core count.

Rich Feature Set for Parallelism

Intel TBB includes a rich set of components for threading performance and productivity.

Parallel algorithms and data structures

Generic Parallel Algorithms

An efficient, scalable way to exploit the power of multi-core without having to start from scratch.

Flow Graph

A set of classes to express parallelism as a graph of compute dependencies and/or data flow.

Concurrent Containers

Containers that support concurrent access, offering a scalable alternative to containers that are externally locked for thread safety.

Memory allocation and task scheduling

Task Scheduler

Sophisticated work scheduling engine that empowers parallel algorithms and the flow graph.

Memory Allocation

Scalable memory manager and allocators that avoid false sharing.

Threads and synchronization

Synchronization Primitives

Atomic operations, a variety of mutexes with different properties, and condition variables.

Timers and Exceptions

Thread-safe timers and exception classes.

Threads

Wrappers around the operating system's threading APIs.

Thread Local Storage

Efficient implementation for an unlimited number of thread-local variables.

Conditional Numerical Reproducibility

Ensures deterministic associativity for floating-point arithmetic, and therefore results that are reproducible from run to run, with the new Intel TBB template function ‘parallel_deterministic_reduce’.


Supports C++11 Lambda

Intel TBB can be used with C++11 compilers and supports lambda expressions. For developers using parallel algorithms, lambda expressions reduce the time and code needed by removing the requirement for separate objects or classes.

Flow Graph Designer

Computing systems are becoming increasingly heterogeneous, and developing for them can be challenging because of divergent programming models and tools. Intel TBB flow graph provides a single interface that enables intra-node distributed memory programming, thereby simplifying communication and load balancing across heterogeneous devices.

Flow Graph Designer supports this in two ways:

1. As an analyzer, it provides capabilities to collect and visualize execution traces from Intel TBB flow graph applications. From Flow Graph Designer, users can explore the topology of their graphs, interact with a timeline of node executions, and project important statistics onto the nodes of their graphs.

2. As a designer, it provides the ability to visually create Intel TBB flow graph diagrams and then generate C++ stubs as a starting point for further development.

Overview
- Intel Threading Building Blocks (Intel TBB) is a C++ library that simplifies threading for performance
- Move the level at which you program from threads to tasks
- Let the run-time library worry about how many threads to use, scheduling, cache use, and so on
- Committed to compiler independence, processor independence, and OS independence

Benefits of TBB
- Intel Threading Building Blocks enables you to specify tasks instead of threads
- Intel Threading Building Blocks targets threading for performance
- Intel Threading Building Blocks is compatible with other threading packages
- Intel Threading Building Blocks emphasizes scalable, data-parallel programming
- Intel Threading Building Blocks relies on generic programming

TBB is a collection of components for parallel programming:
- Basic algorithms: parallel_for, parallel_reduce, parallel_scan
- Advanced algorithms: parallel_while, parallel_do, parallel_pipeline, parallel_sort
- Containers: concurrent_queue, concurrent_priority_queue, concurrent_vector, concurrent_hash_map
- Memory allocation: scalable_malloc, scalable_free, scalable_realloc, scalable_calloc, scalable_allocator, cache_aligned_allocator
- Mutual exclusion: mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex, recursive_mutex
- Atomic operations: fetch_and_add, fetch_and_increment, fetch_and_decrement, compare_and_swap, fetch_and_store