GPU621/VTuners

2,821 bytes added, 22:56, 5 December 2022
== Parallelism ==
 
By evaluating compute-intensive or throughput-oriented high-performance computing (HPC) applications for CPU efficiency, vectorization, and memory allocation, the Parallelism feature lets users check how efficiently their threaded code runs and identify the threading issues that affect performance. The terms explained below are the most common statistics. In a more advanced version, algorithm-specific analysis may also be available (see Method for OpenMP Code Analysis and Schedule Overhead in Intel® oneAPI Threading Building Blocks Applications).
 
 
{| class="wikitable"
| Main Analysis Features || Threading, HPC Performance Characterization
|-
| Suggested Intel Compiler Version || Intel Composer XE 2013 Update 2 or higher (for CPU utilization analysis)
|-
| Parallelism Pattern || OpenMP, OpenMP-MPI, TBB
|}
 
[[File:Vtune Roadmap.png|400px|frame]]
 
 
'''Total Thread Count''': This section shows the number of threads used while running the application. Thread Oversubscription is the time spent in code where the number of simultaneously working threads exceeds the number of logical cores available on the system.
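The oversubscription definition can be illustrated with a small calculation. The following is only a sketch, not how VTune computes the metric: the function name <code>oversubscribed_time</code> and the interval data are hypothetical, and each thread is assumed to be described by the (start, end) times during which it was working.

```python
import os


def oversubscribed_time(intervals, logical_cores):
    """Sum the time during which more threads are working than there are cores.

    intervals: list of (start, end) times, one per working span of a thread.
    logical_cores: number of logical cores (e.g. from os.cpu_count()).
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a thread starts working
        events.append((end, -1))    # a thread stops working
    events.sort()                   # at equal times, stops sort before starts

    active = 0      # threads working right now
    prev = None     # time of the previous event
    total = 0.0
    for t, delta in events:
        if prev is not None and active > logical_cores:
            total += t - prev       # the span since prev was oversubscribed
        active += delta
        prev = t
    return total


if __name__ == "__main__":
    # Hypothetical data: three threads on an assumed 2-core machine.
    spans = [(0, 10), (0, 10), (2, 6)]
    print(oversubscribed_time(spans, logical_cores=2))
    print("logical cores on this machine:", os.cpu_count())
```

With the hypothetical spans above, all three threads work at once only between t=2 and t=6, so 4 time units count as oversubscribed on two logical cores.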
 
'''Wait Time with Poor CPU Utilization''': This value is the wait time accumulated across all threads, where a thread waits whenever an API blocks or causes synchronization. Because it is summed over every thread, this value can be higher than the application's Elapsed Time.
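Why the accumulated wait time can exceed the elapsed time is easiest to see with several threads queuing on one lock. The sketch below is an illustration only (the helper <code>measure_wait</code> is hypothetical and the timings are measured by hand, not by VTune): every thread records how long it blocked before acquiring the lock, and the per-thread waits are summed.

```python
import threading
import time


def measure_wait(n_threads=5, hold_s=0.02):
    """Return (elapsed, total_wait) for threads contending on a single lock."""
    lock = threading.Lock()
    waits = [0.0] * n_threads

    def worker(i):
        t0 = time.perf_counter()
        with lock:                              # blocks while another thread holds the lock
            waits[i] = time.perf_counter() - t0
            time.sleep(hold_s)                  # stand-in for work done under the lock

    start = time.perf_counter()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, sum(waits)


if __name__ == "__main__":
    elapsed, total_wait = measure_wait()
    print(f"elapsed: {elapsed:.3f}s, accumulated wait: {total_wait:.3f}s")
```

Because the threads run one after another under the lock, each later thread waits for all earlier ones, so the summed wait grows faster than the wall-clock time.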
 
'''Top Waiting Objects''': This section provides a table listing the names of the objects that spent the most time waiting in the application. Reasons for waiting could be function calls or synchronization. The higher the wait time, the more parallelism is reduced.
 
 
[[File:Effective-gpu.png|500px]]
 
=== Spin and Overhead Time ===
 
Spin time is wait time during which the CPU is busy. This often happens when a synchronization API causes the CPU to poll while the software thread is waiting. Overhead time is CPU time spent in known synchronization and threading libraries, such as system synchronization APIs, Intel TBB, and OpenMP. This section lists the functions in the application with the most spin and overhead time.
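The difference between spinning and a blocking wait can be shown by measuring the CPU time a waiting thread consumes. This is a minimal sketch under stated assumptions: the helper <code>waiter_cpu_time</code> is hypothetical, and it relies on Python's <code>time.thread_time()</code> (per-thread CPU time, available on common platforms since Python 3.7) rather than on anything VTune reports.

```python
import threading
import time


def waiter_cpu_time(spin, delay_s=0.1):
    """Return the CPU seconds one thread consumes while waiting delay_s for a signal."""
    ready = threading.Event()
    result = {}

    def waiter():
        start = time.thread_time()       # CPU time of this thread only
        if spin:
            while not ready.is_set():    # busy polling: the CPU stays busy while waiting
                pass
        else:
            ready.wait()                 # blocking wait: the thread is descheduled
        result["cpu"] = time.thread_time() - start

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay_s)                  # stand-in for the work the waiter depends on
    ready.set()
    t.join()
    return result["cpu"]


if __name__ == "__main__":
    print(f"spinning waiter CPU time: {waiter_cpu_time(spin=True):.3f}s")
    print(f"blocking waiter CPU time: {waiter_cpu_time(spin=False):.3f}s")
```

Both waiters are idle for the same wall-clock time, but only the polling one accumulates CPU time, which is the behaviour the spin time statistic is meant to expose.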
 
=== The Bottom-up Tab ===
 
The Bottom-up tab lets us investigate concurrency problems in the application and the performance of each thread over time. The lower half of the window in the figure below is the timeline view, where brown indicates CPU time. Not until about 12 seconds in was the master thread split into 8 threads. The first five were off-loaded, while the last three (TID: 14500, 16268, 28576) were waiting (shown in light green), and the last two waited all the way to the end, which weakened parallelism. When brown bands (CPU Time) appear on multiple threads at the same time, it indicates a high level of parallelism.
 
[[File:Effective-CPU-Utilization-Histogram.png|500px]]
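Reading the timeline amounts to asking what share of the window each thread spent running. The sketch below assumes a hand-written timeline (thread names, bands, and the 0.5 threshold are all made up for illustration and are not taken from the figure); <code>running_fraction</code> and <code>mostly_waiting</code> are hypothetical helpers, not VTune APIs.

```python
def running_fraction(timeline, elapsed):
    """Map each thread ID to the share of the elapsed window it spent running."""
    return {
        tid: sum(end - start for start, end in bands) / elapsed
        for tid, bands in timeline.items()
    }


def mostly_waiting(timeline, elapsed, threshold=0.5):
    """Return the threads whose running share falls below the threshold."""
    fractions = running_fraction(timeline, elapsed)
    return sorted(tid for tid, share in fractions.items() if share < threshold)


if __name__ == "__main__":
    # Hypothetical CPU-time bands (start, end) per thread over an 8-second window.
    timeline = {
        "A": [(0, 8)],          # busy for the whole window
        "B": [(0, 3), (5, 8)],  # one short gap
        "C": [(0, 1)],          # waits for almost the entire window
    }
    print(running_fraction(timeline, elapsed=8))
    print(mostly_waiting(timeline, elapsed=8))
```

Threads flagged by <code>mostly_waiting</code> correspond to rows in the timeline that are dominated by the waiting colour rather than by CPU-time bands.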
== Platform and I/O ==