Changes


GPU621/Intel Parallel Studio VTune Amplifier

346 bytes added, 09:11, 9 December 2021

→‎Conclusion

For more information on System Overview, see [https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance/platform-analysis-group/system-overview-analysis.html Intel's System Overview analysis documentation].

===Versions of the software:===

*Standalone VTune Profiler Graphical Interface

*Web Server Interface

====Performance====

As the screenshot below shows, scheduling for tbb::parallel_scan adds considerable overhead. In addition, most of the work appears to be done by thread 1, which can be explained by the fact that the array is still initialized serially. The solution can be optimized by choosing a proper grain size, which was the first suggestion VTune gave.

[[File:TBB_Scan.png]]
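To illustrate why grain size affects scheduling overhead, here is a minimal sketch of a blocked two-pass prefix sum. It is not TBB's implementation and does not use TBB. The helper name <code>blocked_scan</code> and its parameters are hypothetical, and the blocks are processed serially so the sketch has no dependencies. The blocks in pass 1 and pass 2 are independent of each other, so each one stands in for a task that a scheduler would hand to a worker thread. The return value is the number of such blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TBB: inclusive prefix sum computed in
// blocks of at most `grain` elements. Returns the number of blocks, which
// stands in for the number of tasks a scheduler would have to manage.
std::size_t blocked_scan(const std::vector<long>& in, std::vector<long>& out,
                         std::size_t grain) {
    const std::size_t n = in.size();
    out.assign(n, 0);
    if (n == 0) return 0;
    grain = std::max<std::size_t>(grain, 1);
    const std::size_t blocks = (n + grain - 1) / grain;

    // Pass 1: each block sums its own elements (independent per block).
    std::vector<long> block_sum(blocks, 0);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * grain);
        for (std::size_t i = b * grain; i < end; ++i) block_sum[b] += in[i];
    }

    // Serial step: turn block sums into the starting offset of each block.
    long running = 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        const long s = block_sum[b];
        block_sum[b] = running;
        running += s;
    }

    // Pass 2: each block writes its final values (independent per block).
    for (std::size_t b = 0; b < blocks; ++b) {
        long acc = block_sum[b];
        const std::size_t end = std::min(n, (b + 1) * grain);
        for (std::size_t i = b * grain; i < end; ++i) {
            acc += in[i];
            out[i] = acc;
        }
    }
    return blocks;
}
```

For a 1000-element array, a grain size of 1 yields 1000 blocks, while a grain size of 256 yields 4. Fewer, larger blocks mean less per-task bookkeeping, but blocks that are too large leave threads idle. In TBB itself, the grain size is the optional third argument of the <code>tbb::blocked_range</code> constructor.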

Ikondrakov

