GPU621/Intel Parallel Studio VTune Amplifier

310 bytes added, 23:23, 8 December 2021
====Performance====
As the screenshot below shows, tbb::parallel_scan incurs significant scheduling overhead. Most of the work also appears to be done by thread 1, which is explained by the array still being initialized serially. The solution can be optimized by choosing an appropriate grain size, which was the first suggestion VTune gave.
[[File:TBB_Scan.png]]