== Initial Performance ==
The analysis below was produced by Intel VTune Amplifier, run from within Visual Studio. It shows the total elapsed time as well as the overhead time the program incurred:
[[File:initial_analysis.png|750px|centerleft|Initial performance ]]
[[File:initial_histo_analysis.png|750px|centerleft|First histogram analysis]]
[[File:initial_analysis_parallel_vs_serial.png|750px|centerleft|Initial comparison between parallel and serial implementations]]
These graphs and figures tell us that we have efficiently parallelized only the lowest-hanging fruit. The parallel regions themselves are reasonably efficient at what they do, but as it turns out, they contributed very little to the computational load in the first place. A lot of CPU time is spent waiting at synchronization barriers, and one major hotspot remains that can still be parallelized.
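
To put a rough number on why efficient parallel regions can still yield little overall gain, Amdahl's law is a useful back-of-the-envelope check. The sketch below is illustrative only: the function name <code>amdahl_speedup</code> and the fractions used in the note after it are assumptions made for this example, not values read from the VTune results above. It also ignores time lost at synchronization barriers, so it gives an optimistic upper bound.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: upper bound on overall speedup when only part of the
// serial runtime is parallelized.
//   parallel_fraction: share of the serial runtime spent in code that
//                      runs in parallel regions (0.0 to 1.0)
//   threads:           number of worker threads (>= 1)
// Hypothetical helper for illustration; not part of the project code.
double amdahl_speedup(double parallel_fraction, int threads) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    const double serial_fraction = 1.0 - parallel_fraction;
    // The serial part is unchanged; the parallel part is divided among threads.
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}
```

For example, if the parallel regions covered only 10% of the original runtime, <code>amdahl_speedup(0.10, 8)</code> gives roughly 1.10, whereas covering 90% of the runtime would give roughly 4.7 on the same 8 threads. That gap is why parallelizing the remaining hotspot matters more than tuning the regions that are already parallel.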