As you can see, the execution time increases with the number of threads. These results are not what you would expect, but there are two possible causes. The first is that the overhead of creating and maintaining the threads is overwhelmingly larger than the work done in the body of the for loop. The second is false sharing.
= Eliminating False Sharing =
Wasting memory to put your data on different cache lines is not an ideal solution to the false sharing problem, even though it works. A better approach is to have each thread work on a local variable instead of a contiguous array location. Local variables typically live on each thread's own stack or in registers, so the writes to memory are spread out across different cache lines. Another benefit of this approach is that multiple threads no longer write to the same cache line, which would otherwise keep invalidating each other's cached data and bottleneck the threads.
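The local-variable approach described above can be sketched as follows. This is a minimal illustration, assuming a C++ compiler with OpenMP enabled (for example via <code>-fopenmp</code>). The function name <code>sum_with_local_accumulators</code> and the summation workload are hypothetical and are not taken from the code that produced the timings above.

```cpp
#include <vector>

// Hypothetical helper (not from the original code): sums the elements of
// `data`, with each thread accumulating into its own local variable rather
// than into its own slot of a shared, contiguous array.
double sum_with_local_accumulators(const std::vector<double>& data) {
    double total = 0.0;
    const long n = static_cast<long>(data.size());

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread gets a
        // private copy on its own stack (or in a register). No other
        // thread writes to this cache line.
        double local = 0.0;

        // The iterations are divided among the threads of the team.
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            local += data[i];
        }

        // Each thread touches the shared variable exactly once.
        #pragma omp atomic
        total += local;
    }
    return total;
}
```

Each thread writes to shared memory only once, at the end of its share of the loop, so the cache line holding <code>total</code> is rarely contended. If the compiler is run without OpenMP support, the pragmas are ignored and the function simply computes the same sum on a single thread.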
= Intel VTune Amplifier =
VTune Amplifier is a trace-based analysis tool used for deep analysis of a given program's runtime. Getting the most out of modern processors requires much more than optimizing single-thread performance. High-performing code must be:
* '''Threaded and scalable''' to utilize multiple CPUs
* '''Vectorized''' for efficient use of multiple FPUs
* '''Tuned''' to take advantage of non-uniform memory architectures and caches
Intel VTune Amplifier's single, user-friendly analysis interface provides all of these advanced profiling capabilities.
== Some Key Tools of VTune Amplifier ==
* '''HotSpot Analysis''': Hotspot analysis quickly identifies the lines of code/functions that are taking up a lot of CPU time.
* '''High-performance computing (HPC) Analysis''': HPC analysis gives a fast overview of three critical metrics:
** CPU utilization (for both thread and MPI parallelism)
** Memory access
** FPU utilization (FLOPS)
[[File:HPC-1.png|frame|center]]
* '''Locks and Waits''': VTune Amplifier has a built-in understanding of parallel programming, which makes multithreading behaviour easier to understand. Locks and waits analysis allows you to quickly find the common causes of slow threaded code.
* '''Easier, More Effective OpenMP* and MPI Multirank Tuning''':
** The summary report quickly gives you the top four answers you need to effectively improve OpenMP* performance.
[[File:OpenMP-5.png|frame|center]]
* VTune Amplifier provides hardware-based profiling to help analyze how efficiently your code uses the microprocessor.
<br>
This is just a brief summary of some of the tools available within VTune Amplifier. For more details, please visit [https://software.intel.com/en-us/intel-vtune-amplifier-xe Intel VTune Amplifier website].