GPU621/Code

676 bytes added, 23:27, 27 November 2018
Intel VTune™ Amplifier
==='''About VTune'''===
Intel VTune Amplifier is performance analysis software that lets you measure the performance of your serial or multithreaded program. VTune allows you to analyze the performance of your algorithms and multithreading. It can help with debugging threads by measuring overhead and finding bottlenecks or inefficiencies.
I will be explaining how to use VTune, part of Intel® Parallel Studio XE 2019, alongside Visual Studio 2017. Intel® Parallel Studio XE can be found here: https://software.intel.com/en-us/parallel-studio-xe. VTune can be run as a stand-alone program or integrated with Visual Studio. Once it is installed, you will be able to find VTune in the Tools tab inside Visual Studio.
[[File:tools_tab.PNG | 400px]]
==='''Starting VTune'''===
When you hover over Intel VTune Amplifier 2019 in the Tools menu, more options will appear.
Select the '''Configure Analysis''' option.
[[File:options.PNG | 400px]]
To see the different analyses available to run with VTune, click on the three dots circled in the picture below. This expands more options for running VTune.
[[File:SetupOptions.PNG | 600px]]
The following menu will appear. It contains the different analyses that you can run against your program. The one I am going to look at is the default analysis, Hotspots. Depending on your program, you may want to look into the other options.
[[File:startup options.PNG | 400px]]
==='''How it works'''===
On the main page, you can start the analysis by clicking on the blue play button.
[[File:start test.PNG | 600px]]
When the analysis completes, a summary of all the results will be displayed.
[[File:test complete.PNG | 600px]]
==='''Demo'''===
The following is some code from the Matrix Multiplication exercise we did in Lab 3. It contains two versions of the "matMul" function. Change the value of the version macro inside "MatMul.cpp" to run the different versions:
*Version 1: the matrix multiplication logic has been put inside a parallel for statement.
*Version 2: the matrix multiplication logic is still inside the parallel for statement, but it is dynamically scheduled and certain variables are explicitly declared private or shared.
[https://github.com/coreyjjames/CoreyJJames/tree/Lab3_VTune_Example Example Code]

To run the example code:
*Copy the code into Visual Studio and build it.
*Run the program with a VTune Threading analysis.
*Open the Platform tab of the results; this is the point of interest in the program. In Version 1 you will notice that some of the threads finish before others, because the work is not being spread evenly.
*In Version 2 that issue is resolved, and all the threads end at the same time. When I ran Version 2, I saw a performance improvement of around 0.6 s.

Important notes:
*Run in Release x64, using OpenMP and the Intel compiler.
*Turn off optimization so you can see source-code hotspots.
*Rebuild after any changes.
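For readers without the lab code at hand, the two versions described above can be sketched roughly as follows. This is not the actual code behind the Example Code link: the function names <code>matMulV1</code>/<code>matMulV2</code>, the flat row-major <code>std::vector<double></code> layout, the chunk size of 4, and the <code>elapsedSeconds</code> timing helper are all assumptions made for illustration. Version 1 uses a plain <code>parallel for</code>; Version 2 adds <code>schedule(dynamic, 4)</code> and explicit <code>shared</code>/<code>private</code> clauses. If OpenMP is not enabled in the compiler, the pragmas are ignored and both functions simply run serially with the same result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical layout: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Version 1 (sketch): the multiplication loops sit inside a plain parallel for.
// Rows of c are divided among the threads using the default schedule.
void matMulV1(const Matrix& a, const Matrix& b, Matrix& c, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

// Version 2 (sketch): same loops, but rows are handed out dynamically in
// chunks of 4, and the data-sharing of each variable is stated explicitly.
void matMulV2(const Matrix& a, const Matrix& b, Matrix& c, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    const double* pa = a.data();  // raw pointers keep the shared clause simple
    const double* pb = b.data();
    double* pc = c.data();
    int j, k;
    double sum;
    #pragma omp parallel for schedule(dynamic, 4) shared(pa, pb, pc, n) private(j, k, sum)
    for (int i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            sum = 0.0;
            for (k = 0; k < n; ++k)
                sum += pa[i * n + k] * pb[k * n + j];
            pc[i * n + j] = sum;
        }
    }
}

// Hypothetical helper: wall-clock time of a callable in seconds, useful for a
// quick before/after comparison alongside the Elapsed Time VTune reports.
double elapsedSeconds(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}
```

A typical use is to call <code>elapsedSeconds([&]{ matMulV2(a, b, c, n); })</code> once per version and compare the numbers against what VTune shows for each run.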
==='''Interpreting results'''===
The following picture shows the different tabs available from a Hotspots analysis.
[[File:tabs.PNG | 600px]]
Interpreting the results from VTune will be a different process for your program than for mine.
 
To be successful, make sure to read through the results and look for anomalies.
 
'''Example of anomalies:'''
*Poor utilization of all the available threads.
*Uneven distribution of the work across the threads.
*High spin or overhead time.
*Threads waiting for no apparent reason.
*Hotspots in the code
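As an illustration of the "high spin or overhead time" anomaly, the hypothetical sketch below sums an array in two ways. <code>sumCritical</code> enters an OpenMP <code>critical</code> section on every iteration, so threads spend much of their time waiting on each other; this is the kind of pattern that tends to show up as spin and overhead time in a threading analysis. <code>sumReduction</code> uses a <code>reduction</code> clause, so each thread accumulates privately and the partial sums are combined once at the end. The function names are made up for this example, and the actual figures VTune reports will depend on your machine.

```cpp
#include <vector>

// Anti-pattern: every iteration serializes on the same critical section,
// so most of each thread's time goes to synchronization rather than work.
long long sumCritical(const std::vector<int>& v) {
    long long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp critical
        total += v[i];
    }
    return total;
}

// Preferred: each thread keeps a private partial sum, and OpenMP combines
// the partial sums once when the loop finishes.
long long sumReduction(const std::vector<int>& v) {
    long long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

Both functions return the same value; running each under a VTune analysis is one way to see how much of the first version's CPU time is spent on synchronization instead of useful work.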
 
'''VTune navigation bar (Depending on the Analysis):'''
*Analysis configuration
**Main configuration page for VTune
**Logs from the analysis
*Summary
**Elapsed Time: the amount of time your program took to run.
***CPU Time: displays the effective, spin, and overhead times.
**Top Hotspots: displays the areas that were most active in your program.
**Effective CPU Utilization Histogram: shows how long your program spent using a given number of threads. The x-axis is the number of threads in use at the same time, and the y-axis is the amount of time your program spent using that number of threads.
**Collection and Platform Info: displays relevant information about the computer the analysis was run on.
*Bottom-up
**Allows you to see the call stack of a function starting from the first call.
**Displays the time and the utilization of each thread.
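To make the axes of the Effective CPU Utilization Histogram more concrete, here is a simplified, hypothetical model of how such a histogram can be built. It assumes the run was sampled at fixed intervals and that each sample records how many threads were doing useful work at that moment; the time is then summed per thread count. The function name and input format are assumptions, and this only illustrates the idea rather than how VTune itself collects or computes the metric.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical input: activeThreads[s] is the number of threads doing useful
// work during sample s, and every sample covers intervalSec seconds.
// Returns a map from "number of threads busy" (x-axis) to
// "seconds spent at that level" (y-axis).
std::map<int, double> utilizationHistogram(const std::vector<int>& activeThreads,
                                           double intervalSec) {
    assert(intervalSec > 0.0);
    std::map<int, double> histogram;
    for (int busy : activeThreads)
        histogram[busy] += intervalSec;  // add this sample's time to its bar
    return histogram;
}
```

If most of the time lands in the bars for low thread counts while the machine has many more logical CPUs, that points to the "poor utilization of all the available threads" anomaly listed earlier.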
 
Note:
*When reviewing results, pay attention to any red flags that are displayed beside results in VTune. If you hover over them, VTune will give you more information.
*Almost anything shown in red means there is a potential problem.
*OpenMP integration: VTune has native support for OpenMP, and some results are generated by VTune specifically for OpenMP.
----