GPU621/Intel Parallel Studio VTune Amplifier

From CDOT Wiki
Revision as of 18:21, 8 December 2021 by Sgriffis (talk | contribs) (Features & Functionalities)

Group Members

  1. Iurii Kondrakov
  2. James Mai
  3. Stephen Griffis

Introduction

This page demonstrates the purpose, features, and usage of the Intel Parallel Studio VTune Profiler. In short, it is a performance analysis tool for 32-bit and 64-bit x86 architectures running Linux-based or Microsoft Windows operating systems. It supports both Intel and AMD hardware, but advanced hardware-based features and acceleration require an Intel-manufactured CPU. The tool is designed to help optimize application and system performance, high-performance computing configurations, and more by detecting performance bottlenecks through run-time profiling of serial and multi-threaded software.

Details

Intel Parallel Studio VTune Amplifier is available as a standalone application or as part of the Intel Parallel Studio bundle. The tool analyzes run-time metrics of the target executable, such as CPU utilization, throttling, thread usage, memory usage, I/O, and other system-level resources, and gives the developer a detailed report with suggestions for possible improvements. It measures the performance of the program as a whole and of each function separately, and displays the report in an interface integrated into Visual Studio, with graphs and tables. The tool can also produce an overview that helps a developer identify bottlenecks and suggests small tweaks, such as increasing the grain size or the scope of the data when using Intel's TBB library. Overall, it is a powerful and reliable tool for scalable parallelization.
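To make the grain size suggestion concrete, here is a minimal sketch of how grain size controls the number of chunks a range is split into. It models the divisibility rule of tbb::blocked_range (a range keeps being halved while its size exceeds the grain size) without calling TBB. The function name count_chunks is hypothetical, and a real TBB partitioner may stop splitting earlier than this model does.

```cpp
#include <cassert>
#include <cstddef>

// Counts the leaf chunks produced when a range of `size` iterations is
// halved repeatedly until no piece is larger than `grain`.
// Assumption: this only models the "divisible while size > grain" rule;
// it does not use TBB and ignores partitioner heuristics.
std::size_t count_chunks(std::size_t size, std::size_t grain) {
    assert(grain > 0 && "grain size must be positive");
    if (size <= grain) {
        return 1;  // small enough: handled as a single task
    }
    const std::size_t left = size / 2;      // first half
    const std::size_t right = size - left;  // second half (covers odd sizes)
    return count_chunks(left, grain) + count_chunks(right, grain);
}
```

Under this model, 1000 iterations with a grain size of 100 yield 16 chunks, while a grain size of 250 yields 4. Fewer, larger chunks mean less scheduling overhead per element, which is the effect behind a suggestion to increase the grain size.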

Our team will illustrate the usage and importance of the tool with a small case study based on code implemented during the semester: Workshop 7 - Threading Building Blocks (parallel_scan), which deals with a trivial prefix scan problem. We will compare the performance and resource utilization of the Serial, OpenMP, and TBB versions, each analyzed by Intel Parallel Studio VTune Amplifier in Visual Studio.
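For readers unfamiliar with the prefix scan problem, the following is a minimal sketch, not the workshop's actual code. It assumes an inclusive scan with integer addition. scan_serial is the straightforward serial version, and scan_blocked shows the three-step blocked structure that parallel versions typically rely on, written here as plain serial loops so it runs without OpenMP or TBB. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial inclusive prefix scan: out[i] = in[0] + ... + in[i].
std::vector<int> scan_serial(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Blocked inclusive scan. Steps 1 and 3 touch each block independently,
// so those loops over blocks are the ones a parallel version would split
// across threads; here they run serially.
std::vector<int> scan_blocked(const std::vector<int>& in, std::size_t block) {
    assert(block > 0 && "block size must be positive");
    const std::size_t n = in.size();
    const std::size_t nblocks = (n + block - 1) / block;
    std::vector<int> out(n);
    std::vector<int> offsets(nblocks, 0);

    // Step 1: scan each block on its own and record the block total.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t lo = b * block;
        const std::size_t hi = std::min(lo + block, n);
        int running = 0;
        for (std::size_t i = lo; i < hi; ++i) {
            running += in[i];
            out[i] = running;
        }
        offsets[b] = running;
    }

    // Step 2: exclusive scan of the block totals (short and serial).
    int carry = 0;
    for (std::size_t b = 0; b < nblocks; ++b) {
        const int total = offsets[b];
        offsets[b] = carry;
        carry += total;
    }

    // Step 3: add each block's offset to its elements.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t lo = b * block;
        const std::size_t hi = std::min(lo + block, n);
        for (std::size_t i = lo; i < hi; ++i) {
            out[i] += offsets[b];
        }
    }
    return out;
}
```

For the input {1, 2, 3, 4, 5}, both functions return {1, 3, 6, 10, 15}. The blocked version does roughly twice the additions of the serial one, which is one reason a profiler is useful for checking whether a parallel scan actually pays off.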

Features & Functionalities

Performance Snapshot

This feature summarizes the results for each of the different analysis types and shows how the application performs in each one, highlighting those with the worst performance. This lets users easily pinpoint which areas need to be prioritized and which may require more time to resolve.

Performance Snapshot.png

Algorithm

Hotspots

This feature uses User-Mode Sampling or Hardware Event-Based Sampling to collect data while your application is running. After data collection completes, it shows where the code stalls or spends the most time and how well you are utilizing your CPU threads.

Hotspot.png

You can also open the source code to see which functions take up the most CPU time. This lets you pinpoint where to start optimizing and focus on the functions that cause the most run-time delay.

Hotspot2.png

User-Mode Sampling

VTune uses low-overhead (~5%) sampling and tracing collection that gathers the information it needs without slowing down the application significantly. The data collector uses the OS timer to profile the application, collects samples of all active instruction addresses at 10 ms intervals, and captures a call sequence. Once collection is finished, the results are displayed in the results tab.
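The arithmetic behind timer-based sampling can be sketched as follows: if a function is found on the CPU in k samples taken every 10 ms, its estimated CPU time is about k × 10 ms. This is an illustrative model only, not VTune's implementation. The function name estimate_cpu_seconds and the sample counts in the usage note are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Estimates per-function CPU time from sample counts.
// Assumption: every sample represents one full sampling interval
// (10 ms by default, matching the interval described above).
std::map<std::string, double> estimate_cpu_seconds(
    const std::map<std::string, int>& samples, double interval_ms = 10.0) {
    assert(interval_ms > 0.0 && "sampling interval must be positive");
    std::map<std::string, double> seconds;
    for (const auto& entry : samples) {
        // samples * interval gives milliseconds; divide to get seconds.
        seconds[entry.first] = entry.second * interval_ms / 1000.0;
    }
    return seconds;
}
```

For example, a function seen in 250 samples is estimated at 2.5 seconds of CPU time, and one seen in 50 samples at 0.5 seconds. Because the estimate is statistical, functions that run for much less than one interval may be missed or over-counted in short runs.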

Hardware Event-Based Sampling

VTune analyzes not just the running application but all processes running on the system at that time, and reports CPU run-time performance for the system as a whole. It still lists and times the functions that run in the current application, but it does not capture call sequences as hotspots.


For more information on Hotspots click here

Anomaly Detection Analysis

Versions of the software:

  • Standalone VTune Profiler Graphical Interface
  • Web Server Interface
  • Microsoft Visual Studio Integration
  • Eclipse IDE Integration
  • Intel System Studio IDE Integration

How to configure and Start Analysis

Intel Parallel Studio VTune Profiler comes in several versions and IDE integrations, but for simplicity and relevance to the course material, we use the Visual Studio integration of the Profiler. To begin, the IDE must be run with administrator privileges so the Profiler can access hardware information and resources; otherwise, the analysis and collected data will be limited.

To open VTune, press the button in the upper toolbar as shown below:

Menu.jpg

Next, press the Configure Analysis button to get to the VTune main menu:

Main Menu.jpg

Here, the target program and the host on which it will run can be chosen, and program parameters and the working directory can be configured. The analysis method, with its different options, can be selected from the "How" menu:

How Menu.jpg

Finally, press the start button, as shown above, to start the configured VTune analysis.

Demonstration

Sources

  1. Intel VTune Profiler
  2. Installation and Features
  3. GPU621 Workshop 7
  4. Github repository of the code used