DPS921/Intel Advisor

= Intel Advisor =
Intel Advisor is a set of design and analysis tools for optimizing performance. Its tools help you measure, analyze, and improve the threading, vectorization, and memory use of an application. Advisor supports C, C++, Fortran, Python, and OpenMP. The tools highlighted on this wiki page are the survey analysis, dependencies analysis, roof-line analysis, and memory access pattern analysis. Survey analysis identifies points in your code where vectorization or parallelization is possible and where improvements can make execution faster. Dependencies analysis identifies data dependencies in your code. Roof-line analysis finds performance headroom against hardware limitations and gives insight for an effective optimization roadmap. Memory access pattern analysis checks for various memory issues, such as non-contiguous memory accesses and non-unit strides.
== Group Members ==
= Roof-line Analysis =
The roof-line tool creates a roofline model to represent an application's performance in relation to hardware limitations, including memory bandwidth and computational peaks. To measure performance we use two axes: GFLOPS (giga floating-point operations per second) on the y-axis and arithmetic intensity (AI, in FLOPs/byte) on the x-axis, both in log scale. With these we can begin to build our roof-line. Any given machine's CPU can only perform so many FLOPs per second, so we plot this compute cap as a horizontal line on the chart. Likewise, the memory system can only supply so many gigabytes per second, which we represent as a diagonal line (N GB/s × X FLOPs/byte = Y GFLOPS). Together these lines represent the machine's hardware limits and its best possible performance at a given AI.
Every function or loop has a specific AI, and when it runs we can record its GFLOPS. Because its AI won't change, any optimization we make only moves its performance up toward the roofs, which makes the chart useful for measuring the effect of a given change or optimization.
[[File:Flops.PNG]]
<source>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <chrono>
#include <omp.h>
using namespace std::chrono;
#define NUM_THREADS 1

// report system time
void reportTime(const char* msg, steady_clock::duration span) {
    auto ms = duration_cast<milliseconds>(span);
    std::cout << msg << " - took - " << ms.count() << " milliseconds" << std::endl;
}

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << argv[0] << ": invalid number of arguments\n";
        std::cerr << "Usage: " << argv[0] << " no_of_slices\n";
        return 1;
    }
    int n = std::atoi(argv[1]);
    steady_clock::time_point ts, te;

    // calculate pi by integrating the area under 1/(1 + x^2) in n steps
    ts = steady_clock::now();
    int nthreads;
    double pi = 0.0;  // must be initialized before the += in the critical section
    double stepSize = 1.0 / (double)n;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, tid, nt;
        double x, sum;
        tid = omp_get_thread_num();
        nt = omp_get_num_threads();
        if (tid == 0) nthreads = nt;
        // each thread integrates every nt-th slice
        for (i = tid, sum = 0.0; i < n; i += nt) {
            x = ((double)i + 0.5) * stepSize;
            sum += 1.0 / (1.0 + x * x);
        }
        #pragma omp critical
        pi += 4.0 * sum * stepSize;
    }
    te = steady_clock::now();

    std::cout << "n = " << n << " " << nthreads
              << std::fixed << std::setprecision(15)
              << "\n pi(exact) = " << 3.141592653589793
              << "\n pi(calcd) = " << pi << std::endl;
    reportTime("Integration", te - ts);
}
</source>
= How to set up a Roof-line Analysis =
Microsoft Visual Studio integration:

1. Select the project.
[[File:Step_1.PNG]]
2. Go to Intel Advisor and select the roof-line tool.
[[File:Step_2.png]]
3. Let the roof-line tool analyze the data.
[[File:Step_3.PNG]]
4. Review the data.
[[File:Step_4.PNG]]
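The GUI steps above can also be driven from the command line with Advisor's <code>advixe-cl</code> driver. A sketch of the roofline collection workflow, assuming Intel Advisor is on your PATH and <code>./myapp</code> is the binary built above (project directory and arguments are illustrative):

```shell
# 1. Survey collection records where time is spent per loop/function.
advixe-cl -collect survey -project-dir ./advi_results -- ./myapp 1000000

# 2. Trip counts + FLOP collection adds the arithmetic-intensity data
#    needed to place each loop on the roofline chart.
advixe-cl -collect tripcounts -flop -project-dir ./advi_results -- ./myapp 1000000

# 3. Open the result in the GUI to review the roofline chart.
advixe-gui ./advi_results
```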
= Memory Access Pattern Analysis =
We can use the Memory Access Pattern (MAP) analysis tool to check for various memory issues, such as non-contiguous memory accesses and non-unit strides. It also reports the types of memory access in selected loops/functions, how you traverse your data, and how that affects your vector efficiency and cache bandwidth usage.
<source>
#include <iostream>
#include <cstdlib>
using namespace std;

const long int SIZE = 3500000;

typedef struct tricky {
    int member1;
    float member2;
} tricky;

tricky structArray[SIZE];

int main() {
    cout << "Starting.\n";
    for (long int i = 0; i < SIZE; i++) {
        structArray[i].member1 = (i / 25) + i - 78;
    }
    cout << "Done.\n";
    return EXIT_SUCCESS;
}
</source>
= How to set up Memory Access Pattern Analysis =
Step 1: Run the roof-line tool.
[[File:Step_4.PNG]]
Step 2: Run the MAP tool.
[[File:Step_5.PNG]]
Step 3: Review the data.
[[File:Step_6.PNG]]
Intel's sample file for the "Memory Access 101" tutorial, as it appears on this page, is reduced to its notice and a fragment:
<source>
/* Copyright (C) 2010-2017 Intel Corporation. All Rights Reserved.
 *
 * The source code, information and material ("Material")
 * contained herein is owned by Intel Corporation or its
 * suppliers or licensors, and title to such Material remains
 * with Intel Corporation or its suppliers or licensors.
 * The Material contains proprietary information of Intel or
 * its suppliers and licensors. The Material is protected by
 * worldwide copyright laws and treaty provisions.
 * No part of the Material may be used, copied, reproduced,
 * modified, published, uploaded, posted, transmitted, distributed
 * or disclosed in any way without Intel's prior express written
 * permission. No license under any patent, copyright or other
 * intellectual property rights in the Material is granted to or
 * conferred upon you, either expressly, by implication, inducement,
 * estoppel or otherwise. Any license under such intellectual
 * property rights must be express and approved by Intel in writing.
 * Third Party trademarks are the property of their respective owners.
 * Unless otherwise agreed by Intel in writing, you may not remove
 * or alter this notice or any other notice embedded in Materials
 * by Intel or Intel's suppliers or licensors in any way.
 * This file is intended for use with the "Memory Access 101" tutorial.
 */
#include <iostream>
#include <time.h>
#include <cstdlib>

int main() {
    // (tutorial body omitted in the original page)
    return EXIT_SUCCESS;
}
</source>
= Sources =