GPU621 Team Wired

OpenMP Profiler

Team Members

  1. Dylan Segna

Setup

Download

Download the current release here: http://www.ompp-tool.com/downloads.html

The user manual is also available from this downloads page.

Environment

The operating system used was Ubuntu version 16.04.1 running within a virtual machine.

The compiler used was GCC, which came installed with Ubuntu. However, ompP can be used with any compiler that supports OpenMP.
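Before installing, it can help to confirm that your compiler really does enable OpenMP with the flag you intend to use. The following is a minimal sketch, not part of ompP: it relies only on the standard `_OPENMP` macro, which OpenMP-enabled compilers define as a yyyymm date. The function names `openmp_version` and `report_openmp_support` are hypothetical.

```cpp
#include <iostream>

// Returns the value of the standard _OPENMP macro (a yyyymm date that
// identifies the supported OpenMP specification), or 0 when the file
// was compiled without OpenMP enabled.
long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: prints a one-line summary of the result.
void report_openmp_support() {
    const long version = openmp_version();
    if (version > 0) {
        std::cout << "OpenMP enabled, _OPENMP = " << version << '\n';
    } else {
        std::cout << "OpenMP not enabled; check the compiler flag\n";
    }
}
```

Call `report_openmp_support()` from `main` and compile once with and once without the OpenMP flag (for GCC, `-fopenmp`) to see the difference.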

Installation

Extract the files from the archive downloaded from the ompp downloads page. The location they are extracted to does not matter, as they are temporary and can be deleted after the installation is complete.

Open “Makefile.defs” with a text editor of your choice.

● Fill in the INSTDIR variable with the path to the folder you want to install ompp in. For example, mine was:

INSTDIR=/home/dylan/Desktop/ompp

● Set the OpenMP C/C++ compiler variable to the compiler you want to use to build ompp. If you are using GCC then it will look like this:

OMPCC=gcc

● Lastly, set the flag that enables OpenMP compilation for your chosen compiler. For GCC this flag is:

OMPFLAG=-fopenmp

After making these changes, you are ready to build ompp.

Run “make” and “make install”. If there were no errors, the folder which you specified for INSTDIR should be populated by the ompp build output files.

The last step is to add ompp to your path. Run

export PATH=$PATH:{install directory}/bin

where {install directory} is the path you specified for INSTDIR. For example: “export PATH=$PATH:/home/dylan/Desktop/ompp/bin”


Compiling and Running an OpenMP program

Now that ompp is installed and added to your path, you can proceed to compile and execute an OpenMP program.

To compile with ompp enabled, prefix your compilation command with “kinst-ompp”. For example, when compiling a C++ program with GCC the compilation command would look like this:

kinst-ompp g++ -fopenmp -std=c++11 source.cpp -o output

If there were no compilation errors, you will see a number of intermediate files being created. You may ignore these and run the output program.

After the program has finished running, a text file will be generated with a name formatted as “outputfile.number-number.ompp.txt”. Each new program execution will generate a new text file with an ascending number, so previous files are never overwritten.

Open these files to view the ompp report for that execution.
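If you do not yet have an OpenMP program to profile, the sketch below could be used as a starting point for source.cpp. It is not the omp_sync.cpp file that produced the reports in this article (that source is not shown here); it is an assumed example containing the same two region types, one parallel region and one unnamed critical region. The function name `parallel_sum` is hypothetical.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sums the elements of `data` inside a single parallel region.
// Each thread accumulates a private partial sum over a strided share
// of the elements, then adds it to the shared total in a critical region.
long long parallel_sum(const std::vector<int>& data) {
    long long total = 0;
    #pragma omp parallel
    {
#ifdef _OPENMP
        const std::size_t tid = static_cast<std::size_t>(omp_get_thread_num());
        const std::size_t nt = static_cast<std::size_t>(omp_get_num_threads());
#else
        // Without OpenMP the pragmas are ignored and the code runs serially.
        const std::size_t tid = 0;
        const std::size_t nt = 1;
#endif
        long long local = 0;  // per-thread partial sum
        for (std::size_t i = tid; i < data.size(); i += nt) {
            local += data[i];
        }
        // Only one thread at a time may update the shared total.
        #pragma omp critical
        {
            total += local;
        }
    }
    return total;
}
```

Add a `main` that fills a vector and calls `parallel_sum`, then build it with the kinst-ompp command shown above. Varying the vector size is one way to produce runs with more or less work per thread.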


Interpreting the Report

General Information

This section gives you all the general information pertaining to the executed program. An example of the general information section:

----------------------------------------------------------------------
----     ompP General Information     --------------------------------
----------------------------------------------------------------------
Start Date      : Thu Dec 01 05:41:19 2016
End Date        : Thu Dec 01 05:41:31 2016
Duration        : 11.37 sec
Application Name: unknown
Type of Report  : final
User Time       : 9.86 sec
System Time     : 0.00 sec
Max Threads     : 4
ompP Version    : 0.8.99
ompP Build Date : Dec  1 2016 05:27:34
PAPI Support    : not available

Region Overview

The region overview section outlines all OpenMP regions in the program, including what type of region they are and in which file and at what line numbers they are found. An example of the region overview section is the following:

----------------------------------------------------------------------
----     ompP Region Overview     ------------------------------------
----------------------------------------------------------------------
PARALLEL: 1 region:
 * R00001 omp_sync.cpp (34-47) 
CRITICAL: 1 region:
* R00002 omp_sync.cpp (45-46) (unnamed)
----------------------------------------------------------------------

This outlines two OpenMP regions, one being a parallel region and the other being a critical region, where only one thread may execute at a time.

Callgraph

This section gives a general overview of how much execution time each region takes up. For example:

----------------------------------------------------------------------
----     ompP Callgraph     ------------------------------------------
----------------------------------------------------------------------
 Inclusive  (%)   Exclusive  (%)
  0.17 (100.0%)    0.00 ( 0.26%)           [unknown: 8 threads]
  0.17 (99.74%)    0.17 (99.73%) PARALLEL  +-R00001 omp_sync.cpp (35-48)
  0.00 (0.001%)    0.00 (0.001%) CRITICAL     +-R00002 omp_sync.cpp (46-47) (unnamed)
----------------------------------------------------------------------

In this callgraph section, the exclusive percentages show that the code outside any OpenMP region (main) took up 0.26% of the execution time, the parallel region on lines 35-48 took up 99.73%, and the critical region took up only 0.001%.

Region Profile

These sections give detailed information on each region as well as detailed information for each thread.

----------------------------------------------------------------------
----     ompP Flat Region Profile (inclusive data)     ---------------
----------------------------------------------------------------------
R00001 omp_sync.cpp (35-48) PARALLEL
TID      execT      execC      bodyT   exitBarT   startupT   shutdwnT      taskT
  0       0.95      1          0.95    0.00       0.00       0.00          0.00
  1       0.95      1          0.95    0.00       0.00       0.00          0.00
  2       0.95      1          0.91    0.02       0.03       0.00          0.00
  3       0.95      1          0.93    0.00       0.03       0.00          0.00
  4       0.95      1          0.93    0.00       0.02       0.00          0.00
  5       0.95      1          0.93    0.01       0.01       0.00          0.00
  6       0.95      1          0.94    0.01       0.01       0.00          0.00
  7       0.95      1          0.94    0.00       0.01       0.00          0.00
SUM       7.63      8          7.49    0.04       0.10       0.00          0.00
----------------------------------------------------------------------

From left to right, the columns are the thread ID (TID), execution time (execT), execution count (execC), body time (bodyT), exit barrier time (exitBarT), startup time (startupT), shutdown time (shutdwnT), and task time (taskT). Each column is summed in the last row to show cumulative times and counts for all threads.

Overhead Analysis Report

This section gives details for the various overhead times incurred from parallel regions. For example, this overhead section is from a program execution with a large number of elements.

----------------------------------------------------------------------
----     ompP Overhead Analysis Report     ---------------------------
----------------------------------------------------------------------
Total runtime (wallclock)   : 0.95 sec [8 threads]
Number of parallel regions  : 1
Parallel coverage           : 0.95 sec (99.96%)
Parallel regions sorted by wallclock time:
           Type                            Location      Wallclock (%) 
R00001  PARALLEL                omp_sync.cpp (35-48)       0.95 (99.96) 
                                                SUM       0.95 (99.96) 
Overheads wrt. each individual parallel region:
         Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     7.63     0.14 ( 1.85)    0.00 ( 0.00)    0.04 ( 0.51)    0.00 ( 0.00)    0.10 ( 1.34)
Overheads wrt. whole program:
         Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     7.63     0.14 ( 1.85)    0.00 ( 0.00)    0.04 ( 0.51)    0.00 ( 0.00)    0.10 ( 1.34)
  SUM     7.63     0.14 ( 1.85)    0.00 ( 0.00)    0.04 ( 0.51)    0.00 ( 0.00)    0.10 ( 1.34)
----------------------------------------------------------------------

From left to right, the overhead categories are synchronization (Synch), imbalance (Imbal), limited parallelism (Limpar), and thread management (Mgmt).

You can see that the overhead for the parallel region is minimal, taking up only 1.85% of the program’s total execution time. By maintaining proper workload balance between threads and providing enough work to warrant parallelization, the overhead percentages can be kept low.

To contrast, the following is the overhead analysis generated for an execution with a much smaller number of elements:

----------------------------------------------------------------------
----     ompP Overhead Analysis Report     ---------------------------
----------------------------------------------------------------------
Total runtime (wallclock)   : 0.17 sec [8 threads]
Number of parallel regions  : 1
Parallel coverage           : 0.17 sec (99.74%)
Parallel regions sorted by wallclock time:
           Type                            Location      Wallclock (%) 
R00001  PARALLEL                omp_sync.cpp (35-48)       0.17 (99.74) 
                                                 SUM       0.17 (99.74) 
Overheads wrt. each individual parallel region:
         Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     1.37     0.18 (13.24)    0.00 ( 0.00)    0.07 ( 5.25)    0.00 ( 0.00)    0.11 ( 7.99)
Overheads wrt. whole program:
         Total        Ovhds (%)  =   Synch  (%)  +  Imbal   (%)  +   Limpar (%)   +    Mgmt (%)
R00001     1.37     0.18 (13.21)    0.00 ( 0.00)    0.07 ( 5.23)    0.00 ( 0.00)    0.11 ( 7.97)
  SUM     1.37     0.18 (13.21)    0.00 ( 0.00)    0.07 ( 5.23)    0.00 ( 0.00)    0.11 ( 7.97)
----------------------------------------------------------------------

In absolute terms the overheads barely changed: the time lost to imbalance went from 0.04 to 0.07 seconds, and thread management from 0.10 to 0.11 seconds. However, because the total execution time summed across threads dropped from 7.63 to 1.37 seconds, the overhead percentage increased from only 1.85% to 13.21%. This may indicate a situation where parallelization may not be worth the overhead costs, or that you may benefit from reducing the number of threads to allow better workload balancing.