GPU621 Team Wired
OpenMP Profiler
Team Members
Setup
Download
Download the current release here: http://www.ompp-tool.com/downloads.html
The user manual is also available from this downloads page.
Environment
The operating system used was Ubuntu version 16.04.1 running within a virtual machine.
The compiler used was GCC, which came installed with the Ubuntu operating system. However, ompp can be used with any compiler that supports OpenMP.
Installation
Extract the files from the archive downloaded from the ompp downloads page. The location they are extracted to does not matter; the files are only needed during installation and can be deleted once it is complete.
Open “Makefile.defs” with a text editor of your choice.
● Fill in the INSTDIR variable with the path to the folder you want to install ompp in. For example, mine was:
INSTDIR=/home/dylan/Desktop/ompp
● Set the OpenMP C/C++ compiler variable to the compiler you want to use to build ompp. If you are using GCC then it will look like this:
OMPCC=gcc
● Lastly, set the flag that enables OpenMP compilation for your chosen compiler. For GCC this flag is:
OMPFLAG=-fopenmp
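Taken together, the three edited lines in Makefile.defs would look like this (using the example path and compiler from above):
INSTDIR=/home/dylan/Desktop/ompp
OMPCC=gcc
OMPFLAG=-fopenmp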
After making these changes, you are ready to build ompp.
Run “make” and then “make install”. If there were no errors, the folder you specified in INSTDIR should now be populated with the ompp build output files.
The last step is to add ompp to your path. Run
export PATH=$PATH:{install directory}/bin
where {install directory} is the path you specified for INSTDIR. For example: “export PATH=$PATH:/home/dylan/Desktop/ompp/bin”. Note that this change only lasts for the current terminal session; to make it permanent, add the same line to your ~/.bashrc.
Compiling and Running an OpenMP program
Now that ompp is installed and added to your path, you can proceed to compile and execute an OpenMP program.
To compile with ompp enabled, prefix your compilation command with “kinst-ompp”. For example, when compiling a C++ program with GCC the command would look like this:
kinst-ompp g++ -fopenmp -std=c++11 source.cpp -o output
If there were no compilation errors, you will see a number of intermediate files created. You may ignore these and run the output program.
After the program has finished running, a text file will be generated with a name of the form “outputfile.number-number.ompp.txt”. Each new execution generates a new text file with an ascending number, so previous reports are never overwritten.
Open these files to view the ompp report for that execution.
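The report excerpts below come from a small test program, omp_sync.cpp, which contains one parallel region and one critical region. The original source is not reproduced here, but a minimal program with the same structure might look like this (an illustrative sketch, not the actual omp_sync.cpp):

#include <iostream>
#include <omp.h>

int main() {
    long total = 0;
    #pragma omp parallel            // shows up in the report as a PARALLEL region
    {
        long local = 0;
        for (long i = 0; i < 200000000; ++i)
            local += i % 7;         // per-thread work
        #pragma omp critical        // shows up as an (unnamed) CRITICAL region
        total += local;
    }
    std::cout << total << std::endl;
    return 0;
}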
Interpreting the Report
General Information
This section gives you all the general information pertaining to the executed program. An example of the general information section:
----------------------------------------------------------------------
----  ompP General Information  --------------------------------------
----------------------------------------------------------------------
Start Date      : Thu Dec 01 05:41:19 2016
End Date        : Thu Dec 01 05:41:31 2016
Duration        : 11.37 sec
Application Name: unknown
Type of Report  : final
User Time       : 9.86 sec
System Time     : 0.00 sec
Max Threads     : 4
ompP Version    : 0.8.99
ompP Build Date : Dec 1 2016 05:27:34
PAPI Support    : not available
Region Overview
The region overview section lists every OpenMP region in the program, including the type of each region and the file and line numbers where it appears. An example of the region overview section is the following:
----------------------------------------------------------------------
----  ompP Region Overview  ------------------------------------------
----------------------------------------------------------------------
PARALLEL: 1 region:
 * R00001 omp_sync.cpp (34-47)

CRITICAL: 1 region:
 * R00002 omp_sync.cpp (45-46) (unnamed)
----------------------------------------------------------------------
This output lists two OpenMP regions: a parallel region, and a critical region in which only one thread may execute at a time.
Callgraph
This section gives a general overview of how much execution time each region takes up. For example:
----------------------------------------------------------------------
----  ompP Callgraph  ------------------------------------------------
----------------------------------------------------------------------
  Inclusive (%)   Exclusive (%)
  0.17 (100.0%)   0.00 ( 0.26%)            [unknown: 8 threads]
  0.17 (99.74%)   0.17 (99.73%)  PARALLEL   +-R00001 omp_sync.cpp (35-48)
  0.00 (0.001%)   0.00 (0.001%)  CRITICAL     +-R00002 omp_sync.cpp (46-47) (unnamed)
----------------------------------------------------------------------
In this callgraph you can see that the serial portion of the program (outside any parallel region) accounted for 0.26% of the execution time, the parallel region spanning lines 35-48 accounted for 99.73%, and the critical region for only 0.001%.
Region Profile
These sections give detailed timing information for each region, broken down per thread.
----------------------------------------------------------------------
----  ompP Flat Region Profile (inclusive data)  ---------------------
----------------------------------------------------------------------
R00001 omp_sync.cpp (35-48) PARALLEL
 TID      execT      execC      bodyT   exitBarT   startupT   shutdwnT      taskT
   0       0.95          1       0.95       0.00       0.00       0.00       0.00
   1       0.95          1       0.95       0.00       0.00       0.00       0.00
   2       0.95          1       0.91       0.02       0.03       0.00       0.00
   3       0.95          1       0.93       0.00       0.03       0.00       0.00
   4       0.95          1       0.93       0.00       0.02       0.00       0.00
   5       0.95          1       0.93       0.01       0.01       0.00       0.00
   6       0.95          1       0.94       0.01       0.01       0.00       0.00
   7       0.95          1       0.94       0.00       0.01       0.00       0.00
 SUM       7.63          8       7.49       0.04       0.10       0.00       0.00
----------------------------------------------------------------------
From left to right, the columns show the thread ID, execution time, execution count, body time, exit-barrier time, startup time, shutdown time and task time. Each column is summed in the last row to show the cumulative time/count across all threads; for example, the 7.63 s summed execution time corresponds to eight threads each spending roughly 0.95 s in the region.
Overhead Analysis Report
This section gives details on the various overhead times incurred by parallel regions. For example, the following overhead section is from a program execution with a large number of elements.
----------------------------------------------------------------------
----  ompP Overhead Analysis Report  ---------------------------------
----------------------------------------------------------------------
Total runtime (wallclock)   : 0.95 sec [8 threads]
Number of parallel regions  : 1
Parallel coverage           : 0.95 sec (99.96%)

Parallel regions sorted by wallclock time:
               Type                  Location      Wallclock (%)
R00001     PARALLEL      omp_sync.cpp (35-48)       0.95 (99.96)
                                          SUM       0.95 (99.96)

Overheads wrt. each individual parallel region:
          Total      Ovhds (%)  =   Synch (%)  +   Imbal (%)  +  Limpar (%)  +    Mgmt (%)
R00001     7.63   0.14 ( 1.85)  0.00 ( 0.00)   0.04 ( 0.51)   0.00 ( 0.00)   0.10 ( 1.34)

Overheads wrt. whole program:
          Total      Ovhds (%)  =   Synch (%)  +   Imbal (%)  +  Limpar (%)  +    Mgmt (%)
R00001     7.63   0.14 ( 1.85)  0.00 ( 0.00)   0.04 ( 0.51)   0.00 ( 0.00)   0.10 ( 1.34)
   SUM     7.63   0.14 ( 1.85)  0.00 ( 0.00)   0.04 ( 0.51)   0.00 ( 0.00)   0.10 ( 1.34)
----------------------------------------------------------------------
From left to right, the overhead columns are synchronization (Synch), load imbalance (Imbal), limited parallelism (Limpar) and thread management (Mgmt).
You can see that the overhead for the parallel region is minimal, taking up only 1.85% of the program’s total execution time. By keeping the workload balanced across threads and providing enough work to warrant parallelization, the overhead percentages can be kept this low.
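As an illustration (a sketch of my own, not code from the profiled program), a parallel loop with a large trip count and uniform per-iteration work tends to keep the Imbal and Mgmt columns low:

#include <omp.h>

void scale(double* data, long n) {
    // Uniform iterations split evenly across threads keep imbalance low,
    // and a large n amortizes the one-time thread startup/management cost.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        data[i] *= 2.0;
}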
To contrast, the following is the overhead analysis generated for an execution with a much smaller number of elements:
----------------------------------------------------------------------
----  ompP Overhead Analysis Report  ---------------------------------
----------------------------------------------------------------------
Total runtime (wallclock)   : 0.17 sec [8 threads]
Number of parallel regions  : 1
Parallel coverage           : 0.17 sec (99.74%)

Parallel regions sorted by wallclock time:
               Type                  Location      Wallclock (%)
R00001     PARALLEL      omp_sync.cpp (35-48)       0.17 (99.74)
                                          SUM       0.17 (99.74)

Overheads wrt. each individual parallel region:
          Total      Ovhds (%)  =   Synch (%)  +   Imbal (%)  +  Limpar (%)  +    Mgmt (%)
R00001     1.37   0.18 (13.24)  0.00 ( 0.00)   0.07 ( 5.25)   0.00 ( 0.00)   0.11 ( 7.99)

Overheads wrt. whole program:
          Total      Ovhds (%)  =   Synch (%)  +   Imbal (%)  +  Limpar (%)  +    Mgmt (%)
R00001     1.37   0.18 (13.21)  0.00 ( 0.00)   0.07 ( 5.23)   0.00 ( 0.00)   0.11 ( 7.97)
   SUM     1.37   0.18 (13.21)  0.00 ( 0.00)   0.07 ( 5.23)   0.00 ( 0.00)   0.11 ( 7.97)
----------------------------------------------------------------------
The absolute time lost to imbalance only went from 0.04 s to 0.07 s, and thread management increased by a similarly small amount. However, because the total runtime shrank while these overheads stayed roughly constant, the total overhead percentage jumped from 1.85% to 13.21%. This can indicate a situation where parallelization is not worth its overhead cost, or where reducing the number of threads would allow better workload balancing.
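One way to act on this (an illustrative sketch only; the function and the 100000 threshold are made up for the example) is to cap the thread count when the problem size is small:

#include <omp.h>

void process(double* data, long n) {
    // Hypothetical threshold: below it, startup and synchronization
    // overhead outweigh the benefit of using every available core.
    int threads = (n < 100000) ? 2 : omp_get_max_threads();
    #pragma omp parallel for num_threads(threads)
    for (long i = 0; i < n; ++i)
        data[i] *= 2.0;
}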