DPS921/OpenACC vs OpenMP Comparison

Originally, OpenACC compilation was supported only by the PGI compiler, which required a subscription; new options have appeared in recent years.
=== Nvidia HPC SDK [https://developer.nvidia.com/hpc-sdk] ===
Evolved from the PGI Compiler community edition. An installation guide is provided on the official website. It currently supports only Linux systems, with Windows support expected soon.
<source>
wget https://developer.download.nvidia.com/hpc-sdk/20.9/nvhpc-20-9_20.9_amd64.deb \
     https://developer.download.nvidia.com/hpc-sdk/20.9/nvhpc-2020_20.9_amd64.deb
sudo apt-get install ./nvhpc-20-9_20.9_amd64.deb ./nvhpc-2020_20.9_amd64.deb
</source>
After installation, the compilers can be found under <code>/opt/nvidia/hpc_sdk/Linux_x86_64/20.9/compilers/bin</code>, and OpenACC code can be compiled with <code>nvc -acc -gpu=managed demo.c</code>, where <code>-acc</code> indicates that the code includes OpenACC directives, and <code>-gpu=managed</code> indicates how memory should be managed. <code>nvc</code> is used here because the source code is C; there is <code>nvc++</code> for compiling C++ code. The compiler can also report how the parallel regions were generated if you pass the <code>-Minfo</code> option:
<source>
$ nvc -acc -gpu=managed -Minfo demo.c
main:
     79, Generating implicit copyin(A[:256][:256]) [if not already present]
         Generating implicit copy(_error) [if not already present]
         Generating implicit copyout(Anew[1:254][1:254]) [if not already present]
     83, Loop is parallelizable
     85, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         83, #pragma acc loop gang, vector(4) /* blockIdx.y threadIdx.y */
             Generating implicit reduction(max:_error)
         85, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
     91, Generating implicit copyout(A[1:254][1:254]) [if not already present]
         Generating implicit copyin(Anew[1:254][1:254]) [if not already present]
     95, Loop is parallelizable
     97, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         95, #pragma acc loop gang, vector(4) /* blockIdx.y threadIdx.y */
         97, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
</source>
This output shows which loops were parallelized, with source line numbers for reference. For Windows users who would like to try this SDK, WSL2 is one way to go.
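The <code>-Minfo</code> output above matches a Jacobi-style stencil. Below is a minimal sketch of such a program; the array names <code>A</code>/<code>Anew</code> and the 256x256 size follow the output, but everything else (tolerance, iteration limit, boundary values) is an illustrative assumption, so the line numbers will not match.
<source>
#include <stdio.h>
#include <math.h>

#define N 256

int main(void) {
    static double A[N][N], Anew[N][N];
    double error = 1.0;
    const double tol = 1.0e-6;
    int iter = 0, iter_max = 1000;

    /* Fixed boundary condition on the left edge (assumed). */
    for (int i = 0; i < N; i++)
        A[i][0] = 1.0;

    while (error > tol && iter < iter_max) {
        error = 0.0;

        /* The compiler parallelizes both loops and generates the
           implicit copyin/copyout clauses reported by -Minfo. */
        #pragma acc parallel loop reduction(max:error)
        for (int j = 1; j < N - 1; j++) {
            #pragma acc loop
            for (int i = 1; i < N - 1; i++) {
                Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1]
                                   + A[j-1][i] + A[j+1][i]);
                error = fmax(error, fabs(Anew[j][i] - A[j][i]));
            }
        }

        /* Copy the new grid back for the next sweep. */
        #pragma acc parallel loop
        for (int j = 1; j < N - 1; j++) {
            #pragma acc loop
            for (int i = 1; i < N - 1; i++)
                A[j][i] = Anew[j][i];
        }

        iter++;
    }

    printf("iterations: %d, error: %g\n", iter, error);
    return 0;
}
</source>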
=== GCC [https://gcc.gnu.org/wiki/OpenACC] ===
The latest GCC release, GCC 10, supports OpenACC 2.6.
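As a minimal example, GCC enables OpenACC directives with the <code>-fopenacc</code> flag. Whether the code actually offloads to a GPU depends on how the GCC build was configured with offloading support; otherwise the directives fall back to host execution.
<source>
gcc -fopenacc -O2 demo.c -o demo
</source>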
= OpenMP vs OpenACC =
We are comparing with OpenMP for two reasons. First, OpenMP is also based on directives to parallelize code; second, OpenMP began supporting offloading to accelerators in OpenMP 4.0 through the <code>target</code> constructs. OpenACC uses directives to tell the compiler where to parallelize loops, and how to manage data between host and accelerator memories. OpenMP takes a more generic approach: it allows programmers to explicitly spread the execution of loops, code regions, and tasks across teams of threads.
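To make the contrast concrete, here is a minimal sketch (not from the course material; the function and variable names are illustrative) of the same loop offloaded with each model. The OpenACC version describes the parallelism and data movement and leaves scheduling to the compiler; the OpenMP <code>target</code> version spells out how the loop is distributed across teams of threads.
<source>
#include <stdlib.h>

#define N 1000000

/* OpenACC: mark the loop parallel and describe data movement;
   the compiler decides how to schedule it on the device. */
void vadd_acc(const float *restrict a, const float *restrict b,
              float *restrict c) {
    #pragma acc parallel loop copyin(a[0:N], b[0:N]) copyout(c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* OpenMP target: the programmer explicitly distributes the loop
   across teams of threads on the device. */
void vadd_omp(const float *restrict a, const float *restrict b,
              float *restrict c) {
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

int main(void) {
    float *a = malloc(N * sizeof *a);
    float *b = malloc(N * sizeof *b);
    float *c = malloc(N * sizeof *c);
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Pragmas for whichever model is not enabled at compile time
       are ignored and the loop runs serially. Flags for enabling
       OpenMP offload vary by toolchain. */
    vadd_acc(a, b, c);
    vadd_omp(a, b, c);

    free(a); free(b); free(c);
    return 0;
}
</source>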
OpenMP's directives tell the compiler to generate parallel code in a specific way, leaving little room to the discretion of the compiler and the optimizer. The compiler must do as instructed. It is up to the programmer to guarantee that the generated code is correct; parallelization and scheduling are the responsibility of the programmer, not of the compiler at runtime.
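For instance (an illustrative sketch, not from the original material), OpenMP will honor the exact thread count and schedule the programmer writes, even when a different choice would perform better on the given hardware:
<source>
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The runtime must use exactly 8 threads and a static,
       chunk-size-1 schedule; it may not re-plan the loop. */
    #pragma omp parallel for num_threads(8) schedule(static, 1)
    for (int i = 0; i < 16; i++)
        printf("iteration %d on thread %d\n", i, omp_get_thread_num());
    return 0;
}
</source>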