== OpenMP GPU offloading ==
We compare OpenACC with OpenMP for two reasons. First, OpenMP is also based on directives for parallelizing code; second, OpenMP has supported offloading to accelerators since OpenMP 4.0 through `target` constructs. OpenACC uses directives to tell the compiler where to parallelize loops and how to manage data between host and accelerator memories. OpenMP takes a more generic approach: it allows programmers to explicitly spread the execution of loops, code regions, and tasks across teams of threads.
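As a rough illustration of the `target` constructs mentioned above, the following sketch offloads a simple loop with OpenMP. The function name `saxpy_offload` and the computation itself are hypothetical and chosen only for illustration; they do not come from the original text. The sketch assumes a compiler with OpenMP 4.0 or later offloading support; if the pragma is ignored, the loop simply runs serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration): y[i] = a * x[i] + y[i]. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* target             : execute the region on the default device
     *                      (falls back to the host if no device is available)
     * teams distribute   : split the iterations across teams
     * parallel for       : split each team's share across its threads
     * map(to: ...)       : copy x from host to device
     * map(tofrom: ...)   : copy y to the device and back to the host */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Note that the programmer spells out both the data movement (`map`) and each level of work distribution (`teams`, `distribute`, `parallel for`), which is the explicit style described in the paragraph above.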
= OpenMP vs OpenACC =
OpenMP's directives tell the compiler to generate parallel code in a specific way, leaving little to the discretion of the compiler and the optimizer. The compiler must do as instructed. It is up to the programmer to guarantee that the generated code is correct; parallelization and scheduling are also the responsibility of the programmer rather than of the compiler. OpenACC's parallel directives tell the compiler only that the loop is a parallel loop. It is up to the compiler to decide how to parallelize it. For example, the compiler can generate code that runs the iterations across threads or across SIMD lanes. The compiler chooses the method of parallelization based on the underlying hardware architecture, and may use a mixture of different methods. The real difference between the two is therefore how much freedom is given to the compiler.
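The contrast described above can be sketched with the same loop written in both styles. The function names, the doubling computation, and the choice of `num_threads(4)` are assumptions made purely for illustration. If the compiler does not enable OpenMP or OpenACC, the pragmas are ignored and both functions run serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (illustration only): out[i] = 2 * in[i]. */

/* OpenMP style: the programmer prescribes how the loop is parallelized:
 * a team of 4 threads, a static schedule, and SIMD within each thread. */
void scale_prescriptive(int n, const int *in, int *out)
{
    assert(n >= 0 && in != NULL && out != NULL);

    #pragma omp parallel for simd num_threads(4) schedule(static)
    for (int i = 0; i < n; i++) {
        out[i] = 2 * in[i];
    }
}

/* OpenACC style: the programmer states that the loop is parallel and which
 * data it reads and writes; the mapping onto the hardware is left to the
 * compiler. */
void scale_descriptive(int n, const int *in, int *out)
{
    assert(n >= 0 && in != NULL && out != NULL);

    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; i++) {
        out[i] = 2 * in[i];
    }
}
```

Both functions compute the same values; only the amount of detail the programmer hands to the compiler differs.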
== Code comparison ==
{| class="wikitable"
! OpenACC !! OpenMP
|-
|
<pre>
#pragma acc loop vector
for(int i = 0; i < N; i++)
{
    array2[i] = …;
}
</pre>
|
<pre>
#pragma omp simd
for(int i = 0; i < N; i++)
{
    array2[i] = …;
}
</pre>
|-
|
<pre>
#pragma acc kernels
{
    for(int i = 0; i < N; i++){
        tmp = …;
        array[i] = tmp * …;
    }
    for(int i = 0; i < N; i++)
        array2[i] = …;
}
</pre>
|
<pre>
#pragma omp target
#pragma omp parallel
{
    #pragma omp for private(tmp)
    for(int i = 0; i < N; i++){
        tmp = …;
        array[i] = tmp * …;
    }
    #pragma omp for simd
    for(int i = 0; i < N; i++)
        array2[i] = …;
}
</pre>
|}