=== Example ===
<source>
#pragma acc kernels
{
    for (int i = 0; i < N; i++) {
        y[i] = a * x[i] + y[i];
    }
}
</source>
== OpenMP GPU offloading ==
The comparison with OpenMP is relevant because OpenMP has supported offloading to accelerators since version 4.0, through its `target` constructs. OpenACC uses directives to tell the compiler where to parallelize loops and how to manage data between host and accelerator memory. OpenMP takes a more general approach: it lets programmers explicitly distribute the execution of loops, code regions and tasks across teams of threads. OpenMP directives tell the compiler to generate parallel code in exactly the specified way, leaving little to the discretion of the compiler and the optimizer.
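As a rough illustration of the explicit OpenMP style described above, the SAXPY loop can be written with a combined `target` construct. This is a minimal sketch, not taken from the original text: the function name `saxpy_omp_target` and the explicit `map` clauses are assumptions chosen for the example.

```c
/* Hypothetical helper (name is an assumption): computes y = a*x + y
 * for n elements, requesting offload through an OpenMP target construct. */
void saxpy_omp_target(int n, float a, const float *x, float *y) {
    /* target: run the region on the default device;
     * teams distribute parallel for: spread iterations over teams and threads;
     * map: x is only read on the device, y is read and written back. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Every step (offload, work distribution, data movement) is spelled out by the programmer here, whereas `#pragma acc kernels` leaves those choices to the compiler. If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially on the host.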
== Code comparison ==
<source>
Explicit conversions
OpenACC                                     OpenMP
#pragma acc kernels                         #pragma omp target
{                                           {
  #pragma acc loop worker                     #pragma omp parallel for private(tmp)
  for(int i = 0; i < N; i++){                 for(int i = 0; i < N; i++){
    tmp = …;                                    tmp = …;
    array[i] = tmp * …;                         array[i] = tmp * …;
  }                                           }
  #pragma acc loop vector                     #pragma omp simd
  for(int i = 0; i < N; i++)                  for(int i = 0; i < N; i++)
    array2[i] = …;                              array2[i] = …;
}                                           }
</source><source>
ACC parallel
OpenACC                                     OpenMP
#pragma acc parallel                        #pragma omp target
{                                           #pragma omp parallel
  #pragma acc loop                          {
  for(int i = 0; i < N; i++){                 #pragma omp for private(tmp) nowait
    tmp = …;                                  for(int i = 0; i < N; i++){
    array[i] = tmp * …;                         tmp = …;
  }                                             array[i] = tmp * …;
  #pragma acc loop                            }
  for(int i = 0; i < N; i++)                  #pragma omp for simd
    array2[i] = …;                            for(int i = 0; i < N; i++)
}                                               array2[i] = …;
                                            }
</source><source>
ACC Kernels
OpenACC                                     OpenMP
#pragma acc kernels                         #pragma omp target
{                                           #pragma omp parallel
  for(int i = 0; i < N; i++){               {
    tmp = …;                                  #pragma omp for private(tmp)
    array[i] = tmp * …;                       for(int i = 0; i < N; i++){
  }                                             tmp = …;
  for(int i = 0; i < N; i++)                    array[i] = tmp * …;
    array2[i] = …;                            }
}                                             #pragma omp for simd
                                              for(int i = 0; i < N; i++)
                                                array2[i] = …;
                                            }
</source><source>
Copy vs. PCopy
OpenACC                                     OpenMP
int x[10],y[10];                            int x[10],y[10];
#pragma acc data copy(x) pcopy(y)           #pragma omp target data map(x,y)
{                                           {
  ...                                         ...
  #pragma acc kernels copy(x) pcopy(y)        #pragma omp target update to(x)
  {                                           #pragma omp target map(y)
    // Accelerator Code                       {
    ...                                         // Accelerator Code
  }                                             ...
  ...                                         }
}                                           }
</source>
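The data-movement pattern in the last listing can be sketched as a complete function. This is an illustrative sketch under stated assumptions: the helper name `refresh_and_add`, the macro `N`, the loop bodies, and the explicit map types and array sections are not part of the original listing. It shows why the OpenMP column pairs `target update to(x)` with a plain `map(y)`: mapping data that is already present on the device does not copy it again, so a refresh of `x` has to be requested explicitly.

```c
#define N 10

/* Hypothetical helper (name is an assumption): adds a host-refreshed x
 * into y inside an enclosing device data region. */
void refresh_and_add(int *x, int *y) {
    /* Allocate x and y on the device for the whole region and copy them in;
     * both are copied back to the host when the region ends. */
    #pragma omp target data map(tofrom: x[0:N], y[0:N])
    {
        /* Host-side change to x while the device copy still holds old values. */
        for (int i = 0; i < N; i++) {
            x[i] = i;
        }

        /* Explicitly push the new host values of x to the device copy. */
        #pragma omp target update to(x[0:N])

        /* x and y are already present, so these maps trigger no new transfer. */
        #pragma omp target map(to: x[0:N]) map(tofrom: y[0:N])
        for (int i = 0; i < N; i++) {
            y[i] += x[i];
        }
    }
}
```

Without the `target update`, a device build would add the stale values of `x` that were copied in when the data region was entered. When the code is compiled without OpenMP, or falls back to the host, the pragmas have no effect on the result.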