Open main menu

CDOT Wiki β

Changes

DPS921/OpenACC vs OpenMP Comparison

438 bytes added, 19:03, 7 December 2020
Added professor's comments on GPU computation
- Nov 18, 2020: Successful installation of required compiler and compilation of OpenACC code
- Nov 19, 2020: Adding MPI into discussion
 
== Important notes (prof's feedback) ==
Limitation with GPU computation is that GPU can only handle float precision. When calculating double precision, values need to be broken into floating point precision values, and combined back to double precision. This is one of the primary reasons why GPUs are not used in scientific computations.
 
Before using GPU to speed up computation, make sure you know the level of precision required for the results and intermediate results. For AI/CV/ML where precision requirement is low, it is safe to use GPU to speed up your computation.
= OpenACC =
while ( err > tol && iter < iter_max ) {
err=0.0f;
#pragma omp parallel { #pragma omp for shared(nx, Anew, A) reduction(max:err) for(int i = 1; i < nx-1; i++) { Anew[i] = 0.5f * (A[i+1] + A[i-1]); err = fmax(err, fabs(Anew[i] - A[i])); } #pragma omp parallel for shared(nx, Anew, A) for( int i = 1; i < nx-1; i++ ) { A[i] = Anew[i]; } iter++;
}
iter++;
}
</source>
err=0.0f;
#pragma omp target
#pragma omp parallel
{
#pragma omp parallel for shared(nx, Anew, A) reduction(max:err)
for(int i = 1; i < nx-1; i++) {
Anew[i] = 0.5f * (A[i+1] + A[i-1]);
err = fmax(err, fabs(Anew[i] - A[i]));
}
#pragma omp parallel for shared(nx, Anew, A)
for( int i = 1; i < nx-1; i++ ) {
A[i] = Anew[i];
}
</source>
 
=== OpenMP GPU Proper Implementation ===
=== Execution time ===
Without access to GPU, the * OpenMP CPU: 1x* OpenACC code runs about Basic: ~2x (twice as slow)* OpenACC Proper: ~0.14x (7 times faster comparing to the )* OpenMP one, using the Nvidia HPC SDK <code>nvc</code> compiler. According to other data sources, with access to GPU, the OpenACC version should run about 7 Basic: ~10x (10 times faster than the slower)* OpenMP solution that runs on CPU; and the OpenACC version runs about 2GPU Proper: ~0.21x (5 times faster than an OpenMP version with GPU offloading.)
= Collaboration =
36
edits