CDOT Wiki β

DPS921/OpenACC vs OpenMP Comparison

675 bytes added, 18:03, 7 December 2020
Added professor's comments on GPU computation
- Nov 18, 2020: Successfully installed the required compiler and compiled the OpenACC code
- Nov 19, 2020: Added MPI to the discussion
 
== Important notes (prof's feedback) ==
A limitation of GPU computation is precision. GPUs are built primarily for single (float) precision, and many of them have little or no fast double-precision hardware. On such GPUs, double-precision values have to be broken into float values, processed, and combined back into double precision, which is slow. This is one of the main reasons GPUs are less attractive for scientific computations that require double precision.
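The splitting described above can be sketched in C. This is a hypothetical illustration of the general "double-float" idea, not code from this project, and the helper names `split_double` and `combine_floats` are invented here: `hi` keeps the leading bits of the double, and `lo` keeps the rounding error that `hi` could not hold.

<source lang="c">
/* Hypothetical helpers illustrating the double-float idea:
   a double x is represented as the unevaluated sum hi + lo of two floats. */

/* Split x so that (double)hi + (double)lo is much closer to x than hi alone. */
void split_double(double x, float *hi, float *lo)
{
    *hi = (float)x;                  /* leading ~24 bits of the mantissa   */
    *lo = (float)(x - (double)*hi);  /* rounding error that hi left behind */
}

/* Combine the float pair back into a double. */
double combine_floats(float hi, float lo)
{
    return (double)hi + (double)lo;
}
</source>

For x = 1.0/3.0, `hi` on its own is off by roughly 1e-8, while `combine_floats(hi, lo)` agrees with x to within 1e-14. Real double-float arithmetic also needs special addition and multiplication routines that carry the error terms, which is where the extra cost comes from.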
 
Before using a GPU to speed up a computation, determine the level of precision required for both the final results and the intermediate results. For AI/CV/ML workloads, where precision requirements are low, it is generally safe to use a GPU to speed up the computation.
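One way to check whether single precision is good enough is to run the same computation in float and in double and compare the results. The sketch below does this for a long reduction; the function names are hypothetical, and the harmonic sum is only a stand-in for whatever intermediate results a real kernel accumulates.

<source lang="c">
/* Hypothetical precision check: accumulate the same sum in float and in
   double, then measure how far the float result drifts from the double one. */

/* Sum of 1/i for i = 1..n, accumulated in single precision. */
float harmonic_float(int n)
{
    float sum = 0.0f;
    for (int i = 1; i <= n; i++)
        sum += 1.0f / (float)i;
    return sum;
}

/* The same sum accumulated in double precision, used as the reference. */
double harmonic_double(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Relative error of value against a nonzero reference. */
double relative_error(double reference, double value)
{
    double diff = value - reference;
    if (diff < 0.0)
        diff = -diff;
    return diff / reference;
}
</source>

With n = 1000000 the double result is about 14.3927, and the float result drifts away from it with a relative error well above 1e-6, because once the running sum is large each small term is rounded to the float spacing of the sum. If an error of that size is unacceptable for the intermediate results, a float-only GPU kernel is not a safe choice.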
= OpenACC =
<source lang="c">
#include <math.h>   /* fmax, fabs */

/* OpenMP CPU version: one parallel region per sweep; "omp for" shares the
   iterations of both loops among the threads of that region. */
while ( err > tol && iter < iter_max ) {
    err = 0.0f;
    #pragma omp parallel shared(nx, Anew, A, err)
    {
        #pragma omp for reduction(max:err)
        for( int i = 1; i < nx-1; i++ ) {
            Anew[i] = 0.5f * (A[i+1] + A[i-1]);
            err = fmax(err, fabs(Anew[i] - A[i]));
        }
        /* implicit barrier here: Anew is complete before it is copied */
        #pragma omp for
        for( int i = 1; i < nx-1; i++ ) {
            A[i] = Anew[i];
        }
    }
    iter++;   /* outside the parallel region, so it runs once per sweep */
}
</source>
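A self-contained version of the CPU loop, wrapped in a hypothetical function `jacobi_cpu`, is sketched below. It assumes `A` and `Anew` each hold `nx` floats with fixed boundary values in elements 0 and nx-1, keeps the same parallel-region structure, and replaces `fmax`/`fabs` with plain comparisons so no math library is needed.

<source lang="c">
/* Hypothetical wrapper around the OpenMP CPU loop: A and Anew hold nx
   floats, and A[0], A[nx-1] are fixed boundary values.
   Returns the number of sweeps performed. Build with -fopenmp to run the
   loops in parallel; without it the pragmas are ignored (serial run). */
int jacobi_cpu(float *A, float *Anew, int nx, float tol, int iter_max)
{
    float err = tol + 1.0f;   /* guarantees at least one sweep */
    int iter = 0;

    while (err > tol && iter < iter_max) {
        err = 0.0f;
        #pragma omp parallel shared(nx, Anew, A, err)
        {
            /* each thread keeps a private max, combined at the end */
            #pragma omp for reduction(max:err)
            for (int i = 1; i < nx - 1; i++) {
                Anew[i] = 0.5f * (A[i + 1] + A[i - 1]);
                float d = Anew[i] - A[i];
                if (d < 0.0f) d = -d;      /* |Anew[i] - A[i]| */
                if (d > err) err = d;
            }
            /* implicit barrier: Anew is complete before the copy starts */
            #pragma omp for
            for (int i = 1; i < nx - 1; i++)
                A[i] = Anew[i];
        }
        iter++;                            /* once per sweep */
    }
    return iter;
}
</source>

For example, starting from A = {1, 0, 0, 0, 1} the interior values relax toward 1, and the loop stops once no point changes by more than `tol` in a sweep.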
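<source lang="c">
#include <math.h>   /* fmax, fabs */

/* OpenMP GPU, basic version: each sweep is one target region running a
   single team of threads, and A, Anew and err are copied to and from the
   device on every iteration. */
while ( err > tol && iter < iter_max ) {
    err = 0.0f;
    #pragma omp target map(tofrom: A[0:nx], Anew[0:nx], err)
    #pragma omp parallel shared(nx, Anew, A, err)
    {
        #pragma omp for reduction(max:err)
        for( int i = 1; i < nx-1; i++ ) {
            Anew[i] = 0.5f * (A[i+1] + A[i-1]);
            err = fmax(err, fabs(Anew[i] - A[i]));
        }
        #pragma omp for
        for( int i = 1; i < nx-1; i++ ) {
            A[i] = Anew[i];
        }
    }
    iter++;
}
</source>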
 
=== OpenMP GPU Proper Implementation ===
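The basic offload version above runs a single team of threads and moves the arrays between host and device on every sweep. A minimal sketch of one common way to avoid both is shown below: `target data` keeps the arrays on the device for the whole solve, and `target teams distribute parallel for` spreads each sweep across many teams. It assumes the same float arrays with fixed boundary values; the function name `jacobi_omp_gpu` is hypothetical, and this is not necessarily the implementation that produced the timings below.

<source lang="c">
/* Hypothetical OpenMP offload sketch: A is copied to the device once and
   back once, Anew only gets device storage, and every sweep uses many
   teams. Without an offload toolchain the target regions run on the host. */
int jacobi_omp_gpu(float *A, float *Anew, int nx, float tol, int iter_max)
{
    float err = tol + 1.0f;   /* guarantees at least one sweep */
    int iter = 0;

    #pragma omp target data map(tofrom: A[0:nx]) map(alloc: Anew[0:nx])
    {
        while (err > tol && iter < iter_max) {
            err = 0.0f;
            /* only err travels between host and device on each sweep */
            #pragma omp target teams distribute parallel for \
                    reduction(max:err) map(tofrom: err)
            for (int i = 1; i < nx - 1; i++) {
                Anew[i] = 0.5f * (A[i + 1] + A[i - 1]);
                float d = Anew[i] - A[i];
                if (d < 0.0f) d = -d;      /* |Anew[i] - A[i]| */
                if (d > err) err = d;
            }
            #pragma omp target teams distribute parallel for
            for (int i = 1; i < nx - 1; i++)
                A[i] = Anew[i];
            iter++;
        }
    }
    return iter;
}
</source>

The design choice is to keep the only per-sweep transfer down to the single `err` value the host needs for the convergence test; the boundary entries of `Anew` on the device are never written or read, so allocating it without copying is enough.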
Run time relative to the OpenMP CPU version (lower is faster):
* OpenMP CPU: 1x
* OpenACC Basic: 2x (twice as slow)
* OpenACC Proper: ~0.14x (about 7 times faster)
* OpenMP GPU Basic: 10x (10 times slower)
* OpenMP GPU Proper: ~0.21x (about 5 times faster)
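For comparison with the OpenACC Proper entry, a minimal OpenACC sketch of the same loop might look like the following. It assumes the same float arrays and boundary setup; `jacobi_acc` is a hypothetical name, and the `data` region (copy `A` in and out once, create `Anew` on the device only) is the usual way to avoid per-sweep transfers, not a confirmed copy of this project's code.

<source lang="c">
/* Hypothetical OpenACC sketch: the data region keeps A and Anew on the
   device for the whole solve; each "parallel loop" offloads one loop.
   Compilers without OpenACC support ignore the pragmas (serial run). */
int jacobi_acc(float *A, float *Anew, int nx, float tol, int iter_max)
{
    float err = tol + 1.0f;   /* guarantees at least one sweep */
    int iter = 0;

    #pragma acc data copy(A[0:nx]) create(Anew[0:nx])
    {
        while (err > tol && iter < iter_max) {
            err = 0.0f;
            /* the max reduction result is returned to the host's err */
            #pragma acc parallel loop present(A[0:nx], Anew[0:nx]) reduction(max:err)
            for (int i = 1; i < nx - 1; i++) {
                Anew[i] = 0.5f * (A[i + 1] + A[i - 1]);
                float d = Anew[i] - A[i];
                if (d < 0.0f) d = -d;      /* |Anew[i] - A[i]| */
                if (d > err) err = d;
            }
            #pragma acc parallel loop present(A[0:nx], Anew[0:nx])
            for (int i = 1; i < nx - 1; i++)
                A[i] = Anew[i];
            iter++;
        }
    }
    return iter;
}
</source>

Built with an OpenACC compiler (for example `nvc -acc`, or GCC with `-fopenacc`), the loops are offloaded; any other C compiler ignores the `acc` pragmas and runs the function serially with the same result.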
= Collaboration =