GPU621/To Be Announced

2,422 bytes added, 22:45, 18 November 2020
How to set up compiler and target offloading for Linux on AMD GPU: (Elena)
 
== Programming GPUs with OpenMP ==
<h3>Target Region</h3>
* The target region is the offloading construct in OpenMP.
<pre>int main() {
    // This code executes on the host (CPU)

    #pragma omp target
    {
        // This code executes on the device
    }

    return 0;
}
</pre>
 
* An OpenMP program will begin executing on the host (CPU).
* When a target region is encountered, the code within it begins to execute on a device (GPU).
 
If no other construct is specified (for instance a construct to enable a parallelized region, ''#pragma omp parallel''), the code within the target region executes sequentially by default.
The target region does not express parallelism; it only expresses where the contained code is going to be executed.
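
To get parallel execution on the device, the target construct has to be combined with constructs that create parallelism. The following is a minimal sketch, assuming a hypothetical function ''scale_array''; the combined ''target teams distribute parallel for'' construct is standard OpenMP, but whether the loop is actually offloaded depends on the compiler and its offloading flags.

```c
/* Hypothetical helper: multiplies every element of data by factor.
   If the compiler has no offloading support, the loop runs on the host. */
void scale_array(float *data, int n, float factor)
{
    /* target: execute on the device.
       teams distribute parallel for: split the iterations across device threads.
       map(tofrom: ...): copy the array to the device and back again. */
    #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
    for (int i = 0; i < n; i++) {
        data[i] *= factor;
    }
}
```

Removing everything after ''target'' in the pragma except the map clause would keep the loop on the device but run it sequentially.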
 
There is an implied synchronization between the host and the device at the end of a target region: the host thread waits for the target region to finish executing and then continues with the statements that follow it.
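
The synchronization at the end of the region can be illustrated with a small sketch. The function name ''answer_from_device'' is hypothetical, and the example assumes no ''nowait'' clause is used, so the host blocks until the region has finished.

```c
/* Hypothetical example: the host reads result only after the target region ends. */
int answer_from_device(void)
{
    int result = 0;

    /* Scalars are not copied back to the host by default,
       so result is mapped tofrom explicitly. */
    #pragma omp target map(tofrom: result)
    {
        result = 6 * 7;
    }

    /* The host thread resumes here only once the region above has completed,
       so the updated value of result is visible. */
    return result;
}
```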
 
<h3>Mapping host and device data</h3>
 
* In order to access data inside the target region it must be mapped to the device.
* The host environment and device environment have separate memory.
* Once data has been mapped from the host to the device, the host should not access that data until the target region (device) has completed its execution.
 
The map clause controls how a variable is copied between the host and the device over a target region.
 
''#pragma omp target map(map-type : list)''
 
* ''list'' specifies the data variables to be mapped from the host data environment to the target's device environment.
 
* ''map-type'' is one of the types '''to''', '''from''', '''tofrom''', or '''alloc'''.
 
'''to''' - copies the data from the host to the device on entry to the target region.
 
'''from''' - copies the data from the device back to the host on exit from the target region.
 
'''tofrom''' - copies the data to the device on entry and back to the host on exit.
 
'''alloc''' - allocates an uninitialized copy on the device (without copying from the host environment).
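
The map types listed above can be combined on a single target region. The following is a rough sketch, assuming a hypothetical function ''add_one_and_square'' that takes pointers; because the arguments are pointers, each map clause uses an array section (''name[start:length]'') to state how many elements to transfer.

```c
/* Hypothetical helper: out[i] = (in[i] + 1) squared, using a scratch buffer. */
void add_one_and_square(const int *in, int *out, int *scratch, int n)
{
    /* to:    in is only read on the device, so it is copied in but not back.
       alloc: scratch is device-side working space; nothing is copied either way.
       from:  out is only written on the device, so it is copied back on exit. */
    #pragma omp target map(to: in[0:n]) map(alloc: scratch[0:n]) map(from: out[0:n])
    {
        for (int i = 0; i < n; i++) {
            scratch[i] = in[i] + 1;   /* written before it is ever read */
        }
        for (int i = 0; i < n; i++) {
            out[i] = scratch[i] * scratch[i];
        }
    }
}
```

Choosing the narrowest map type for each variable avoids transfers that the computation does not need.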
 
<pre>
#pragma omp target map(to:A,B), map(tofrom:sum)
{
    for (int i = 0; i < N; i++)
        sum += A[i] + B[i];
}
</pre>
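
The fragment above maps ''A'' and ''B'' by name, which works when they are arrays of known size. As a sketch of the same loop wrapped in a function, the hypothetical ''sum_pairs'' below receives pointers instead, so it is assumed that the length has to be supplied through array sections.

```c
/* Hypothetical wrapper: returns the sum of A[i] + B[i] for i in [0, n). */
double sum_pairs(const double *A, const double *B, int n)
{
    double sum = 0.0;

    /* A and B are read-only inputs (to); sum is read and updated (tofrom). */
    #pragma omp target map(to: A[0:n], B[0:n]) map(tofrom: sum)
    {
        /* No parallel construct is used, so this loop runs sequentially on the device. */
        for (int i = 0; i < n; i++) {
            sum += A[i] + B[i];
        }
    }

    return sum;
}
```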
 
<h3>Declare Target</h3>
''Calling functions within the scope of a target region.''
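
A function called from inside a target region must also be compiled for the device. One way to request this is to place the function between ''#pragma omp declare target'' and ''#pragma omp end declare target''. The sketch below uses hypothetical names (''square'', ''sum_of_squares'') and is only meant to show the shape of the directive.

```c
/* Functions declared between these two pragmas are compiled for the device
   as well as for the host. */
#pragma omp declare target
int square(int x)
{
    return x * x;
}
#pragma omp end declare target

/* Hypothetical caller: sums the squares of 1..n inside a target region. */
int sum_of_squares(int n)
{
    int total = 0;

    #pragma omp target map(tofrom: total)
    {
        for (int i = 1; i <= n; i++) {
            total += square(i);   /* call resolves to the device version of square */
        }
    }

    return total;
}
```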
== Code for tests (Nathan) ==
== Sources ==
https://www.ibm.com/support/knowledgecenter/en/SSXVZZ_16.1.0/com.ibm.xlcpp161.lelinux.doc/compiler_ref/prag_omp_target.html
https://www.ibm.com/support/knowledgecenter/en/SSXVZZ_16.1.0/com.ibm.xlcpp161.lelinux.doc/compiler_ref/prag_omp_declare_target.html