Programming GPUs with OpenMP
<pre>
// Offloading to the target device, but still without parallelism.
#pragma omp target map(to:A,B), map(tofrom:sum)
{
}
</pre>
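As a minimal sketch of what a complete serial target region might look like: the function name <code>sum_products_on_target</code>, the array length <code>N</code>, the initial values, and the multiply-and-add loop body are all assumptions made for illustration and are not taken from the original example. A single device thread executes the loop, so the work is offloaded but not yet parallel.

```c
#define N 1000  // hypothetical array length

// Hypothetical example: multiply-and-add over two fixed-size arrays,
// executed serially inside a target region.
int sum_products_on_target(void)
{
    int A[N], B[N];
    int sum = 0;

    // Initialise the inputs on the host.
    for (int i = 0; i < N; i++) {
        A[i] = i;
        B[i] = 2;
    }

    // A and B are arrays of known size, so they can be mapped by name.
    // sum is copied to the device and copied back when the region ends.
    #pragma omp target map(to: A, B) map(tofrom: sum)
    {
        for (int i = 0; i < N; i++) {
            sum += A[i] * B[i];
        }
    }

    return sum;
}
```

If the compiler is not invoked with OpenMP offloading enabled, or no device is available, the region would normally fall back to running on the host and produce the same result.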
<h3>Dynamically allocated data</h3>
If data is dynamically allocated on the host and we'd like to map it to the target region, the map clause must specify the number of elements to copy. Otherwise, all the compiler has is a pointer to some region in memory, with no way of knowing how much of the allocated memory needs to be mapped to the target device.
<pre>
int* a = (int*)malloc(sizeof(int) * N);
#pragma omp target map(to: a[0:N]) // [start:length]
</pre>
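The following sketch puts the array section to work in a full function. The name <code>sum_heap_array</code>, the parameter <code>n</code>, and the summation are hypothetical and chosen only to give the target region something to do; the part that matters is <code>a[0:n]</code>, which tells the runtime how many elements behind the pointer to copy.

```c
#include <stdlib.h>

// Hypothetical example: fill a heap buffer with 0..n-1 on the host,
// then sum it inside a target region. Returns -1 if allocation fails.
long sum_heap_array(int n)
{
    int *a = (int *)malloc(sizeof(int) * (size_t)n);
    if (a == NULL) {
        return -1;
    }

    for (int i = 0; i < n; i++) {
        a[i] = i;
    }

    long sum = 0;

    // a is only a pointer, so the array section a[0:n] gives the
    // starting index and the number of elements to map.
    #pragma omp target map(to: a[0:n]) map(tofrom: sum)
    {
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
    }

    free(a);
    return sum;
}
```

Mapping with <code>to</code> is enough here because the device only reads the buffer; if the device wrote results into it, <code>tofrom</code> or <code>from</code> would be the likely choice.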
<h3>Target data regions</h3>
<h3>Teams construct</h3>
<h3>Declare Target</h3>
''Calling functions within the scope of a target region.''
* The ''declare target'' construct will compile a version of a function that can be called on the device.
* Before a function can be called from within a target region, it must first be declared with ''declare target''.
<pre>
// Compile a device-callable version of combine().
#pragma omp declare target
int combine(int a, int b);
#pragma omp end declare target

// combine() can now be called from inside the target region.
#pragma omp target teams distribute parallel for \
    map(to: A, B), map(tofrom:sum), reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += combine(A[i], B[i]);
}
</pre>
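The snippet above only declares <code>combine</code>. A self-contained version might look like the sketch below, where the body of <code>combine</code> (a simple product), the wrapper function <code>combined_sum</code>, the array length <code>N</code>, and the input values are all assumptions added for illustration.

```c
#define N 1000  // hypothetical array length

// Functions defined between these directives are also compiled for the device.
#pragma omp declare target
int combine(int a, int b)
{
    return a * b;  // hypothetical body; the original only shows the declaration
}
#pragma omp end declare target

// Hypothetical wrapper: sums combine(A[i], B[i]) over all elements.
int combined_sum(void)
{
    int A[N], B[N];
    int sum = 0;

    // Initialise the inputs on the host.
    for (int i = 0; i < N; i++) {
        A[i] = 1;
        B[i] = i;
    }

    // The loop iterations are spread across teams and threads;
    // the reduction clause combines each thread's partial sum.
    #pragma omp target teams distribute parallel for \
        map(to: A, B) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += combine(A[i], B[i]);
    }

    return sum;
}
```

Without the ''declare target'' directives, a compiler building for a device would typically have no device version of <code>combine</code> to call from inside the region.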