CDOT Wiki β

GPU621/False Sharing

509 bytes added, 00:45, 27 November 2021
=== Synchronization and Thread Local Variables ===
Thus, we searched for a more elegant solution that avoids race conditions and cache line sharing without resorting to padding. To solve the concurrency issue, we used OpenMP's critical construct; to solve cache line sharing, we turned to thread-local variables.
<pre>
}
</pre>
 
To solve the concurrency issue, we used OpenMP's critical construct, which allows more direct control over thread execution. Originally, the program yielded different results depending on the order in which the threads executed. By marking a region as a critical section, we ensure that only one thread at a time can execute the code inside it; other threads must wait until the region is unoccupied before taking their turn. One thing we had to be careful about was the size of the critical region. Making it too large reduces the effectiveness of multi-threading, since threads may sit idle for too long waiting their turn. Although the rest of the parallel region can execute in any order, we do not want any ambiguity about which thread holds the most up-to-date value of pi. Using #pragma omp critical guarantees that updates to pi happen one at a time, so there is only ever one version of it.
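As a minimal sketch of keeping the critical region small, the function below estimates pi with the midpoint rule and serializes only the update of the shared variable. The function name <code>pi_with_critical</code> and the names <code>steps</code>, <code>width</code> and <code>term</code> are assumptions made for illustration, not the code used on this page.

```cpp
#include <cmath>

// Hypothetical helper (not from the original page): midpoint-rule estimate
// of pi as the integral of 4 / (1 + x^2) over [0, 1].
double pi_with_critical(int steps) {
    const double width = 1.0 / steps;
    double pi = 0.0;  // shared by all threads

    #pragma omp parallel for
    for (int i = 0; i < steps; ++i) {
        // The per-iteration arithmetic stays outside the critical region,
        // so threads can compute it concurrently.
        const double x = (i + 0.5) * width;
        const double term = 4.0 / (1.0 + x * x) * width;

        // Only the update of the shared variable is serialized.
        #pragma omp critical
        pi += term;
    }
    return pi;
}
```

Entering the critical region once per iteration is still costly, so this version mainly illustrates correctness: the result no longer depends on thread ordering, apart from floating-point rounding.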
 
The next issue was cache line sharing. With an array, the elements were contiguous in memory, making it highly likely that they would share a cache line. By giving each thread its own local sum variable instead, the sums typically live on each thread's own stack (or in a register), so it becomes much less likely that threads write to the same cache line, which mitigates the impact of false sharing.
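The combination of a thread-local sum and a critical section can be sketched as follows. This is an illustrative version under assumed names (<code>pi_with_local_sum</code>, <code>local_sum</code>, <code>steps</code>), not necessarily the exact code measured on this page.

```cpp
#include <cmath>

// Hypothetical helper (not from the original page): each thread accumulates
// into its own local variable and merges into the shared total once.
double pi_with_local_sum(int steps) {
    const double width = 1.0 / steps;
    double pi = 0.0;  // shared result

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread gets its own
        // private copy rather than a slot in a shared, contiguous array.
        double local_sum = 0.0;

        #pragma omp for
        for (int i = 0; i < steps; ++i) {
            const double x = (i + 0.5) * width;
            local_sum += 4.0 / (1.0 + x * x);
        }

        // One short critical region per thread, not one per iteration.
        #pragma omp critical
        pi += local_sum * width;
    }
    return pi;
}
```

Because each thread enters the critical region only once, the time spent waiting is small compared with the loop itself, and the hot loop writes only to thread-private data.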
== Conclusion ==