Open main menu

CDOT Wiki β

Changes

GPU621/False Sharing

1,545 bytes added, 13:15, 26 November 2021
Example
=== Example ===
 
In workshop 2, we briefly encountered false sharing. However, we did not get a formal explanation in lecture, so this section will serve to provide more context to the problem.
 
We were asked to multi-thread a serial version of a simple algorithm that calculated PI by integrating 1/(1 + x^2). The serial version utilized a scalar sum variable to accumulate the calculations. For our naïve attempt, we identified a potential issue with using a scalar sum in a multi-threaded program. We have no direct control on the order a thread finishes their assigned work and leaving it up to chance resulted in varying resulted in a race condition. This is not good. To get around this we came up with the idea to change sum to an array. Now, each thread can index this array by their own thread id and store their calculations.
 
<pre>
#include <iostream>
#include <omp.h>
#define NUM_THREADS 8// number of threads to request
using namespace std::chrono;
return 1;
}
 
int n = std::atoi(argv[1]);
steady_clock::time_point ts, te;
}
</pre>
 
==== Results ====
[[File:naive_implementation.png|600px|thumb|Execution time of naive implementation without any optimization levels (Od)]]
 
The algorithm calculates the correct answer, but the performance is absolutely terrible due to false sharing. Although there were cases where higher thread count produced better results, there were many cases that performed worse than a single thread. This is due to the scheduling of thread execution that is out of the programmer's hands. It is possible that the selected schedule managed to minimize the frequency of false sharing giving better performance. However, this is extremely unreliable, so we need a better solution to false sharing.
== Solutions to False Sharing ==
83
edits