83
edits
Changes
→Analyzing False Sharing
=== Results ===
[[File:naive_implementation.png|600px500px|thumb|Execution time of naive implementation without any optimization levels (Od).]]
The algorithm calculates the correct answer, but the performance is absolutely terrible. The reason is that an int array is a contiguous block of memory with each integer taking up 4 bytes. Assuming a 64 byte cache line, our entire array only takes up half of the cache opening up the possibility for multiple threads to share the same cache line resulting in false sharing. Although there were cases where higher thread count produced better results, there were many cases that performed worse than a single thread. This is due to the scheduling of thread execution that is out of the programmer's hands. It is possible that the selected schedule managed to minimize the frequency of false sharing giving better performance. However, this is extremely unreliable, so we need a better solution to false sharing.
<br><br><br><br><br><br><br><br><br><br><br><br>
== Solutions to False Sharing ==
=== Padding ===
This observation was the basis for our first solution to false sharing. The idea was to pad out each element in the array with enough space to act as a boundary or separator for the cache line. If we can force each thread to only bring in their array index to their cache then each individual thread will have a distinct cache line, thus eliminating false sharing.
<br><br><br><br><br><br><br><br><br><br><br>
In the workshop, we accomplished this by making several modifications to the original code:
=== Drawbacks ===
While our the idea works, there were several issues that prevented it from being an ideal solution: [[File:padded_implementation.png|600px500px|thumb|Execution time of padded implementation without optimization (Od).]]