Open main menu

CDOT Wiki β

Changes

GPU621/False Sharing

19 bytes removed, 14:48, 26 November 2021
Results
[[File:naive_implementation.png|600px|thumb|Execution time of naive implementation without any optimization levels (Od)]]
The algorithm calculates the correct answer, but the performance is absolutely terrible. The reason is that an int array is a contiguous block of memory with each integer taking up 4 bytes. Assuming a 64 byte cache line, our entire array only takes up half of the cache opening up the possibility for multiple threads to share the same cache line resulting in false sharing. Although there were cases where higher thread count produced better results, there were many cases that performed worse than a single thread. This is due to the scheduling of thread execution that is out of the programmer's hands. It is possible that the selected schedule managed to minimize the frequency of false sharing giving better performance. However, this is extremely unreliable, so we need a better solution to false sharing.                       
== Solutions to False Sharing ==
83
edits