Changes

Jump to: navigation, search

GPU621/Analyzing False Sharing

497 bytes added, 17:54, 24 November 2022
no edit summary
[[File:mySerial.jpg|400px]]<br />
Have you learned? The so-called false sharing can make performance worse in fact this cannot be generalized. We can see that with O2 on, the CPU and compiler are able to optimize the serial version of the code very well, while the multi-threaded code is not well optimized, which leads to higher performance of the serial version of the code. Even using a byte-aligned cache line size does not help. In the first example, we increased the number of vector numbers, which caused the serial code to slow down significantly. The differences in the two examples are related to the specific execution logic. So don't treat pseudo-sharing as a beast, and don't treat 64-byte alignment as a panacea. Everything has to be actually tested to know. Don't throw out the conclusion under O0 and then take it as a guideline, that makes no sense ......
118
edits

Navigation menu