GPU621/Analyzing False Sharing

# [mailto:rleong4@myseneca.ca?subject=GPU621 Ryan Leong]
# [mailto:yppadsala@myseneca.ca?subject=GPU621 Yash Padsala]
# [mailto:sgpatel22@myseneca.ca?subject=GPU621 Shani Patel]
= '''Preface''' =
== Cache Lines ==
[[File:Pyramid Model.png|800px]]<br />
 
Before discussing false sharing, we first need to be familiar with the concept of cache lines. Students who have covered storage architecture in an operating systems course will recognize the pyramid model of the memory hierarchy: from top to bottom, the cost per byte of the storage medium decreases and its capacity grows; from bottom to top, access speed increases. At the top of the pyramid are the CPU registers, followed by the CPU caches (L1, L2, L3), then main memory, with disk at the bottom.

Operating systems rely on this hierarchy to bridge the gap between the high speed of the CPU and the much lower speed of memory and disk. The CPU reads recently used data into its cache ahead of time, so the next time the same data is accessed it can be read directly from the faster cache instead of slowing the whole program down with a trip to memory or disk. Data moves between cache and memory in fixed-size blocks called cache lines, typically 64 bytes on modern x86-64 processors.
Under the MESI protocol, each cache line transitions between four states (Modified, Exclusive, Shared, and Invalid) that determine whether a CPU's cached copy is still valid. For example, if CPU1 performs a write on a cache line, that write moves the corresponding line in every other CPU's cache into the Invalid state, and those CPUs must re-read the line from memory the next time they need it. This is how cache coherency is maintained among multiple CPUs.
 
= What is false sharing =
What is false sharing? As mentioned above, the CPU caches data in units of cache lines: in addition to the data it actually needs, it caches everything else that happens to sit in the same line. For example, when the CPU reads the byte "d" from the character sequence "abcdefgh", it loads all eight bytes into the cache as one cache line. Now suppose the computer has two CPUs, CPU1 and CPU2, where CPU1 frequently reads and writes only "a" and CPU2 frequently reads and writes only "b". Logically, the two CPUs' accesses are completely independent. However, because the cache is accessed and invalidated at cache-line granularity, CPU1 modifying "a" invalidates the entire line in CPU2's cache, and likewise CPU2 modifying "b" invalidates the line in CPU1's cache. The line "ping-pongs" between the two CPUs and is invalidated over and over, which ultimately degrades the program's performance. This is false sharing.
 
= Summary =
When false sharing is a real performance bottleneck, it is worth the effort to find and fix it. In most applications, however, performance is not that critical, and the harm false sharing does to the program is small enough that hunting it down and spending extra memory on cache-line padding is often not worthwhile. As the saying goes: don't over-optimize.