CDOT Wiki β

GPU621/Analyzing False Sharing

1,957 bytes added, 16:01, 7 November 2022
The smallest unit of CPU cache is the cache line. The cache line size varies by architecture; the most common sizes are 64 bytes and 32 bytes. The cache moves data in whole cache lines: each read brings in the entire cache line that contains the requested data, so adjacent data is cached as well, even if it is never used.
 
[[File:CPUCacheLines.png|800px]]
 
For example, a double is 8 bytes in C++ and our cache line is 64 bytes, so one cache line holds eight doubles. When we read one element of a double array from memory, the CPU loads the whole cache line that contains that element, which brings in up to seven neighbouring elements along with it.
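The arithmetic in this example can be sketched in C++. This is only an illustration under stated assumptions: the cache line is 64 bytes, sizeof(double) is 8, and the array starts on a cache line boundary. The constant and helper names (kCacheLineSize, firstIndexInLine, lastIndexInLine, sameCacheLine) are hypothetical and do not come from any library.

```cpp
#include <cstddef>

// Assumed cache line size in bytes; real hardware may use 32, 64, or 128.
constexpr std::size_t kCacheLineSize = 64;

// Number of doubles that fit in one cache line (8 when a double is 8 bytes).
constexpr std::size_t kDoublesPerLine = kCacheLineSize / sizeof(double);

// Hypothetical helper: index of the first array element that shares a cache
// line with element i, assuming the array starts on a cache line boundary.
constexpr std::size_t firstIndexInLine(std::size_t i) {
    return (i / kDoublesPerLine) * kDoublesPerLine;
}

// Hypothetical helper: index of the last element in the same cache line.
constexpr std::size_t lastIndexInLine(std::size_t i) {
    return firstIndexInLine(i) + kDoublesPerLine - 1;
}

// Two elements share a cache line when they map to the same line number.
constexpr bool sameCacheLine(std::size_t i, std::size_t j) {
    return i / kDoublesPerLine == j / kDoublesPerLine;
}
```

Under these assumptions, reading element 10 pulls elements 8 through 15 into the cache, while elements 7 and 8 sit in different cache lines even though they are adjacent in the array.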
 
== Cache Consistency ==
On a single-core CPU, the above method works fine: the data in the CPU cache is always "clean" because no other CPU can change the data in memory. On multi-core CPUs, the situation becomes more complicated. Each CPU has its own private cache (possibly alongside a shared L3 cache). If CPU1 operates on data in its cache after CPU2 has changed that same data, CPU1's copy is no longer "clean" and must be treated as invalid. Cache coherency ensures that the caches of multiple CPUs stay consistent with one another.
 
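Cache coherency is handled by the CPU hardware rather than by the operating system. A widely used protocol is MESI (many processors implement it or a variant such as MESIF or MOESI), named after the four states a cache line can be in.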
 
* M (Modified): the local processor has modified the cache line, i.e., it is a dirty line. Its contents differ from those in memory, and this cache holds the only copy.
* E (Exclusive): the contents of the cache line are the same as in memory, and no other processor's cache holds a copy of the line.
* S (Shared): the contents of the cache line are the same as in memory, and copies of the line may exist in other processors' caches.
* I (Invalid): the cache line is invalid and cannot be used.
 
[[File:CacheConsistency.jpg]]
 
Each cache line transitions between these four states, which determines whether a CPU's copy of the line is still valid. For example, if CPU1 writes to a cache line, the copies of that line in other CPUs' caches enter the Invalid state, and those CPUs must fetch the line again (from memory or from the cache holding the up-to-date copy) the next time they need it. This solves the problem of cache coherency among multiple CPUs.
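The state transitions described above can be sketched as a toy model in C++. This is a simplified illustration, not how real hardware is implemented: it tracks a single cache line across two private caches, ignores the data itself, write-backs, and bus messages, and the names (Mesi, LineStates, cpuRead, cpuWrite) are hypothetical.

```cpp
#include <cstddef>

// The four MESI states of one cache line in one CPU's private cache.
enum class Mesi { Modified, Exclusive, Shared, Invalid };

// Toy model (an assumption for illustration): one cache line tracked
// across kCpus private caches. Every copy starts out Invalid.
constexpr std::size_t kCpus = 2;

struct LineStates {
    Mesi state[kCpus] = {Mesi::Invalid, Mesi::Invalid};
};

// CPU `cpu` reads the line and the resulting states are returned.
constexpr LineStates cpuRead(LineStates s, std::size_t cpu) {
    if (s.state[cpu] != Mesi::Invalid) {
        return s;  // cache hit: no state change
    }
    bool othersHoldCopy = false;
    for (std::size_t i = 0; i < kCpus; ++i) {
        if (i != cpu && s.state[i] != Mesi::Invalid) {
            othersHoldCopy = true;
            // A Modified or Exclusive holder downgrades to Shared
            // (a Modified holder would write the line back first).
            s.state[i] = Mesi::Shared;
        }
    }
    s.state[cpu] = othersHoldCopy ? Mesi::Shared : Mesi::Exclusive;
    return s;
}

// CPU `cpu` writes the line: every other copy becomes Invalid and the
// writer's copy becomes Modified.
constexpr LineStates cpuWrite(LineStates s, std::size_t cpu) {
    for (std::size_t i = 0; i < kCpus; ++i) {
        if (i != cpu) {
            s.state[i] = Mesi::Invalid;
        }
    }
    s.state[cpu] = Mesi::Modified;
    return s;
}
```

In this model, a first read by CPU 0 leaves its copy Exclusive, a following read by CPU 1 makes both copies Shared, and a write by CPU 0 then leaves CPU 0 Modified and CPU 1 Invalid. If CPU 1 reads again, both copies return to Shared, which corresponds to the re-fetch described above.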