
CDOT Wiki β


GPU621/Analyzing False Sharing

499 bytes removed, 14:34, 7 November 2022
<br />
# [mailto:rleong4@myseneca.ca?subject=GPU621 Ryan Leong]
# [mailto:yppadsala@myseneca.ca?subject=GPU621 Yash Padsala]
= '''What to know before understanding false sharing''' =
== CPU cache architecture ==
The CPU is the heart of the computer, and all operations and programs are ultimately executed by it. Data lives in main memory (RAM), but even direct access to main memory is relatively slow, so there are several levels of cache between the CPU and main memory. If the same operation is performed many times on a piece of data, it makes sense to keep that data close to the CPU while the operation runs. A loop counter is a good example: you do not want to go out to main memory on every iteration just to fetch the counter and increment it.<br /><br />[[File:CPUCacheArchitecturePyramid Model.png|800px]]<br /><br />Students who have studied storage architecture in an operating systems course will recognize this pyramid model of the memory hierarchy. From top to bottom, the cost per byte of the storage medium decreases and its capacity increases; from bottom to top, access speed increases. The top of the pyramid is the CPU registers, followed by the CPU caches (L1, L2, L3), then main memory, with the disk at the bottom. Each cache level further from the core is larger and slower. L3, which is common in modern multicore machines, is larger and slower still and is shared by all CPU cores on a single socket. Finally, main memory, which holds all the data the program operates on, is larger and slower again and is shared by all CPU cores on all sockets.

When the CPU performs an operation, it first looks for the required data in L1, then L2, then L3; only if the data is in none of these caches is it fetched from main memory. The farther away the data is, the longer the operation takes. The hierarchy exists mainly to bridge the gap between the high speed of the CPU and the low speed of memory and disk: recently used data is kept in the cache, so the next time the same data is accessed, the CPU can read it directly from the faster cache instead of slowing the whole program down by going to memory or disk. So, for data that is operated on very frequently, the goal is to keep it in the L1 cache.
== CPU cache line ==
To discuss false sharing, we first need to be familiar with the concept of cache lines. A cache is made up of cache lines, and the cache line is the smallest unit the CPU cache works with: each line holds a copy of one contiguous, line-aligned block of main memory. The cache line size depends on the architecture. Most common processors use 64-byte cache lines, while some older processors use 32-byte lines.
A C++ double is 8 bytes on common platforms, so a 64-byte cache line holds 8 double values.<br /><br />[[File:CPUCacheLines.png|800px]]<br /><br />The CPU cache always transfers data in whole cache lines: whenever data has to be read from main memory, the entire 64-byte line containing it is loaded, even if the neighbouring bytes are never used. Thus, when one element of a double array is loaded into the cache, the other elements that share its cache line (up to 7 of them) are loaded along with it, and later accesses to them are served from the cache. However, if the items of a data structure are not adjacent in memory, as in a linked list, this benefit of loading whole cache lines is lost.