
GPU621/Analyzing False Sharing

# [mailto:rleong4@myseneca.ca?subject=GPU621 Ryan Leong]
# [mailto:yppadsala@myseneca.ca?subject=GPU621 Yash Padsala]
= '''What to know before understanding false sharing''' =
== CPU cache architecture ==

The CPU is the heart of the computer: all operations and programs are ultimately executed by it. Main memory (RAM) is where the data lives, and there are several levels of cache between the CPU and main memory, because even direct access to main memory is comparatively slow. If you perform the same operation on a piece of data many times, it makes sense to keep that data close to the CPU while the operation executes. A loop counter, for example, is not something you want to fetch from main memory on every iteration just to increment it.

[[File:CPUCacheArchitecturePyramid Model.png|800px]]

Before carrying on with the discussion, we first need to be familiar with the concept of cache lines. Students who have covered storage architecture in an OS course will recognize the pyramid model of the memory hierarchy: from top to bottom, the cost and access speed of each storage medium decrease while its capacity grows. At the top of the pyramid sit the CPU registers, followed by the CPU caches (L1, L2, L3), then main memory, with the disk at the bottom. L3, common in modern multicore machines, is larger and slower than L1 and L2 and is shared by all CPU cores on a single socket. Main memory, which holds all the data a program operates on, is larger and slower still and is shared by all CPU cores on all sockets.

When the CPU performs an operation, it first looks for the required data in L1, then L2, then L3, and finally, if the data is in none of these caches, it fetches it from main memory. The farther it has to go, the longer the operation takes. The operating system relies on this storage hierarchy to bridge the gap between the CPU's high speed and the low speed of memory and disk: recently used data is read into the cache ahead of time, so that the next time the same data is accessed it can be read directly from the faster CPU cache rather than slowing everything down with a trip to memory or disk. So if you perform some operation very frequently, make sure its data stays in the L1 cache.

== Cache Lines ==

A cache is made up of cache lines, and the cache line is the smallest unit the CPU cache operates on. The cache line size varies by architecture; the most common sizes are 64 bytes on current processors and 32 bytes on older ones. Each cache line effectively references a block of addresses in main memory, and the cache always moves data in whole lines: every read brings in the entire cache line containing the requested data, even if the adjacent data in that line is never used.
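The line size can also be checked from a program. Below is a minimal sketch that prints the compiler's cache line hint; it assumes a C++17 compiler and standard library that provide <code>std::hardware_destructive_interference_size</code> (e.g., recent GCC or MSVC). On Linux, <code>sysconf(_SC_LEVEL1_DCACHE_LINESIZE)</code> is an alternative runtime query.

<syntaxhighlight lang="cpp">
#include <iostream>
#include <new>

int main() {
    // C++17 hint for the cache line size; on typical x86-64 builds this is 64.
    // Note: this is a compile-time constant chosen by the compiler, not a
    // runtime query of the actual hardware.
    std::cout << "Cache line size (compiler hint): "
              << std::hardware_destructive_interference_size << " bytes\n";
}
</syntaxhighlight>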
A C++ double is 8 bytes, so with a 64-byte cache line, 8 double variables fit in a single cache line.

[[File:CPUCacheLines.png|800px]]

During program execution, the cache is loaded from main memory 64 consecutive bytes at a time. Thus, when one value of a double array is loaded into the cache, its 7 neighbouring elements are loaded along with it, and accessing them afterwards is fast. However, if the items of a data structure are not adjacent to each other in memory, as in a linked list, the benefits of cache line loading are lost.
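As an illustration, the sketch below maps the addresses of a double array onto cache lines. The 64-byte line size is an assumption here (substitute your platform's value), and since the array is not necessarily aligned to a 64-byte boundary, the first line may hold fewer than 8 elements.

<syntaxhighlight lang="cpp">
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes
    double a[16];

    // Each group of 8 consecutive doubles (8 bytes each) falls into the same
    // 64-byte cache line, so loading a[0] also pulls its neighbours into cache.
    for (int i = 0; i < 16; ++i) {
        auto addr = reinterpret_cast<std::uintptr_t>(&a[i]);
        std::printf("a[%2d] -> cache line %llu\n",
                    i, static_cast<unsigned long long>(addr / kLineSize));
    }
}
</syntaxhighlight>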