GPU621/Group 1

== Analyzing False Sharing ==
[[File:False Sahring 2023.pdf]]
== Team Members ==
== Introduction: What is False Sharing? ==
Multicore processors are more common than ever, and multicore programming is essential to take advantage of the hardware's power, because it lets us run our code on multiple CPU cores. To use those cores effectively, however, it is crucial to understand the underlying hardware, and the cache is one of the most important system components. A cache line is the smallest portion of data that can be mapped into the cache, and in most designs cache lines are shared between cores. For this reason false sharing, also known as cache line ping-ponging, is a well-known issue in multicore/multithreaded programs. False sharing is one of the data-sharing patterns that affects performance when multiple threads exchange data. It occurs when at least two threads modify or use data that lies physically close together in memory and therefore ends up on the same cache line. False sharing hurts performance when the threads frequently modify their individual data in such a way that the cache line bounces back and forth between the caches of the two threads.
== Cache ==
<pre>
// ...
    return 0;
}
</pre>
Two counters, counter1 and counter2, and two functions, increment1() and increment2(), are defined in the code; each function increments the value of the corresponding counter. The program runs the increment functions for a fixed number of iterations, first on a single thread and then on multiple threads, and compares the runtimes of the two scenarios.

The Counter struct is declared with alignas(CACHE_LINE_SIZE) to guarantee alignment with the cache line size. Its value member is an atomic long long, so the counter can be accessed concurrently without risk, and the padding member takes up the remaining bytes so that the struct fills a whole cache line. Three constants control the program: NUM_THREADS specifies the number of threads to use, NUM_ITERATIONS specifies the number of times to increment each counter, and CACHE_LINE_SIZE specifies the size of a cache line. The checkRuntime() function measures the runtime of the program by taking the difference between the start and end times and printing the result in milliseconds.

The main function first prints the cache line size. It then runs the increment functions for NUM_ITERATIONS on a single thread, measures the runtime, and prints the counter values; it then repeats the experiment using multiple threads.

[[File:output.jpg]]

The first thing the program prints is the cache line size, which in this example is set to 64 bytes. The program then runs the experiment twice, once with a single thread and once with two threads, each time incrementing the two counter variables (counter1 and counter2) in a loop of NUM_ITERATIONS iterations, and finally prints the runtime and the final values of the counters.
The output demonstrates that the multithreaded run (479 ms) takes longer than the single-threaded run (355 ms). Although the two threads increment two different counter variables, when those counters sit close enough in memory to fall on the same cache line the threads end up competing for that line: every write by one thread invalidates the copy in the other thread's cache, causing cache misses and slower performance. If each counter variable were placed on its own cache line, the multithreaded version would not suffer this contention and would likely run faster.

== Summary ==

False sharing is a performance problem that can occur in multithreaded applications when different variables on the same cache line are accessed by different threads at the same time. The result is unnecessary cache invalidations and cache line transfers, which degrade performance. False sharing analysis starts by finding the cache lines that are shared by several threads and then identifying the variables that cause the sharing; profiling tools that track cache line accesses and spot cache line conflicts can be used for this.

In conclusion, watch out for false sharing, because it kills scalability. The general situation to look out for is two objects or fields that are frequently accessed for reading or writing by different threads, where at least one of the threads performs writes and the objects are close enough in memory to fall on the same cache line. Use performance analysis tools and CPU monitors, as detecting false sharing is not always simple. Last but not least, you can reduce false sharing by lowering the frequency of updates to variables that are shared but do not actually need to be; for instance, update local data rather than the shared variable.
Additionally, you can ensure that a variable is completely unshared by padding or aligning data on a cache line so that no other data comes before or after the key object in the same cache line.
== Sources ==
