GPU621/Analyzing False Sharing
=Group Members=
<br />
# [mailto:rleong4@myseneca.ca?subject=GPU621 Ryan Leong]
# [mailto:yppadsala@myseneca.ca?subject=GPU621 Yash Padsala]
# [mailto:sgpatel22@myseneca.ca?subject=GPU621 Shani Patel]
= '''Preface''' =
In multicore concurrent programming, lock contention is like a performance killer, while false sharing is more like a performance assassin. The difference between the two is that a killer can be seen: we can fight it, run from it, detour around it, or beg for mercy, and it can be stopped. An assassin, however, cannot be seen; it stays disguised, lurking in the shadows and waiting for an opportunity to strike. When lock contention hurts concurrency performance, we can take a variety of countermeasures, such as shortening the critical section or using atomic operations. False sharing, on the other hand, is not visible in the code we write, so we often cannot find it, cannot fix it, and therefore cannot improve the performance of the program. It hides in the dark and quietly drags concurrency performance down.
= '''What you need to know before understanding false sharing''' =
== Cache ==
Basically, the cache is a place where data is stored close to the CPU so that it can be used quickly. A computer has several kinds of memory, and the CPU cannot access all of them at lightning speed. In our PCs and laptops, hard drives and SSDs can store large amounts of data, but they are slow; every time the CPU has to fetch data from them, the access adds considerable time to the computation.
[[File:cache.jpg|right|400px]]<br />
Moreover, main memory (DRAM) costs more per byte than a hard drive and is much faster, but it is still not fast enough to keep up with the CPU, and because it needs power to retain its contents, its data is lost when the power is cut. SRAM is even faster but smaller and more expensive, which is why it is the memory used in the cache. The CPU therefore looks for data in the cache first; only if the data is not there does it go to main memory, and the data it finds is then transferred into the cache. This transfer exploits locality in two ways. Temporal locality means that recently accessed data is kept in the cache for a short period of time, because it is likely to be accessed again soon. Spatial locality means that nearby memory locations are loaded as well, because they are likely to be needed next.
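To see spatial locality in action, here is a small, hypothetical sketch (the matrix size and the names used are illustrative and not taken from the original page) that walks the same matrix row by row and then column by column; the row-by-row walk touches consecutive addresses that are already in the loaded cache line, so it typically runs noticeably faster:

 #include <chrono>
 #include <iostream>
 #include <vector>
 
 int main() {
     constexpr int N = 4096;
     std::vector<int> m(N * N, 1);         // a large N x N matrix stored row by row
 
     auto time = [&](bool rowMajor) {
         auto start = std::chrono::steady_clock::now();
         long sum = 0;
         for (int i = 0; i < N; ++i)
             for (int j = 0; j < N; ++j)
                 sum += rowMajor ? m[i * N + j]    // consecutive addresses: good spatial locality
                                 : m[j * N + i];   // large strides: frequent cache misses
         auto end = std::chrono::steady_clock::now();
         std::cout << (rowMajor ? "row-major:    " : "column-major: ")
                   << std::chrono::duration<double>(end - start).count() << " s"
                   << " (sum = " << sum << ")\n";
     };
 
     time(true);    // row-major traversal
     time(false);   // column-major traversal
 }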
== Cache Coherence ==
[[File:cacheCoherence.jpg|center|800px]]<br />
When the CPU accesses data, that data is brought into the cache in fixed-size blocks, and such a block is known as a cache line. A change made to a cache line in one core's cache does not automatically modify the original data in main memory or the copies held in other caches. Cache coherence is what keeps the copies stored in multiple local caches consistent: it connects all the caches so that when one copy of a line is modified, the other copies are updated or invalidated.
 
= '''What is False Sharing?''' =
False sharing is a sharing pattern that occurs when multiple threads run at the same time and use data that lives in the same region of memory. When more than one thread reads or updates logically separate data that ends up on the same cache line, each core's cache holds its own copy of that line, a copy of a common source. When one thread modifies its copy, the other copies must be reloaded from the common source, even though the data those threads actually use has not changed. In that scenario, reloading the cache line wastes system resources and can have a serious negative impact on the performance of the program. False sharing is not easy to catch or to stop, but there are some techniques that help us overcome it.
 
The foremost reason why false sharing happens can be found in how the hardware reads and writes data. When a program reads or writes data, that data is loaded from main memory into a temporary cache, not byte by byte but in fixed-size blocks, because that is a very fast way to access it. Such a block is known as a cache line, commonly 64 bytes on modern CPUs.
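To make this concrete, here is a minimal, hypothetical sketch (the thread count, iteration count, and names such as counters and work are illustrative, not taken from the original page) in which four threads each increment their own counter; the counters are logically independent, but because they are adjacent in memory they typically share one cache line, so every write by one thread invalidates that line in the other cores' caches:

 #include <iostream>
 #include <thread>
 #include <vector>
 
 constexpr int NUM_THREADS = 4;
 constexpr long ITERATIONS = 10000000;
 
 long counters[NUM_THREADS];               // adjacent longs: likely one 64-byte cache line
 
 void work(int id) {
     for (long i = 0; i < ITERATIONS; ++i)
         counters[id]++;                   // each write invalidates the line in other cores' caches
 }
 
 int main() {
     std::vector<std::thread> threads;
     for (int t = 0; t < NUM_THREADS; ++t)
         threads.emplace_back(work, t);
     for (auto& th : threads)
         th.join();
 
     long total = 0;
     for (long c : counters)
         total += c;
     std::cout << "total = " << total << '\n';
 }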
 
= '''Solutions Of False Sharing''' =
== Using a local variable ==
The idea behind this solution is to let each thread do its work in a local variable for as long as possible and write the result back to the shared data only once, at the end, so that the cache line holding the shared data is touched far less often.
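A minimal, hypothetical sketch of this approach follows (again, the thread count, iteration count, and names are illustrative); compared with the previous example, each thread now counts in a local variable and writes to the shared array only once:

 #include <iostream>
 #include <thread>
 #include <vector>
 
 constexpr int NUM_THREADS = 4;
 constexpr long ITERATIONS = 10000000;
 
 long results[NUM_THREADS];
 
 void work(int id) {
     long local = 0;                       // lives in the thread's own stack frame / register
     for (long i = 0; i < ITERATIONS; ++i)
         local++;
     results[id] = local;                  // a single write to the shared cache line
 }
 
 int main() {
     std::vector<std::thread> threads;
     for (int t = 0; t < NUM_THREADS; ++t)
         threads.emplace_back(work, t);
     for (auto& th : threads)
         th.join();
 
     long total = 0;
     for (long r : results)
         total += r;
     std::cout << "total = " << total << '\n';
 }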
So the problem of false sharing cannot always be generalized, and the occurrence of false sharing does not necessarily make performance worse. In most cases, though, accumulating into a local variable and writing back once is the better solution.
== Byte alignment ==
If you feel the above example is too limiting, because not every case can be reduced to such a local variable, another option is simply to make the two variables live on different cache lines, as the sketch below shows.
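Here is a small, hypothetical sketch of that idea (the 64-byte figure is a common cache-line size, not a guarantee on every CPU, and the struct and variable names are illustrative): alignas(64) pads each counter onto its own cache line, so the two threads no longer invalidate each other's copies:

 #include <functional>
 #include <iostream>
 #include <thread>
 
 struct alignas(64) PaddedCounter {        // padded/aligned to a full 64-byte cache line
     long value = 0;
 };
 
 PaddedCounter a, b;                       // each now occupies its own cache line
 
 int main() {
     auto inc = [](PaddedCounter& c) {
         for (long i = 0; i < 10000000; ++i)
             c.value++;
     };
 
     std::thread t1(inc, std::ref(a));
     std::thread t2(inc, std::ref(b));
     t1.join();
     t2.join();
 
     std::cout << a.value + b.value << '\n';
 }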
Even so, don't treat false sharing as a beast, and don't treat 64-byte alignment as a panacea. Everything has to be actually tested to be known: don't draw a conclusion from a build compiled with optimizations disabled (-O0) and then take it as a guideline, because that makes no sense.
 
= '''Conclusion''' =
In conclusion, the term false sharing refers to multiple threads using memory that lies on the same cache line at the same time. In a multiprocessor environment this decreases performance, because logically independent data is modified by several processors simultaneously, which is a very common occurrence inside loops. In many cases false sharing is overlooked, resulting in a program that fails to scale. In parallel programming, where performance is fundamental, keeping an eye out for this problem and recognizing it quickly is essential.
 
Finally, we discussed some solutions that minimize false sharing: keeping as much data private to each thread as possible so that shared data needs to be updated less often, making use of the compiler's optimization features to reduce memory loads and stores, and adding padding or alignment so that independent variables do not share a cache line, which can noticeably improve performance.
 
= References =
https://www.easytechjunkie.com/what-is-false-sharing.htm
 
https://levelup.gitconnected.com/false-sharing-the-lesser-known-performance-killer-bbb6c1354f07
 
https://wiki.cdot.senecacollege.ca/wiki/GPU621/False_Sharing