GPU621/Analyzing False Sharing

66 bytes added, 15:39, 1 December 2022
Solutions Of False Sharing
Also, we add the following code to our main function:
cout << endl << "----new Local variable block---" << endl; // new Local variable block
{
vector<long> sum(sizeOfSum, 0);
}
auto end = chrono::steady_clock::now();
cout << "New Local variable block consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl; // duration is measured in microseconds
}
[[File:newBlockOutput5.jpg|400px]]<br />
The new Local variable block is already much faster than the Thread block and almost matches the Serial block. Overall, the Serial block is still the least time-consuming, because the new Local variable block still incurs extra overhead for thread creation and scheduling.
We can also try increasing sizeOfNumbers to 1000000, which lets the program process more data and so compensates for the extra overhead of thread creation and scheduling.
Now the advantage of multi-threading becomes visible. When the numbers vector reaches a size of 1000000, even the Thread block runs faster than the Serial block.
So the effect of false sharing cannot always be generalized, and its occurrence does not necessarily lower performance. Either way, the new Local variable block remains the best solution.
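The local-variable idea discussed above can be sketched as follows. This is a minimal illustration rather than the article's exact code: the names <code>sumLocal</code> and <code>parallelSum</code> are hypothetical, and a plain summation stands in for whatever work each thread does. Each thread accumulates into a variable on its own stack and writes to its slot of the shared <code>sum</code> vector only once, so neighbouring slots on the same cache line are not repeatedly invalidated.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical worker: adds up numbers[begin, end) in a thread-local
// accumulator and writes the result to the shared slot a single time.
void sumLocal(const std::vector<int>& numbers, std::size_t begin,
              std::size_t end, long& slot) {
    long local = 0;  // lives on this thread's stack, not in the shared vector
    for (std::size_t i = begin; i < end; ++i) {
        local += numbers[i];
    }
    slot = local;    // the only write to shared memory
}

// Hypothetical driver: splits the input into one contiguous chunk per thread.
long parallelSum(const std::vector<int>& numbers, std::size_t threadCount) {
    if (threadCount == 0) {
        threadCount = 1;  // avoid dividing by zero below
    }
    std::vector<long> sum(threadCount, 0);  // one result slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = numbers.size() / threadCount;

    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any leftover elements.
        const std::size_t end =
            (t + 1 == threadCount) ? numbers.size() : begin + chunk;
        workers.emplace_back(sumLocal, std::cref(numbers), begin, end,
                             std::ref(sum[t]));
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return std::accumulate(sum.begin(), sum.end(), 0L);
}
```

Calling <code>parallelSum(numbers, 4)</code> on a vector of integers returns the same total as a serial loop. Whether it is faster depends on the input size, as the measurements above suggest: for small inputs the thread creation cost dominates.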
== Byte alignment ==