GPU621/Analyzing False Sharing

919 bytes added, 15:17, 23 November 2022
Example Of A False Sharing
int main() {
    vector<int> numbers;
    // Initialize vector numbers with 0, 1, ..., sizeOfNumbers - 1
    for (int i = 0; i < sizeOfNumbers; i++) {
        numbers.push_back(i);
    }

    cout << "-----Thread-----" << endl;
    // Thread block
    {
        vector<long> sum(sizeOfSum, 0); // Set size=sizeOfSum and all to 0
        auto start = chrono::steady_clock::now();
        vector<thread> threads;
        for (int i = 0; i < sizeOfSum; i++) {
            threads.emplace_back(sumUp, numbers, ref(sum), i);
        }
        for (int i = 0; i < sizeOfSum; i++) {
            threads[i].join();
        }
        auto end = chrono::steady_clock::now();
        // The duration is cast to microseconds, so the unit printed is "us"
        cout << "Thread time consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl;
    }

    cout << endl << "-----Serial-----" << endl;
    // Serial block
    {
        vector<long> sum(sizeOfSum, 0);
        auto start = chrono::steady_clock::now();
        for (int i = 0; i < sizeOfSum; i++) {
            sumUp(numbers, sum, i);
        }
        auto end = chrono::steady_clock::now();
        cout << "Serial time consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl;
    }
}
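What would it take to show the advantages of multicore? In our program, no two threads operate on the same variable: each thread repeatedly reads and writes only its own element of sum. Those elements are adjacent in memory and share a cache line, so every write by one thread invalidates that line in the other cores' caches. This is false sharing. To avoid it, each thread can accumulate into a local variable and write to sum only once, at the end.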
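To make the cache-line argument concrete, the following sketch counts how many cache lines the elements of a small vector<long> occupy. It assumes a 64-byte cache line, which is common on x86-64 but not guaranteed, and the names kAssumedCacheLine, cacheLineOf and distinctCacheLines are hypothetical helpers that are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed cache-line size in bytes; 64 is typical on x86-64, not guaranteed.
constexpr std::size_t kAssumedCacheLine = 64;

// Index of the cache line that holds address p, under the assumed line size.
std::uintptr_t cacheLineOf(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) / kAssumedCacheLine;
}

// Number of distinct cache lines touched by the elements of v.
std::size_t distinctCacheLines(const std::vector<long>& v) {
    std::set<std::uintptr_t> lines;
    for (const long& element : v) {
        lines.insert(cacheLineOf(&element));
    }
    return lines.size();
}
```

For a vector<long> sum(4, 0), distinctCacheLines(sum) returns 1 or 2 under this assumption, because four contiguous long values occupy at most 32 bytes. At least two threads' slots therefore sit in the same line, which is the situation that causes false sharing.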
We implement another function, newSumUP, in the program code:
void newSumUP(const vector<int> numbers, vector<long>& sum, int id) {
    long thisSum = 0; // Local accumulator, private to this thread
    for (int i = 0; i < numbers.size(); ++i) {
        if (i % sizeOfSum == id) {
            thisSum += i;
        }
    }
    sum[id] = thisSum; // Single write to the shared vector
    cout << "sum " << id << " = " << sum[id] << endl;
}
Also, we add the following code to our main function:
cout << endl << "----new block---" << endl;
// new block
{
    vector<long> sum(sizeOfSum, 0);
    auto start = chrono::steady_clock::now();
    vector<thread> threads;
    for (int i = 0; i < sizeOfSum; ++i) {
        threads.emplace_back(newSumUP, numbers, ref(sum), i);
    }
    for (int i = 0; i < sizeOfSum; ++i) {
        threads[i].join();
    }
    auto end = chrono::steady_clock::now();
    // The duration is cast to microseconds, so the unit printed is "us"
    cout << "New block consumption: " << chrono::duration_cast<chrono::microseconds>(end - start).count() << "us" << endl;
}
When we ran the program again, we came to this conclusion:
We can see that the new block is much faster than the Thread block and takes almost the same time as the Serial block. Overall, the Serial block is still the least time-consuming, because the new block still incurs extra overhead for thread creation and scheduling.
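The comparison between the Thread block and the new block can be reproduced in isolation with a sketch like the one below. Since the original sumUp is not shown here, sumUpShared is only an assumed stand-in that writes to the shared vector on every addition. sumUpLocal mirrors the local-variable idea of newSumUP, and runThreaded and Worker are hypothetical helper names. An optimizing compiler may keep sum[id] in a register, so the measured gap can shrink or vanish depending on build flags and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Assumed stand-in for sumUp: every addition writes to the shared vector,
// so neighbouring slots in the same cache line are written by different threads.
void sumUpShared(const std::vector<int>& numbers, std::vector<long>& sum, std::size_t id) {
    assert(id < sum.size());
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        if (i % sum.size() == id) {
            sum[id] += numbers[i];
        }
    }
}

// Local-accumulator variant: the shared vector is written exactly once.
void sumUpLocal(const std::vector<int>& numbers, std::vector<long>& sum, std::size_t id) {
    assert(id < sum.size());
    long thisSum = 0;
    for (std::size_t i = 0; i < numbers.size(); ++i) {
        if (i % sum.size() == id) {
            thisSum += numbers[i];
        }
    }
    sum[id] = thisSum;
}

using Worker = void (*)(const std::vector<int>&, std::vector<long>&, std::size_t);

// Runs one worker thread per slot of sum and returns the elapsed microseconds.
long long runThreaded(Worker worker, const std::vector<int>& numbers, std::vector<long>& sum) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < sum.size(); ++id) {
        threads.emplace_back(worker, std::cref(numbers), std::ref(sum), id);
    }
    for (std::thread& t : threads) {
        t.join();
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling runThreaded(sumUpShared, numbers, sum) and then runThreaded(sumUpLocal, numbers, sum) on a freshly zeroed sum gives two timings to compare. Both variants must produce the same per-slot sums, which is a useful sanity check before reading anything into the numbers.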