GPU621/Intel Parallel Studio Inspector

10,929 bytes added, 21:25, 20 November 2020
= Intel Parallel Studio Inspector =
 
=== Description ===
The purpose of this project is to provide a functional overview of Intel Inspector, a correctness-checking tool that detects and locates threading errors (deadlocks and data races) and memory errors (memory leaks and illegal memory accesses) in an application. The functional components and the graphical user interface of Intel Inspector are demonstrated through use-case examples. Together, these examples show how to use this Intel tool to improve accuracy and efficiency when developing memory- and computation-intensive applications.
 
= Features and Functionalities =
==Features==
Intel Inspector helps developers harden their programs by detecting and locating memory and threading errors. When a program is large and its logic is complicated, memory and threading bugs become difficult to locate. This is particularly true for programs that are optimized with multi-threading. Intel Inspector supports the following parallelization models:
 
* '''OpenMP'''
* '''TBB'''
* Parallel language extensions for the Intel C++ Compiler
* Microsoft PPL
* Win32 and POSIX threads
* '''Intel MPI Library'''
 
Intel Inspector also supports several languages (C, C++, and Fortran), operating systems (Windows and Linux), IDEs (Visual Studio, Eclipse, etc.), and compilers (Intel C++, Intel Fortran, Visual C++, GCC, etc.). Together, these make Intel Inspector a convenient and efficient tool for building and testing complicated programs and HPC applications.
 
 
==Functionalities==
In terms of functionality, Intel Inspector provides four debuggers: Correctness Analyzer & Debugger, Memory Debugger, Threading Debugger, and Persistent Memory Debugger. Within the scope of this course and project, we focus on the first three.
 
===Correctness Analyzer & Debugger===
Normally, a debugging session relies on a breakpoint placed at the location where an error occurs. However, it can be hard to find out exactly what the problem is, because that location may have been executed hundreds of times before the error appears. The Correctness Analyzer & Debugger lets Intel Inspector analyze the code without a special recompile. Moreover, it speeds up diagnosis by breaking into the debugger just before the error occurs, so that we know both where and when it happens.
 
===Memory Debugger===
Memory problems are a major headache in programming. The Memory Debugger in Intel Inspector detects and locates memory errors, and provides a graphical tool that shows memory growth and locates the call stack and code responsible for it.
 
===Threading Debugger===
Threading problems are very hard to detect and locate because they are usually errors in the program logic. They are often non-deterministic, as with race conditions and deadlocks: they do not happen in every run, and even when they happen the program may run as usual but generate wrong output. The Threading Debugger in Intel Inspector is an efficient diagnostic tool for threading errors, and it can report them even when a particular run does not trigger the error. It is especially helpful when building HPC applications and optimizing code with multi-threaded algorithms.
 
When using Intel Inspector, it is important to balance the depth of the analysis against the memory overhead it adds.
 
= How to use =
Intel Inspector is simple to use. It can run as a stand-alone application or as an integrated tool inside an IDE. Here we use Microsoft Visual Studio as an example.
 
With the source code of the program open in Visual Studio, build the program, then click the dropdown beside the Intel Inspector icon and select "New Analysis".
 
[[File:Dropdowns.jpg|1100px]]
 
In the analysis panel, select the analysis type, the depth of analysis, and any extra options, then press Start to launch the analysis.
 
[[File:GUIpanel.jpg|1100px]]
 
Intel Inspector now runs the code and analyzes it. The progress is shown in the collection log. When the analysis is complete, click the "Summary" tab; the error types and the error locations are shown in their respective panels.
 
[[File:CollectionLog.jpg|1100px]]
 
[[File:AnalysisSummary.jpg|1100px]]
 
 
= Memory problems =
 
===Memory Leak===
To test memory leak detection, the following code snippet is used as the faulty code.
 
<syntaxhighlight lang="cpp" line='line'>
#include <iostream>

int main()
{
    int* c;
    c = new int(5); // heap allocation that is never released (intentional leak)
    std::cout << *c << std::endl;

    return 0;
}
</syntaxhighlight>
 
As we can see, the variable 'c' is assigned a heap resource that is never deallocated. We run this program in Intel Inspector.
 
[[File:MemoryLeak.png|1100px]]
 
The inspection result shows where the leaked resource was allocated and its location in the code.
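
For comparison, one possible way to remove the leak is to give the allocation an owner. The sketch below is not part of the original test program; the function name readOwnedValue is hypothetical, and it assumes a C++14 compiler for std::make_unique.

```cpp
#include <iostream>
#include <memory>

// Hypothetical leak-free variant of the snippet above: std::unique_ptr owns
// the int and releases it automatically when the pointer goes out of scope.
int readOwnedValue()
{
    std::unique_ptr<int> c = std::make_unique<int>(5); // heap allocation with an owner
    std::cout << *c << std::endl;
    return *c; // the value is copied out before the owner is destroyed
}
```

Calling readOwnedValue() from main() in place of the raw new should leave no unreleased allocation from this snippet for a memory analysis to report; an explicit delete c; before the return in the original code would have the same effect.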
 
 
===Invalid Memory Access===
A more involved program is used as the example in this section. Inspecting the TBB parallel_for workshop also demonstrates that Intel Inspector works with Threading Building Blocks algorithms.
 
<syntaxhighlight lang="cpp" line='line'>
#ifndef WORDCOUNT_H_
#define WORDCOUNT_H_
 
#include <tbb/tbb.h>
 
typedef bool (*Delimiter)(char);
 
class WordCount {
const char* string;
int* const stringSize;
int* const numberOfWord;
int number;
Delimiter delimiter;
 
public:
WordCount(const char* str, int* const size, int* const numb, int numChar, const Delimiter del): stringSize(size), numberOfWord(numb){
string = str;
number = numChar;
delimiter = del;
}
 
void operator()(const tbb::blocked_range<int>& r)const {
for (auto i = r.begin(); i != r.end(); i++) {
if (!delimiter(string[i])) {
int s = 0;
while (i + s < number && !delimiter(string[i + s])) s++;
stringSize[i] = s;
int n = 0;
for (int j = i + s + 1; j + s < number; j++) {
bool bad = false;
for (int k = 0;
k < s && k + i < number && k + j < number; k++) {
if (string[i + k] != string[j + k]) {
bad = true;
break;
}
}
if (!bad && delimiter(string[j + s])) n++;
}
numberOfWord[i] = n;
}
else {
stringSize[i] = 0;
numberOfWord[i] = 0;
}
i += stringSize[i]; // BUG: this jump can move i past r.end(); "i != r.end()" then
// stays true and the loop keeps running outside the range and the array
}
}
};

#endif // WORDCOUNT_H_
</syntaxhighlight>
 
Inspection result
 
[[File:TBBinvalidAccess.jpg|1100px]]
 
Intel Inspector traces the error to the loop inside the functor passed to tbb::parallel_for(). Every reference to the illegally accessed location is marked as an error, which indicates that the error occurs during a specific iteration of the loop. However, this inspection has an extremely high memory overhead, which made the analysis about a thousand times slower than a normal run.
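
The loop-control problem can be shown in isolation, without TBB. The following is a minimal sketch, not the workshop's code: the helper wordLengths and its parameters are hypothetical, and a space is assumed to be the only delimiter. It keeps the same "skip past the word just measured" jump, but tests the index with '<' and clamps the upper bound, so a jump past the end of the range terminates the loop instead of running off the array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: records the length of every word that starts inside
// [begin, end) and skips ahead past the word just measured.
std::vector<int> wordLengths(const std::string& text, std::size_t begin, std::size_t end)
{
    std::vector<int> lengths(text.size(), 0);
    if (end > text.size()) end = text.size(); // clamp the caller's upper bound
    for (std::size_t i = begin; i < end; ++i) { // '<' still stops after a jump past end
        if (text[i] != ' ') {
            std::size_t s = 0;
            while (i + s < text.size() && text[i + s] != ' ') ++s;
            lengths[i] = static_cast<int>(s);
            i += s; // may land beyond end; the '<' test handles that safely
        }
    }
    return lengths;
}
```

For example, wordLengths("ab cde", 0, 4) measures the word that starts at index 3 even though it extends past the range, and the loop then exits because the index is no longer below the bound. With a "!=" test the same jump would skip over the bound and continue.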
 
 
===Memory Growth===
 
 
= Thread problems =
 
===Race Condition===
The following program is used to demonstrate race condition detection in Intel Inspector. In this program, 5 threads compete to update the 'wallet' object without a lock. The compiler does not treat this competition as an error, and the program always runs to completion. However, the race condition makes the program produce different results from run to run (inconsistent output). A data race is hard to locate manually, but with Intel Inspector it is easy and quick.
 
<syntaxhighlight lang="cpp" line='line'>
#include <iostream>
#include <thread>
#include <vector>
 
class Wallet {
int mMoney;
public:
Wallet() :mMoney(0) {}
int getMoney() {
return mMoney;
}
void addMoney(int money) {
mMoney += money;
}
};
 
int testMultithreadWallet() {
Wallet wallet;
int threadNum = 5;
std::vector<std::thread> threads;
//Create 5 threads and push to the vector
for (int i = 0; i < threadNum; i++) {
threads.push_back(
//Create a thread and run its lambda function
std::thread([&]() -> void {
//Call addMoney 1000 times to add money to the wallet, 1 dollar each time
for (int i = 0; i < 1000; i++) {
wallet.addMoney(1);
}
})
);
}
//Join all threads back to main thread
for (int i = 0; i < threadNum; i++) {
threads.at(i).join();
}
return wallet.getMoney();
}
 
 
int main() {
int result = 0;
//Run the testMultithreadWallet function 50 times to get the race condition result
for (int k = 0; k < 50; k++) {
//The result should be 5000, if not, print the error result
if ((result = testMultithreadWallet()) != 5000) {
std::cout << "Error at count = " << k << " Money in Wallet = " << result << std::endl;
}
}
return 0;
}
</syntaxhighlight>
 
Incorrect results generated by the race condition code.
[[File:RaceResult1.jpg|500px]] [[File:RaceResult2.jpg|500px]]
 
Inspection summary by Intel Inspector
[[File:DataRace.jpg|1100px]]
 
Using Intel Inspector, the data race is quickly detected and located.
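
Once the race has been located, one common remedy is to protect the shared counter with a mutex. The sketch below is an assumption about how the Wallet class could be fixed, not code from the project; the names SafeWallet and testSafeWallet are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed version of Wallet: every access to mMoney takes the mutex.
class SafeWallet {
    int mMoney = 0;
    std::mutex mMutex;
public:
    int getMoney() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mMoney;
    }
    void addMoney(int money) {
        std::lock_guard<std::mutex> lock(mMutex); // serializes the read-modify-write
        mMoney += money;
    }
};

// Same workload as testMultithreadWallet(): 5 threads, 1000 deposits each.
int testSafeWallet() {
    SafeWallet wallet;
    std::vector<std::thread> threads;
    for (int t = 0; t < 5; t++) {
        threads.emplace_back([&wallet]() {
            for (int i = 0; i < 1000; i++) wallet.addMoney(1);
        });
    }
    for (auto& th : threads) th.join();
    return wallet.getMoney();
}
```

With the lock in place, every run should return 5000. Declaring the counter as std::atomic<int> would be an alternative for a single integer like this one.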
 
 
===Deadlock===
Deadlock is another common error encountered when developing multi-threaded solutions. A deadlock occurs when one or more threads try to acquire resources that are locked by other threads, while those other threads are in turn waiting for resources locked by the first threads. The threads then wait forever and the program hangs. A deadlock does not happen in every run; sometimes the program runs successfully, but there is a significant chance that it will deadlock.
 
The following program uses std::mutex with std::lock_guard to create a deadlock scenario: the even and odd threads acquire the two mutexes in opposite order.
 
<syntaxhighlight lang="cpp" line='line'>
#include <iostream>
#include <mutex>
#include <thread>
 
using namespace std;
const int SIZE = 10;
 
mutex Mutex1, Mutex2;
 
void even_thread_print(int i)
{
lock_guard<mutex> g1(Mutex1);
lock_guard<mutex> g2(Mutex2);
cout << " " << i << " ";
}
 
void odd_thread_print(int i)
{
lock_guard<mutex> g2(Mutex2);
lock_guard<mutex> g1(Mutex1);
cout << " " << i << " ";
}
 
 
void print(int n)
{
for (int i = SIZE * (n - 1); i < SIZE * n; i++) {
if (n % 2 == 0) {
even_thread_print(i);
}
else
odd_thread_print(i);
 
}
cout << endl;
cout << "---------------------------------------" << endl;
}
 
int main()
{
thread t1(print, 1); // print 0-9
thread t2(print, 2); // print 10-19
thread t3(print, 3); // print 20-29
thread t4(print, 4); // print 30-39
 
t1.join();
t2.join();
t3.join();
t4.join();
 
return 0;
}
</syntaxhighlight>
 
Program outputs (a correct run, and a run that encounters the deadlock)
 
[[File:DeadLockNoIssue.jpg|500px]] [[File:DeadLockWithIssue.jpg|530px]]
 
Intel Inspector result
 
[[File:DeadLockSummary.jpg|1100px]]
 
[[File:LocateDeadLock.jpg|1100px]]
 
With the use of Intel Inspector, the deadlock is quickly detected and located. By tracing the call stack we know which locations in our source code produced the deadlock.
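
A typical fix for this kind of lock-order deadlock is to acquire both mutexes in a single deadlock-avoiding step. The sketch below is an illustration under that assumption, not the project's code; MutexA, MutexB, safe_print_step, and runSafeThreads are hypothetical names, and a counter replaces the console output so the result is easy to check.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex MutexA, MutexB;
int printed = 0; // shared state protected by both mutexes

// Every thread takes both locks through std::lock, which acquires them
// without deadlocking regardless of the order other threads use.
void safe_print_step()
{
    std::lock(MutexA, MutexB);
    std::lock_guard<std::mutex> gA(MutexA, std::adopt_lock); // adopt the already-held locks
    std::lock_guard<std::mutex> gB(MutexB, std::adopt_lock);
    ++printed;
}

// Four threads, ten steps each, mirroring the structure of the example above.
int runSafeThreads()
{
    printed = 0;
    std::vector<std::thread> threads;
    for (int n = 0; n < 4; n++) {
        threads.emplace_back([] {
            for (int i = 0; i < 10; i++) safe_print_step();
        });
    }
    for (auto& t : threads) t.join();
    return printed;
}
```

Always locking Mutex1 before Mutex2 in both even_thread_print and odd_thread_print would remove the deadlock as well; std::lock is useful when a fixed order is hard to enforce.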
 
= Progress =
Update 1: Sunday, Nov 8, 2020 - Created home page.
 
Update 2: Friday, Nov 13, 2020 - Created features section.
 
Update 3: Saturday, Nov 14, 2020 - Worked on creating and referencing error programs for use case demonstrations.
 
Update 4: Monday, Nov 16, 2020 - Created "how to use" section.
 
Update 5: Tuesday, Nov 17, 2020 - All error codes for the use case scenario are complete.
 
Update 6: Wednesday, Nov 18, 2020 - Created use case sections.
 
Update 7: Friday, Nov 20, 2020 - Minor fixes