Difference between revisions of "ParaCode"
(→Assignment 1) |
(→Resources) |
||
(76 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
{{GPU610/DPS915 Index | 20171}} | {{GPU610/DPS915 Index | 20171}} | ||
− | + | ||
== Team Members == | == Team Members == | ||
# [mailto:qchen102@myseneca.ca?subject=gpu610 Qiliang Chen] | # [mailto:qchen102@myseneca.ca?subject=gpu610 Qiliang Chen] | ||
− | # [mailto:xdtruong@myseneca.ca@myseneca.ca?subject=gpu610 | + | # [mailto:xdtruong@myseneca.ca?subject=gpu610 Xuan Dinh Truong] |
+ | |||
+ | [mailto:qchen102@myseneca.ca;xdtruong@myseneca.ca?subject=gpu610 Email All] | ||
== Progress == | == Progress == | ||
=== Assignment 1 === | === Assignment 1 === | ||
− | ==== | + | ==== Project: LZW Compress ==== |
+ | |||
+ | ''' Introduction: ''' | ||
+ | |||
+ | LZW stand for "Lempel–Ziv–Welch". It is a universal lossless data compression algorithm which is widely use in compressing software. It is widely used in image compress and Unix file compress. | ||
+ | Here is a project example of LZW compressing written by c++. Here's the link for more details: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm | ||
+ | |||
+ | ''' Basic Compile Command:(Linux) ''' | ||
+ | g++ -std=c++0x lzw.c -o lzw | ||
+ | |||
+ | ''' To Execute Projet ''' | ||
+ | * Compress file: | ||
+ | ./lzw -c file.txt | ||
+ | |||
+ | * Decopress file: | ||
+ | ./lzw -d file.txt | ||
+ | |||
+ | |||
+ | |||
+ | ''' Profile ''' | ||
+ | |||
+ | We have compress and profile for 3 different size files. The runtime for each is shown below:<br> | ||
+ | [[File:Screen Shot 2018-02-23 at 12.55.32 AM.png|400px]] | ||
+ | |||
+ | To see runtime for each function, see below:<br> | ||
+ | [[File:Screen Shot 2018-02-23 at 1.02.40 AM.png|650px]] | ||
+ | |||
+ | Here is graphic:<br> | ||
+ | [[File:Screen Shot 2018-02-23 at 12.42.51 AM.png|600px]] | ||
+ | |||
+ | Float profile:<br> | ||
+ | |||
+ | Here is the float profile sample for compressing file which has size 54.3MB, float profile is changed to .txt and is simplified)<br> | ||
+ | [[File:Screen Shot 2018-02-23 at 1.20.32 AM.png|600px]] | ||
+ | |||
+ | ==== Project: Grep - Count Keyword in File ==== | ||
+ | |||
+ | ''' Introduction: ''' | ||
+ | |||
+ | On the Linux system, we are very familiar with the Grep command. This is a similar project to that. But this one is more simple. It just counts how many times the keyword appears in the file. After compile the program, we run it as a command like this:<br> | ||
+ | [[File:Screen_Shot_2018-04-03_at_10.00.32_PM.png|100px]] | ||
+ | |||
+ | Here, it counts how many 'is' are there in file 'sample.txt'. | ||
+ | |||
+ | ''' Profile: ''' | ||
+ | We've make three different sizes of files to profile the program. The runtime of them show below:<br> | ||
+ | [[File:Screen_Shot_2018-04-03_at_10.11.02_PM.png|300px]] | ||
+ | |||
+ | To see runtime for functions, see below:<br> | ||
+ | [[File:Screen Shot 2018-04-03 at 10.27.19 PM.png|500px]] | ||
+ | |||
+ | See graphic:<br> | ||
+ | [[File:Screen Shot 2018-04-03 at 10.26.29 PM.png|500px]] | ||
+ | |||
+ | Profile via Visual Studio Sample: | ||
+ | [[File:NoPara 4023k.JPG|650px]] | ||
=== Assignment 2 === | === Assignment 2 === | ||
+ | |||
+ | We are going to work on 'Grep' project.<br> | ||
+ | |||
+ | Through the profile, we can see that the most cost of the runtime is 'grep()' and 'convert_char_to_string()' functions. <br> | ||
+ | |||
+ | |||
+ | ''' Basic Variables ''' | ||
+ | |||
+ | Let's go and see the basic variables in code:<br> | ||
+ | [[File:BasicVar.JPG|650px]] | ||
+ | |||
+ | First, I look at function 'convert_char_to_string()'. I think it's no necessary to use this function, so I remove it.<br> | ||
+ | [[File:StringremoveJPG.JPG|650px]] | ||
+ | |||
+ | |||
+ | '''Serial grep method''' | ||
+ | |||
+ | Then, we start working in grep function. As file size growing up, which means more characters in file, the runtime of 'grep()' grows much more faster than the other functions. So we are going to work on 'grep()' function.<br> | ||
+ | |||
+ | Let's going through grep() function:<br> | ||
+ | [[File:Screen Shot 2018-04-03 at 10.42.41 PM.png|650px]] | ||
+ | |||
+ | We can see, there're two for loop in grep() function. In loop is inside another. Which mean it's kind of O(n^m) runtime function. That's why it cost a lot of time to run.<br> | ||
+ | |||
+ | The idea of grep is very simple. The inside loop make a temporary string from input string with the length of keyword length. Then it compare the temp string to keyword. If they equal, means this temp string is the same as keyword string. It increases the counter by 1. <br> | ||
+ | [[File:GrepIdea.JPG|650px]] | ||
+ | |||
+ | Simple? But to parallel it is not that simple...<br> | ||
+ | |||
+ | |||
+ | '''Parallel grep idea''' | ||
+ | |||
+ | If we put the character array to CUDA kernel to run, those characters will run individually in different threads. It will look like this:<br> | ||
+ | [[File:LineInThreads.JPG|650px]] | ||
+ | |||
+ | Also, if we put keyword character array in kernel, the characters will run in individual thread as well:<br> | ||
+ | [[File:Keyword.JPG|650px]] | ||
+ | |||
+ | It's impossible to compare the string keyword to the file input string cause all characters are individual in threads. That make us spend a huge time to figure out the solution.<br> | ||
+ | |||
+ | Although we cannot compare a whole string, it is possible to compare by one character. For example, it is possible to find just 'h' in input string. <br> | ||
+ | [[File:FindH.JPG|650px]] | ||
+ | |||
+ | With the same idea, we can find 'a':<br> | ||
+ | [[File:FindA.JPG|650px]] | ||
+ | |||
+ | What's next?? | ||
+ | |||
+ | We create an integer array. The size of this array is the same as input's size. This array is to record the characters exist in that position. If characcter exist, we mark it as specific number. Wecall this array as band.<br> | ||
+ | |||
+ | First, we initialize all integers as '0'. This means all characters in keyword are not found in that position in input.<br> | ||
+ | [[File:Initial.JPG|650px]] | ||
+ | |||
+ | Then, we start to find keyword's characters in input string. For example, we want to find 'h'. As 'h' is the first character in keyword, we will mark it '1' in the integer array 'band'. Like this:<br> | ||
+ | [[File:Mark1.JPG|650px]] | ||
+ | |||
+ | With the same idea, we find out the second character 'a'. As 'a' is the second character in keyword, we mark it '2'.<br> | ||
+ | [[File:Mark2.JPG|650px]] | ||
+ | |||
+ | After all characters in keyword are found, we will count how many numbers are connected in sequence.<br> | ||
+ | [[File:NumInSequence.JPG|650px]] | ||
+ | |||
+ | This is our main idea to parallel the grep function. Yes, it still have some loops. But those loops are not as large as that in serial grep function. So we start to design the code foe it.<br> | ||
+ | |||
+ | |||
+ | ''' Parallel grep code ''' | ||
+ | |||
+ | Here are part of parallel grep code:<br> | ||
+ | |||
+ | Main grep parallel:<br> | ||
+ | [[File:ParaGrep.JPG|650px]] | ||
+ | |||
+ | Initial integer arrar kernel:<br> | ||
+ | [[File:InitialBandFunc.JPG|650px]] | ||
+ | |||
+ | Grep character kernel:<br> | ||
+ | [[File:GrepKernel.JPG|650px]] | ||
+ | |||
+ | Two grep:<br> | ||
+ | [[File:GrepCompare.JPG|650px]] | ||
+ | |||
+ | Run and result:<br> | ||
+ | [[File:Result22.JPG|650px]] | ||
+ | |||
+ | Yes. The grep of parallel's result is the same as serial's result! It works!<br> | ||
+ | |||
+ | ''' Profile ''' | ||
+ | |||
+ | Sample of profile:<br> | ||
+ | [[File:ProfilePara.JPG|650px]] | ||
+ | |||
+ | Run and record time data for serial program and parallel program. We got the time for each:<br> | ||
+ | [[File:TimeCompareTwoPart.JPG|650px]] | ||
+ | |||
+ | Generate the graphics:<br> | ||
+ | Total time compare:<br> | ||
+ | [[File:TotalCompare.JPG|650px]] | ||
+ | |||
+ | Time compare for function grep() and grepPara():<br> | ||
+ | [[File:TimecompareFunction.JPG|650px]] | ||
+ | |||
+ | We got it!<br> | ||
+ | |||
=== Assignment 3 === | === Assignment 3 === | ||
+ | |||
+ | Next, we optimize the code so that it runs faster.<br> | ||
+ | |||
+ | ''' Use shared memory ''' | ||
+ | |||
+ | We change global memory to shared memory in kernel grepx(). The code is:<br> | ||
+ | [[File:SharedMem.JPG|650px]] | ||
+ | |||
+ | |||
+ | ''' Profile ''' | ||
+ | |||
+ | We get the data like this:<br> | ||
+ | [[File:SharedData.JPG|650px]] | ||
+ | |||
+ | |||
+ | ''' ntpb '' | ||
+ | |||
+ | We also try different ntpb and see if it effect runtime.<br> | ||
+ | [[File:NtpbDiff.JPG|650px]] | ||
+ | |||
+ | According to the data, the number of ntpb is not an important factor affecting time. | ||
+ | |||
+ | ''' End ''' | ||
+ | |||
+ | That's all for our assignment. | ||
+ | |||
+ | |||
+ | == Resources == | ||
+ | |||
+ | You can see my code here: https://gist.github.com/KignorChan/764c71d9d96561e4101d9830e0bc5e69 | ||
+ | |||
+ | Idea from LZW compress program: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm |
Latest revision as of 16:05, 8 April 2018
GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary
Contents
Team Members
Progress
Assignment 1
Project: LZW Compress
Introduction:
LZW stand for "Lempel–Ziv–Welch". It is a universal lossless data compression algorithm which is widely use in compressing software. It is widely used in image compress and Unix file compress. Here is a project example of LZW compressing written by c++. Here's the link for more details: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm
Basic Compile Command:(Linux)
g++ -std=c++0x lzw.c -o lzw
To Execute Projet
- Compress file:
./lzw -c file.txt
- Decopress file:
./lzw -d file.txt
Profile
We have compress and profile for 3 different size files. The runtime for each is shown below:
To see runtime for each function, see below:
Float profile:
Here is the float profile sample for compressing file which has size 54.3MB, float profile is changed to .txt and is simplified)
Project: Grep - Count Keyword in File
Introduction:
On the Linux system, we are very familiar with the Grep command. This is a similar project to that. But this one is more simple. It just counts how many times the keyword appears in the file. After compile the program, we run it as a command like this:
Here, it counts how many 'is' are there in file 'sample.txt'.
Profile:
We've make three different sizes of files to profile the program. The runtime of them show below:
To see runtime for functions, see below:
Profile via Visual Studio Sample:
Assignment 2
We are going to work on 'Grep' project.
Through the profile, we can see that the most cost of the runtime is 'grep()' and 'convert_char_to_string()' functions.
Basic Variables
Let's go and see the basic variables in code:
First, I look at function 'convert_char_to_string()'. I think it's no necessary to use this function, so I remove it.
Serial grep method
Then, we start working in grep function. As file size growing up, which means more characters in file, the runtime of 'grep()' grows much more faster than the other functions. So we are going to work on 'grep()' function.
Let's going through grep() function:
We can see, there're two for loop in grep() function. In loop is inside another. Which mean it's kind of O(n^m) runtime function. That's why it cost a lot of time to run.
The idea of grep is very simple. The inside loop make a temporary string from input string with the length of keyword length. Then it compare the temp string to keyword. If they equal, means this temp string is the same as keyword string. It increases the counter by 1.
Simple? But to parallel it is not that simple...
Parallel grep idea
If we put the character array to CUDA kernel to run, those characters will run individually in different threads. It will look like this:
Also, if we put keyword character array in kernel, the characters will run in individual thread as well:
It's impossible to compare the string keyword to the file input string cause all characters are individual in threads. That make us spend a huge time to figure out the solution.
Although we cannot compare a whole string, it is possible to compare by one character. For example, it is possible to find just 'h' in input string.
With the same idea, we can find 'a':
What's next??
We create an integer array. The size of this array is the same as input's size. This array is to record the characters exist in that position. If characcter exist, we mark it as specific number. Wecall this array as band.
First, we initialize all integers as '0'. This means all characters in keyword are not found in that position in input.
Then, we start to find keyword's characters in input string. For example, we want to find 'h'. As 'h' is the first character in keyword, we will mark it '1' in the integer array 'band'. Like this:
With the same idea, we find out the second character 'a'. As 'a' is the second character in keyword, we mark it '2'.
After all characters in keyword are found, we will count how many numbers are connected in sequence.
This is our main idea to parallel the grep function. Yes, it still have some loops. But those loops are not as large as that in serial grep function. So we start to design the code foe it.
Parallel grep code
Here are part of parallel grep code:
Yes. The grep of parallel's result is the same as serial's result! It works!
Profile
Run and record time data for serial program and parallel program. We got the time for each:
Generate the graphics:
Total time compare:
Time compare for function grep() and grepPara():
We got it!
Assignment 3
Next, we optimize the code so that it runs faster.
Use shared memory
We change global memory to shared memory in kernel grepx(). The code is:
Profile
' ntpb
We also try different ntpb and see if it effect runtime.
According to the data, the number of ntpb is not an important factor affecting time.
End
That's all for our assignment.
Resources
You can see my code here: https://gist.github.com/KignorChan/764c71d9d96561e4101d9830e0bc5e69
Idea from LZW compress program: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm