Open main menu

CDOT Wiki β

ParaCode

Revision as of 10:49, 6 April 2018 by Qchen102 (talk | contribs) (Assignment 2)

Team Members

Progress

Assignment 1

Project: LZW Compress

Introduction:

LZW stand for "Lempel–Ziv–Welch". It is a universal lossless data compression algorithm which is widely use in compressing software. It is widely used in image compress and Unix file compress. Here is a project example of LZW compressing written by c++. Here's the link for more details: https://codereview.stackexchange.com/questions/86543/simple-lzw-compression-algorithm

Basic Compile Command:(Linux)

 g++ -std=c++0x lzw.c -o lzw

To Execute Projet

  • Compress file:
 ./lzw -c file.txt
  • Decopress file:
 ./lzw -d file.txt


Profile

We have compress and profile for 3 different size files. The runtime for each is shown below:
 

To see runtime for each function, see below:
 

Here is graphic:
 

Float profile:

Here is the float profile sample for compressing file which has size 54.3MB, float profile is changed to .txt and is simplified)
 

Project: Grep - Count Keyword in File

Introduction:

On the Linux system, we are very familiar with the Grep command. This is a similar project to that. But this one is more simple. It just counts how many times the keyword appears in the file. After compile the program, we run it as a command like this:
 

Here, it counts how many 'is' are there in file 'sample.txt'.

Profile: We've make three different sizes of files to profile the program. The runtime of them show below:
 

To see runtime for functions, see below:
 

See graphic:
 

Profile via Visual Studio Sample:  

Assignment 2

We are going to work on 'Grep' project.

Through the profile, we can see that the most cost of the runtime is 'grep()' and 'convert_char_to_string()' functions.

Basic Variables Let's go and see the basic variables in code:
 

First, I look at function 'convert_char_to_string()'. I think it's no necessary to use this function, so I remove it.
 

Serial grep method Then, we start working in grep function. As file size growing up, which means more characters in file, the runtime of 'grep()' grows much more faster than the other functions. So we are going to work on 'grep()' function.

Let's going through grep() function:
 

We can see, there're two for loop in grep() function. In loop is inside another. Which mean it's kind of O(n^m) runtime function. That's why it cost a lot of time to run.

The idea of grep is very simple. The inside loop make a temporary string from input string with the length of keyword length. Then it compare the temp string to keyword. If they equal, means this temp string is the same as keyword string. It increases the counter by 1.
 

Simple? But to parallel it is not that simple...

Parallel grep idea If we put the character array to CUDA kernel to run, those characters will run individually in different threads. It will look like this:
 

Also, if we put keyword character array in kernel, the characters will run in individual thread as well:
 

It's impossible to compare the string keyword to the file input string cause all characters are individual in threads. That make us spend a huge time to figure out the solution.

Although we cannot compare a whole string, it is possible to compare by one character. For example, it is possible to find just 'h' in input string.
 

Assignment 3