Difference between revisions of "GroupNumberUndefined"
Andreybykin (talk | contribs) (→Assignment 2) |
Revision as of 16:37, 26 March 2017
GroupNumberUndefined Project
Team Members
Progress
Assignment 1
Image Processing: Andrey's Profile
For profiling I decided to use image processing. I found a simple image processing program, written in C++ and posted on an online forum, that can be optimized in many ways: http://www.dreamincode.net/forums/topic/76816-image-processing-tutorial/. The code takes PGM images and manipulates them. I ran my tests using a 24 MB file (only the file size matters for the program's run time, since it determines how many pixels we have to modify). I had to modify the code so that it could report real-time results. The first run was compiled with no optimization and showed the following results.
This was the run time with no optimization enabled.
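The page does not show how the run time was captured. One common way to do it, offered here only as a sketch, is to wrap the call being measured with std::chrono. The helper name timeMs is hypothetical and is not part of the tutorial code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not from the tutorial code): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
double timeMs(const std::function<void()>& work) {
    assert(static_cast<bool>(work));  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as timeMs([&] { /* one image operation */ }) would then give a per-function figure to compare against the profiler output. steady_clock is used because it never jumps backwards, unlike the system clock.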
This was the profiling information. As you can see, the time is split almost equally among the functions, so any of them could be modified for efficiency and parallelized. For example, this is the implementation of the rotation function:
I also compiled the program with the -O2 flag so that the compiler could perform its own optimization, and this was the result:
And the profiling information:
For the second part of the assignment, I think I will see which functions can be improved the most and focus on those, while still leaving myself time to work on other functions if necessary.
Maze Solver: Darren's Profile
For my profiling, I decided to profile a maze solver algorithm by Paul Griffiths that I found at http://www.paulgriffiths.net/program/c/maze.php. This algorithm takes a maze stored in a text file and finds the solution. Since the maze provided was fairly small (10x10), I decided to attach multiple mazes together to form a 6867x101 maze to produce a more prominent profile.
The first profiling was done with zero optimization:
The second profiling was done with the -O2 flag compiler optimization:
As you can see, the majority of the cumulative time is spent in the "look" function shown here:
Although this function is recursive, I feel that with simple parallelization it can be made much faster. However, other parts of the code will also need to be changed to support mazes of larger dimensions. With the currently supported dimensions the execution time is trivial, and if a larger maze is taken as input, a segmentation fault occurs.
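The page does not say what causes the segmentation fault. If recursion depth turns out to be part of the problem (an assumption, not something the profile shows), one option is to replace the recursive search with an explicit stack. The sketch below is not Paul Griffiths' code: the function name solvable and the maze format ('#' for walls, 'S' for start, 'E' for exit) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical maze format: '#' is a wall, 'S' the start, 'E' the exit.
// Returns true if the exit can be reached from the start.
bool solvable(const std::vector<std::string>& maze) {
    const int rows = static_cast<int>(maze.size());
    std::vector<std::vector<bool>> seen(rows);
    std::vector<std::pair<int, int>> stack;  // explicit stack replaces recursion

    for (int r = 0; r < rows; ++r) {
        seen[r].assign(maze[r].size(), false);
        for (int c = 0; c < static_cast<int>(maze[r].size()); ++c) {
            if (maze[r][c] == 'S') {
                seen[r][c] = true;
                stack.push_back(std::make_pair(r, c));
            }
        }
    }

    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};
    while (!stack.empty()) {
        const std::pair<int, int> cur = stack.back();
        stack.pop_back();
        if (maze[cur.first][cur.second] == 'E') return true;
        for (int k = 0; k < 4; ++k) {
            const int nr = cur.first + dr[k];
            const int nc = cur.second + dc[k];
            if (nr < 0 || nr >= rows) continue;
            if (nc < 0 || nc >= static_cast<int>(maze[nr].size())) continue;
            if (maze[nr][nc] == '#' || seen[nr][nc]) continue;
            seen[nr][nc] = true;  // mark on push so each cell is stacked once
            stack.push_back(std::make_pair(nr, nc));
        }
    }
    return false;
}
```

Because the pending cells live on the heap rather than the call stack, the memory needed grows with the number of cells instead of with the recursion depth. This sketch only reports whether a solution exists; recording the path would need a parent table in addition.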
Assignment 2
For assignment two we will be doing image processing. We chose to work on the function enlarge(), which makes the given image bigger. This is its current implementation:
 void Image::enlargeImage(int value, Image& oldImage)
 /* enlarges Image and stores it in tempImage, resizes oldImage and stores the
    larger image in oldImage */
 {
     int rows, cols, gray;
     int pixel;
     int enlargeRow, enlargeCol;

     rows = oldImage.N * value;
     cols = oldImage.M * value;
     gray = oldImage.Q;

     Image tempImage(rows, cols, gray);

     for(int i = 0; i < oldImage.N; i++)
     {
         for(int j = 0; j < oldImage.M; j++)
         {
             pixel = oldImage.pixelVal[i][j];
             enlargeRow = i * value;
             enlargeCol = j * value;
             for(int c = enlargeRow; c < (enlargeRow + value); c++)
             {
                 for(int d = enlargeCol; d < (enlargeCol + value); d++)
                 {
                     tempImage.pixelVal[c][d] = pixel;
                 }
             }
         }
     }

     oldImage = tempImage;
 }
And this is the kernel implementation:
 __global__ void enlarge(int* pixelArr, const int* oldArray, int ni, int nj){
     // one thread per source pixel (i = row, j = column)
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     int j = blockIdx.y * blockDim.y + threadIdx.y;
     if (i < ni && j < nj) {
         int pixel = oldArray[i * nj + j];
         // the output is row-major with rows of width nj*2; each source pixel
         // fills a 2x2 block, so the enlargement factor is fixed at 2
         pixelArr[(i * (nj*2) + j) * 2] = pixel;
         pixelArr[(i * (nj*2) + j) * 2 + 1] = pixel;
         pixelArr[(i * (nj*2) + j) * 2 + nj*2] = pixel;
         pixelArr[(i * (nj*2) + j) * 2 + nj*2 + 1] = pixel;
     }
 }
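One way to gain confidence in the kernel's index arithmetic without a GPU is to replay it on the CPU and compare against the nested-loop version. The sketch below assumes flat row-major arrays, as the kernel does. The function names enlargeReference and enlargeKernelIndexing are hypothetical and exist only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nested-loop enlargement on a flat row-major array, mirroring the logic of
// Image::enlargeImage for an arbitrary factor `value`.
std::vector<int> enlargeReference(const std::vector<int>& old, int ni, int nj, int value) {
    assert(old.size() == static_cast<std::size_t>(ni) * nj);
    std::vector<int> out(static_cast<std::size_t>(ni) * value * nj * value);
    for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
            const int pixel = old[i * nj + j];
            for (int c = i * value; c < (i + 1) * value; ++c)
                for (int d = j * value; d < (j + 1) * value; ++d)
                    out[c * (nj * value) + d] = pixel;
        }
    }
    return out;
}

// Replays the kernel's four writes for every (i, j) a thread would handle.
// The enlargement factor is fixed at 2, as in the kernel.
std::vector<int> enlargeKernelIndexing(const std::vector<int>& old, int ni, int nj) {
    assert(old.size() == static_cast<std::size_t>(ni) * nj);
    std::vector<int> out(static_cast<std::size_t>(ni) * nj * 4);
    for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
            const int pixel = old[i * nj + j];
            const int base = (i * (nj * 2) + j) * 2;  // top-left of the 2x2 block
            out[base] = pixel;
            out[base + 1] = pixel;
            out[base + nj * 2] = pixel;
            out[base + nj * 2 + 1] = pixel;
        }
    }
    return out;
}
```

For a factor of 2 the two functions should produce identical arrays. The comparison also makes the kernel's limitation visible: the CPU code accepts any value, while the kernel as written only doubles the image.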