0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN9MazeDebugC2Ejj
0.00 3.84 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
*Profiling in a Windows environment:
[[File:SDiagram.PNG]]
'''Summary:'''
12. In Visual Studio, select Build > Build Solution from the menu.
* Maze image result
[[File:Out2.png|thumb|Maze 3x3]]
[[File:Out1.png|thumb|Maze 10x10]]
*Kernel:
*Sum up:
[[File:SPDiagram.PNG]]
The diagram shows that mazes with fewer than 2,250,000 cells (1500 × 1500) run faster with the serial code. For larger mazes, the parallelized code performs better.
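If both versions are kept in one program, the crossover in the diagram could be used to pick one at run time. The following is a minimal sketch, assuming the 2,250,000-cell break-even point measured on the test machine carries over. `kCrossoverCells` and `useGpu` are hypothetical names, not part of the project code.

```cpp
#include <cstdint>

// Crossover read off the profiling diagram: 1500 x 1500 = 2,250,000 cells.
// Hypothetical constant; the real break-even point depends on the GPU and host.
constexpr std::uint64_t kCrossoverCells = 1500ULL * 1500ULL;

// Hypothetical dispatcher: returns true when the maze has more cells than the
// crossover, i.e. when the parallel version measured faster.
bool useGpu(std::uint64_t width, std::uint64_t height) {
    return width * height > kCrossoverCells;
}
```

A 1000 × 1000 maze would stay on the CPU, while a 2000 × 2000 maze would be sent to the GPU.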
== PHASE 3 ==
* Maze
The program generates a maze and saves it as a PNG file. It takes two arguments: the height and the width of the maze.
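The two arguments are cell counts, not pixel counts. The following is a minimal sketch of how they could map to image dimensions. It assumes each cell becomes one pixel, with a one-pixel wall line between neighbouring cells and around the border, giving an image of (2 × width + 1) × (2 × height + 1) pixels. This layout is inferred from the `(px - 1) / 2` arithmetic in the kernel. `MazeSize` and `parseMazeSize` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maze dimensions in cells and the resulting PNG size.
struct MazeSize {
    int height;       // cells, first command-line argument
    int width;        // cells, second command-line argument
    int imageHeight;  // pixels: one per cell plus wall lines between and around cells
    int imageWidth;   // pixels
};

// Parses "<height> <width>" from argv; throws on missing or non-positive values.
MazeSize parseMazeSize(int argc, char* argv[]) {
    if (argc < 3) throw std::invalid_argument("usage: maze <height> <width>");
    int h = std::stoi(argv[1]);
    int w = std::stoi(argv[2]);
    if (h <= 0 || w <= 0) throw std::invalid_argument("dimensions must be positive");
    return MazeSize{h, w, 2 * h + 1, 2 * w + 1};
}
```

Under this assumption, the 3x3 maze shown below would be a 7 × 7 pixel image before any scaling.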
* Maze image
[[File:Out2.png|thumb|Maze 3x3]]
[[File:Out1.png|thumb|Maze 10x10]]
*Original Code:
}
}
* Profiling:
[[File:SDiagram.PNG]]
'''2. Parallelize'''
*Kernel:
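<pre>
// One thread per output pixel. Each thread writes only its own pixel, so no
// barrier is needed. The earlier __syncthreads() sat inside the i < size branch
// (it is only valid when every thread of the block reaches it) and cannot order
// writes made by other blocks, so the WALL fill raced with the PATH writes.
// Assumed layout: len = image width in pixels, 3 bytes (RGB) per pixel,
// size = total number of pixels; WALL and PATH are byte values defined elsewhere.
__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= size) return;

    int px = i % len;      // pixel column
    int py = i / len;      // pixel row
    int x = (px - 1) / 2;  // maze cell column
    int y = (py - 1) / 2;  // maze cell row

    png_byte colour = WALL;  // every pixel is a wall unless opened below
    if (px > 0 && py > 0 && x < width && y < height) {
        bool oddX = px % 2 > 0;
        bool oddY = py % 2 > 0;
        int c = (cells[y * width + x] & 0xC0) >> 6;  // two-bit code in bits 7-6

        if (oddX && oddY)
            colour = PATH;  // cell centre
        else if (oddY && !oddX && (c == 2 || c == 0))
            colour = PATH;  // gap between horizontally adjacent cells
        else if (!oddY && oddX && (c == 1 || c == 0))
            colour = PATH;  // gap between vertically adjacent cells
    }

    int idx = (py * len + px) * 3;  // byte offset of this pixel's RGB triple
    rows[idx] = rows[idx + 1] = rows[idx + 2] = colour;
}
</pre>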
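The colour of each pixel depends only on its own coordinates and one cell, so the same rule can run on the host to check the GPU output pixel by pixel. The following is a minimal C++ sketch, assuming the same two-bit code in bits 7-6 of each cell and the same pixel grid as `k_drawWalls`. `isPathPixel` is a hypothetical helper, not part of the project.

```cpp
#include <vector>

// Hypothetical host-side mirror of the per-pixel rule in k_drawWalls.
// cells holds width * height entries; returns true if pixel (px, py) is PATH.
bool isPathPixel(const std::vector<short>& cells, int width, int height, int px, int py) {
    if (px <= 0 || py <= 0) return false;  // top row and left column are walls
    int x = (px - 1) / 2;                  // maze cell column
    int y = (py - 1) / 2;                  // maze cell row
    if (x >= width || y >= height) return false;

    bool oddX = px % 2 > 0;
    bool oddY = py % 2 > 0;
    int c = (cells[y * width + x] & 0xC0) >> 6;  // two-bit code in bits 7-6

    if (oddX && oddY) return true;                 // cell centre
    if (oddY && !oddX) return c == 2 || c == 0;    // gap between horizontal neighbours
    if (!oddY && oddX) return c == 1 || c == 0;    // gap between vertical neighbours
    return false;                                  // wall corners stay walls
}
```

After copying the device buffer back, each pixel's first byte can be compared against `PATH` when `isPathPixel` returns true and against `WALL` otherwise.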
*Profiling:
[[File:SPDiagram.PNG]]
*Sum up:
**Maze size < 1500 × 1500 cells: the serial code is faster.
**Maze size > 1500 × 1500 cells: the parallelized code is faster.
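To reproduce a comparison like this one, each version can be timed over a range of maze sizes. The following is a minimal sketch using `std::chrono::steady_clock`. `timeMs` is a hypothetical helper. When timing GPU code, the callable should end with `cudaDeviceSynchronize()` so that the asynchronous kernel launch is included in the measurement.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs f once and returns the elapsed wall-clock time in ms.
double timeMs(const std::function<void()>& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Averaging several runs per size and discarding the first GPU run, which includes context setup, gives steadier numbers.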
'''3. Optimize'''