46
edits
Changes
→Presentation
* Analysis:
'''Parallelizable?'''
The program function named toPng() has a runtime of O(n^2) takes up an average of 42% (min: 35%; max: 50%) of the execution time.
[[File:SPDiagram.PNG]]
*Sum up:
**Maze size < 1500 * 1500
**Maze size > 1500 * 1500
'''3. Optimize'''
* New Kernel
__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < size) {
rows[i] = WALL;
__syncthreads();
}
}
__global__ void k_drawPaths(png_byte* rows, const short* cells, const int width, const int height, const int len) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < width * height) {
int x = i % width;
int y = i / width;
int cas = (cells[i] & 0xC0) >> 6;
int oddY = (2 * y + 1) * len;
int evenY = (2 * y + 2) * len;
int oddX = (2 * x + 1) * 3;
int evenX = (2 * x + 2) * 3;
if (cas == 2) {
rows[oddY + evenX] = rows[oddY + evenX + 1] = rows[oddY + evenX + 2] = PATH;
}
else if (cas == 1) {
rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;
}
else if (cas == 0) {
rows[oddY + evenX] = rows[oddY + evenX + 1] = rows[oddY + evenX + 2] = PATH;
rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;
}
rows[oddY + oddX] = rows[oddY + oddX + 1] = rows[oddY + oddX + 2] = PATH;
}
}
Because there was too many else if condition in the old Kernel, we rewrite the Kernel to avoid divergent.
[[File:SPODiagram.PNG]]
*Sum up:
**Still too many if statement in the Kernel
**Almost the same section as the serial