Changes

← Older edit

GPU610 Team Tsubame

4,514 bytes removed, 14:54, 12 April 2017

→‎Attempt 1:

[[File:Out1.png|thumb|Maze 10x10]]

* '''Introduction:'''

The Maze program (by corzani@github) generates a maze image (of type PNG).

The following function creates the canvas of the Maze and initialize it by filling all the hexadecimal within each pixel. After the canvas is filled, the function calls the createImage function. Finally, the function write the result from the createImage function in a png file using the "lpng" extension.

void MazePng::toPng(unsigned int scale) {

12. In VS, go to menu Build -> Build Solution

*Kernel: *Sum up'''Analysis: '''

[[File:SPDiagram.PNG]]

The diagram shows that maze less than the size of 2,250,000 cells perform better in the serial code. However~~, if the there are more cells~~, the parallelized code has better performanceif there are more cells to process.

== PHASE 3 ==

* Optimize attempt=== Attempt 1 ===

We ~~have~~ tried to optimize the maze program~~. However, the result~~ by removing some of ~~optimization did not improve~~ the ~~speed of~~ if statements to reduce thread divergence, however the ~~program. In fact, it~~ attempt made the program slower than the ~~parallelization but~~ phase 2 version and only a little bit faster than the serialversion.

~~Instead, we decided to use share memory in the GPU~~ '''New Kernels:''' // Initialize all pixels to ~~improve the speed of the program. However, the maze image does not showing correctly.~~ * New Kernelblack (hex 000)

__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {

int i = blockIdx.x * blockDim.x + threadIdx.x;

}

// Set pixels to white according to the pattern the cell belongs to

__global__ void k_drawPaths(png_byte* rows, const short* cells, const int width, const int height, const int len) {

int i = blockIdx.x * blockDim.x + threadIdx.x;

}

~~Because there were too many else if condition in the old Kernel, we rewrite the Kernel to avoid divergent.~~'''Analysis:'''

[[File:SPODiagram.PNG]]

*Sum up: **Still too many if statement in the Kernel**Almost the same speed as the serial ~~== Presentation ==~~ ~~'''1. Introduction'''~~* Maze~~The program generates a maze in png file. It takes 2 arguments: the height and the width of the maze.~~ *Original Code: /* * MazePng.cpp * * Created on: 12 Jul 2013 * Author: yac */ ~~#include "MazePng.h"~~ ~~#include <stdio.h>~~ ~~#include <stdlib.h>~~ ~~#include <string.h>~~ ~~const png_byte WALL = 0x00;~~ ~~const png_byte PATH = 0xFF;~~ ~~MazePng::MazePng(const unsigned int width, const unsigned int height) :~~ ~~AbstractMaze(width, height) {~~ } ~~MazePng::~MazePng() { }~~ ~~void MazePng::toPng(unsigned int scale) {~~ ~~png_byte color_type;~~ ~~png_byte bit_depth;~~ ~~png_structp png_ptr;~~ ~~png_infop info_ptr;~~ ~~png_bytep *row_pointers;~~ ~~int height, width;~~ ~~FILE *fp;~~ ~~width = (~~For this~~->width * 2) + 1;~~ ~~height = (this->height * 2) + 1;~~ ~~color_type = PNG_COLOR_TYPE_RGB;~~ ~~bit_depth = 8;~~ ~~row_pointers = new png_bytep[height];~~ ~~for (int i = 0; i < height; ++i) {~~ ~~row_pointers[i] = new png_byte[width * 3];~~ ~~for (int j = 0; j < width * 3; j += 3) {~~ ~~row_pointers[i][j] = WALL;~~ ~~row_pointers[i][j + 1] = WALL;~~ ~~row_pointers[i][j + 2] = WALL;~~ } } ~~createImage(row_pointers~~attempt, ~~0);~~ ~~fp = fopen("out.png", "wb");~~ ~~png_ptr = png_create_write_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);~~ ~~info_ptr = png_create_info_struct(png_ptr);~~ ~~png_init_io(png_ptr, fp);~~ ~~png_set_IHDR(png_ptr, info_ptr, width, height, bit_depth, color_type,~~ ~~PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_BASE,~~ ~~PNG_FILTER_TYPE_BASE);~~ ~~png_write_info(png_ptr, info_ptr);~~ ~~png_write_image(png_ptr, row_pointers);~~ ~~png_write_end(png_ptr, NULL);~~ ~~png_destroy_write_struct(&png_ptr, &info_ptr);~~ the kernel executes for ~~(int i = 0; i < height; ++i) {~~ ~~delete[] row_pointers[i];~~ } ~~delete[] row_pointers;~~ ~~fclose(fp);~~ } ~~void inline MazePng::setPixel(png_bytep *row_pointers, unsigned int x, unsigned int y,~~ ~~png_byte type) {~~ ~~row_pointers[y][0 + 3 * x] = type;~~ ~~row_pointers[y][1 + 3 * x] = type;~~ ~~row_pointers[y][2 + 3 * x] = type;~~ } ~~void MazePng::createImage(png_bytep *row_pointers, unsigned int scale) {~~ /* each cell instead of for each byte (~~unsigned int x = 0; x < ((width * 2) + 1); ++x) {~~ ~~setPixel(row_pointers, x, 0, WALL);~~ } */ ~~if (start < width) {~~ ~~setPixel(row_pointers, start * 2 + 1, 0, PATH);~~ } ~~png_byte temp[2] = { 0 };~~ ~~for (unsigned int y = 0; y < height; ++y) {~~ ~~setPixel(row_pointers, 0, (y * 2) + 1, WALL);~~ ~~for (unsigned int x = 0; x < width; ++x) {~~ ~~switch ((cells[(y * width) + x] & 0xC0) >> 6) {~~ ~~case 2:~~ ~~temp[0] = PATH;~~ ~~setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);~~ ~~break;~~ ~~case 1:~~ ~~temp[1] = PATH;~~ ~~setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);~~ ~~break;~~ ~~case 0:~~ ~~setPixel(row_pointers, 2 + (x * 2), (y * 2) + 1, temp[0]);~~ ~~setPixel(row_pointers, 1 + (x * 2), (y * 2) + 2, temp[1]);~~ ~~break;~~ } ~~setPixel(row_pointers, 1 + (x * 2), (y * 2) + 1, PATH);~~ } } } * Analysis: ~~'''Parallelizable?'''~~ ~~The program function named toPng() has a runtime of O(n^~~in phase 2); it ~~takes up an average of 42% (min: 35%; max: 50%) of the program's execution time.~~ ~~for (int i = 0; i < height; ++i) {~~ ~~row_pointers[i] = new png_byte[width * 3];~~ ~~for (int j = 0; j < width * 3; j += 3) {~~ ~~row_pointers[i][j] = WALL;~~ ~~row_pointers[i][j + 1] = WALL;~~ ~~row_pointers[i][j + 2] = WALL;~~ } } ~~'''2. Parallelize'''~~ ~~'''3. Optimize'''~~ * New Kernel ~~__global__ void k_drawWalls(png_byte* rows, const short* cells, const int width, const int height, const int len, const int size) {~~ ~~int i = blockIdx.x * blockDim.x + threadIdx.x;~~ ~~if (i < size) {~~ ~~rows[i] = WALL;~~ ~~__syncthreads();~~ } } ~~__global__ void k_drawPaths(png_byte* rows, const short* cells, const int width, const int height, const int len) {~~ ~~int i = blockIdx.x * blockDim.x + threadIdx.x;~~ ~~if (i < width * height) {~~ ~~int x = i % width;~~ ~~int y = i / width;~~ ~~int cas = (cells[i] & 0xC0) >> 6;~~ ~~int oddY = (2 * y + 1) * len;~~ ~~int evenY = (2 * y + 2) * len;~~ ~~int oddX = (2 * x + 1) * 3;~~ ~~int evenX = (2 * x + 2) * 3;~~ ~~if (cas == 2) {~~ ~~rows[oddY + evenX] = rows[oddY + evenX + 1] = rows[oddY + evenX + 2] = PATH;~~ } ~~else if (cas == 1) {~~ ~~rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;~~ } ~~else if (cas == 0) {~~ ~~rows[oddY + evenX] = rows[oddY + evenX +~~ is no longer processing more than 1~~] = rows[oddY + evenX + 2] = PATH;~~ ~~rows[evenY + oddX] = rows[evenY + oddX + 1] = rows[evenY + oddX + 2] = PATH;~~ } ~~rows[oddY + oddX] = rows[oddY + oddX + 1] = rows[oddY + oddX + 2] = PATH;~~ } } ~~Because there were too many else if condition~~ pixel in ~~the old Kernel~~each cell for every thread, ~~we rewrite~~ which may be the ~~Kernel~~ cause to ~~avoid divergent~~the longer processing time.

~~[[File:SPODiagram.PNG]]~~=== Attempt 2 ===

*Sum up: **Still too many if statement We decided to use shared memory in the ~~Kernel~~**Almost GPU to improve the speed of the program. However, the maze image does not show correctly: the ~~same section~~ paths were showing as randomly coloured pixels; the cause is due to the ~~serial~~threads setting only a part (1/3, 2/3) of a pixel's values to hexadecimal F.

Yanhao Lei

240

edits

Changes

GPU610 Team Tsubame

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools