Changes

Sirius

1,339 bytes added, 10:27, 9 April 2018

→ Assignment 2

We decided to continue with Box Blur, as opposed to the Vehicle Detection program, because it was the only problem for which we could write our own kernel and gain a significant speedup in execution. The only way to optimize the Vehicle Detection program was to enable CUDA support in the dlib library, which is outside the scope of this assignment.

The kernel was designed to run with 512 threads per block so that the program can also run on CUDA-capable hardware of lower compute capability: devices of compute capability 1.x allow at most 512 threads per block, whereas newer devices allow 1,024.


The grid was sized to accommodate a 4K image with 3 colour channels.


Each thread in the grid processes a single pixel of the image, covering all three of its colour channels.


Implementing this kernel significantly reduced the processing time of the blur effect compared to its serial counterpart, as illustrated in the graph below.


=== Kernel Code ===

<syntaxhighlight lang="cpp">
output_image[offset * 3 + 2] = static_cast<unsigned char>(output_blue / hits);

}

</syntaxhighlight>

=== Launching the Kernel ===

<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>
#include <opencv2/core.hpp>

using namespace cv;

//The blur kernel is defined in the Kernel Code section above.
void filter(const Mat& input, Mat& output, int width, int height, int neighbour)
{
    //Total number of bytes of a 3-channel, 8-bit image (input and output must be continuous)
    const size_t imageBytes = static_cast<size_t>(width) * height * 3 * sizeof(unsigned char);

    unsigned char *d_input, *d_output;

    //Allocate device memory
    cudaMalloc((void**)&d_input, imageBytes);
    cudaMalloc((void**)&d_output, imageBytes);

    //Copy data from OpenCV input image to device memory
    cudaMemcpy(d_input, input.ptr(), imageBytes, cudaMemcpyHostToDevice);

    //512 threads per block, for compatibility with lower compute capability devices
    dim3 blockDims(512, 1, 1);

    //One thread per pixel; round up so the last partial block is still launched
    const unsigned int pixels = static_cast<unsigned int>(width * height);
    dim3 gridDims((pixels + blockDims.x - 1) / blockDims.x, 1, 1);

    //Launch the blur kernel
    blur<<<gridDims, blockDims>>>(d_input, d_output, input.cols, input.rows, neighbour);

    //Check for launch errors, then wait for the kernel and check for execution errors
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "blur kernel failed: %s\n", cudaGetErrorString(err));

    //Copy back data from destination device memory to OpenCV output image
    cudaMemcpy(output.ptr(), d_output, imageBytes, cudaMemcpyDeviceToHost);

    //Free the device memory
    cudaFree(d_input);
    cudaFree(d_output);
}

</syntaxhighlight>

Racali (81 edits)
