A large chunk of the processing time is wasted on copying the two arrays from one image to the other. If I have time I might look into parallelizing this as well. It would be interesting to see whether the speed of the GPU can overcome the overhead of copying to and from the device.
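As a rough sketch of what parallelizing that copy could look like (this is not the project's code; the names copyKernel, src, dst, and n are my own placeholders), a one-thread-per-element copy kernel is about as simple as it gets:

// Hypothetical sketch: each thread copies one pixel from src to dst.
__global__ void copyKernel(const int* src, int* dst, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        dst[idx] = src[idx];
}

Whether this wins would depend on where the pixels already live: if both arrays are resident on the device, a device-to-device copy avoids the PCIe transfer entirely.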
=== Assignment 2 ===
For Assignment 2 I simply put the four for loops into a kernel and replaced the outermost loop with thread indices. I made a helper method that set up memory on the device and launched the kernel with a one-dimensional grid of blocks, each containing a single thread; I launched as many blocks as there were rows in the image file. I figured this was the quickest way to get this method parallelized. Unfortunately I hit a wall with my data sizes. The CPU version of the enlarge image method fails when run for more than 50 loops. The error thrown is a Visual Studio debugging error, so I think VS isn't too happy about having the CPU hogged for so long. As a result I've had to extrapolate times for larger loops by assuming a linear increase in time taken.
 
 
 
Here's the code for the newly parallelized method:
 
// Each block contains a single thread, so idx is effectively the image row.
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int enlargeRow, enlargeCol;
__shared__ int pixel; // one thread per block, so shared memory gives no benefit here; a plain local would do
for (int j = 0; j < nj; j++)
{
    // Read the source pixel and find its top-left position in the enlarged image.
    pixel = work[idx * nj + j];
    enlargeRow = idx * factor;
    enlargeCol = j * factor;
    // Write the pixel into a factor x factor block of the result.
    // Row stride is blockDim.x * gridDim.x * factor, the enlarged width for a square image.
    for (int c = enlargeRow; c < (enlargeRow + factor); c++)
    {
        for (int d = enlargeCol; d < (enlargeCol + factor); d++)
        {
            result[d + c * blockDim.x * gridDim.x * factor] = pixel;
        }
    }
}
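The helper method described above isn't reproduced here, but a minimal sketch of what it might look like follows. The function and kernel names (enlargeOnDevice, enlargeKernel), the parameter names, and the omitted error checking are my own assumptions; the launch configuration of one block of one thread per image row matches the description above.

#include <cuda_runtime.h>

// Assumed signature for the kernel whose body is shown above.
__global__ void enlargeKernel(const int* work, int* result, int nj, int factor);

// Sketch of a host helper: allocate device memory, copy the source image over,
// launch one single-thread block per row, then copy the enlarged image back.
void enlargeOnDevice(const int* work, int* result, int ni, int nj, int factor)
{
    int *d_work, *d_result;
    size_t srcBytes = (size_t)ni * nj * sizeof(int);
    size_t dstBytes = (size_t)ni * nj * factor * factor * sizeof(int);

    cudaMalloc((void**)&d_work, srcBytes);
    cudaMalloc((void**)&d_result, dstBytes);
    cudaMemcpy(d_work, work, srcBytes, cudaMemcpyHostToDevice);

    // One block per image row, one thread per block, as described above.
    enlargeKernel<<<ni, 1>>>(d_work, d_result, nj, factor);
    cudaDeviceSynchronize();

    cudaMemcpy(result, d_result, dstBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_work);
    cudaFree(d_result);
}

Launching one thread per block leaves most of each warp idle; a natural next step would be blocks of, say, 256 threads, with the row still computed from the same idx expression the kernel already uses.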
 
=== Assignment 3 ===