Changes


TudyBert

2,398 bytes added, 14:29, 19 April 2013


}

While I did see a decrease in the time taken to run 50 loops, it wasn't as significant as I had hoped. This kernel isn't optimized yet, so I'm looking forward to more impressive results as I update the code.

=== Assignment 3 ===

After making sure memory access is coalesced and replacing the second counter loop with threads from a two-dimensional grid of two-dimensional thread blocks, I've achieved significant speedups in the program. All it took was launching the kernel with a 2D array of blocks, each containing a 2D array of threads. For assignment 2 I had a grid with one thread for each column in the image, which meant each thread was running three nested for loops to do the necessary calculations for enlarging.

Figuring out the math for calculating the correct index in the arrays proved to be tricky. Although I knew exactly what to do in concept, the two extra nested for loops threw me off. For a long time the image was being enlarged correctly but the physical dimensions of the image weren't increasing. Once I had that figured out, the image was enlarging but not to the new dimensions. After some tracing and trial and error I managed to find the right formula to calculate the indices.
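As a rough illustration of the launch arithmetic described above, here is a host-side C++ sketch of how a 2D grid can be sized so there is at least one thread per pixel. `Dim2`, `blocksFor` and `gridFor` are hypothetical names used only for this example (`Dim2` stands in for CUDA's `dim3` so the numbers can be checked without a GPU); the block size actually used in the assignment isn't stated here.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the arithmetic can be checked on the host.
struct Dim2 { unsigned x; unsigned y; };

// Number of blocks needed to cover `pixels` using `threadsPerBlock` threads
// per block, rounding up so no pixel is left without a thread.
constexpr unsigned blocksFor(unsigned pixels, unsigned threadsPerBlock) {
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;
}

// Grid dimensions that give at least one thread per pixel of a
// width x height image for the chosen 2D block shape.
constexpr Dim2 gridFor(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ blocksFor(width, block.x), blocksFor(height, block.y) };
}
```

For example, `gridFor(640, 480, Dim2{16, 16})` gives a 40 x 30 grid. When the image dimensions are not multiples of the block size, the rounding up creates spare threads, and a kernel launched this way would need a bounds check before touching memory.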

Here's the final, optimized enlarge method:

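 // One thread per source pixel: x covers columns (jdx), y covers rows (idx).
 int jdx = blockIdx.x * blockDim.x + threadIdx.x;
 int idx = blockIdx.y * blockDim.y + threadIdx.y;
 int k = idx + jdx * blockDim.x * gridDim.x;
 
 // Each thread keeps its own copy of its source pixel. A __shared__ variable
 // here would be overwritten by every thread in the block.
 int pixel = work[k];
 
 // Top-left corner of the factor x factor square this pixel expands into.
 int enlargeRow = idx * factor;
 int enlargeCol = jdx * factor;
 
 for (int c = enlargeRow; c < enlargeRow + factor; c++)
 {
     for (int d = enlargeCol; d < enlargeCol + factor; d++)
     {
         // No __syncthreads() is needed: threads write disjoint squares.
         result[c + d * blockDim.x * gridDim.x * factor] = pixel;
     }
 }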
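One way to check index math like this is to run the same arithmetic on the CPU and compare the output with what the kernel wrote. The sketch below is only that, a sketch: `enlargeReference` is a hypothetical helper, and it assumes a square n x n image launched with exactly n threads in x, so that `blockDim.x * gridDim.x` in the kernel equals `n` here.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the enlarge step, mirroring the kernel's index arithmetic.
// Assumption: the image is n x n and the launch supplies exactly n threads in x,
// so blockDim.x * gridDim.x in the kernel corresponds to n in this function.
std::vector<int> enlargeReference(const std::vector<int>& work, int n, int factor)
{
    const int outStride = n * factor;  // row stride of the enlarged image
    std::vector<int> result(static_cast<std::size_t>(outStride) * outStride, 0);

    for (int jdx = 0; jdx < n; ++jdx) {
        for (int idx = 0; idx < n; ++idx) {
            const int pixel = work[idx + jdx * n];
            // Copy the source pixel into its factor x factor square.
            for (int c = idx * factor; c < (idx + 1) * factor; ++c) {
                for (int d = jdx * factor; d < (jdx + 1) * factor; ++d) {
                    result[c + d * outStride] = pixel;
                }
            }
        }
    }
    return result;
}
```

After copying the device result back to the host, comparing it element by element against `enlargeReference(work, n, factor)` shows quickly whether a mismatch comes from the source index or the destination index.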

I enjoyed parallelizing this program and really wish I could have figured out the CERN project. To make myself feel better I also parallelized the rotate image method.


I was going to paste the code snippet here, but I'm getting frustrated with the formatting. Why is it so difficult to nicely format code on a Wiki? [http://pastebin.com/ZZV9KRJN Here] it is.

Rwstanica

