Changes

Jump to: navigation, search

TudyBert

2,398 bytes added, 14:29, 19 April 2013
Assignment 3
}
While I did see a decrease in the time taken to run 50 loops, the decrease wasn't as significant as I had hoped. Obviously this kernel isn't optimized so I'm looking forward to some more impressive results as I update the code.
 
=== Assignment 3 ===
After making sure memory access is coalesced and replacing the second counter loop with threads from a 2 dimensional block of 2 dimensional threads, I've achieved significant speed ups in the program. All it took was launching the kernel with an optimized 2D array of blocks each containing a 2D array of threads. For assignment 2 I had a grid with 1 thread for each column in the image. That meant each thread was running 3 nested for loops to do the necessary calculations for enlarging. Figuring out the math for calculating the correct index in the arrays proved to be tricky. Although I knew exactly what to do in concept, the two extra nested for loops threw me off. For a long time the image was being enlarged correctly but the physical dimensions of the image weren't increasing. Once I had that figured out the image was enlarging but not to the new dimensions. After some tracing and trial and error I managed to find the right formula to calculate the indices.
 
 
Here's the final, optimized enlarge method:
 
<span style='color:#7f0055; font-weight:bold; '>int</span> jdx = blockIdx.x * blockDim.x + threadIdx.x;
 
<span style='color:#7f0055; font-weight:bold; '>int</span> idx = blockIdx.y * blockDim.y + threadIdx.y;
 
<span style='color:#7f0055; font-weight:bold; '>int</span> k = idx + jdx * blockDim.x * gridDim.x;
 
<span style='color:#7f0055; font-weight:bold; '>int</span> enlargeRow, enlargeCol;
 
__shared__ <span style='color:#7f0055; font-weight:bold; '>int</span> pixel;
 
pixel = work[k];
 
enlargeRow = idx * factor;
 
enlargeCol = jdx * factor;
 
__syncthreads();
 
<span style='color:#7f0055; font-weight:bold; '>for</span>(<span style='color:#7f0055; font-weight:bold; '>int</span> c = enlargeRow; c &lt; (enlargeRow + factor); c++)
 
{
 
<span style='color:#7f0055; font-weight:bold; '>for</span>(<span style='color:#7f0055; font-weight:bold; '>int</span> d = enlargeCol; d &lt; (enlargeCol + factor); d++)
 
{
 
result[c + d * blockDim.x * gridDim.x * factor] = pixel;
 
__syncthreads();
 
}
 
}
 
 
I enjoyed parallelizing this program and really wish I could have figured out the CERN project. To make myself feel better I also parallelized the rotate image method.
While I did see a decrease in the time taken to run 50 loops, the decrease wasn't as significant as I had hoped. Obviously this kernel isn't optimized so I'm looking forward to some more impressive results as I update the code.
=== Assignment 3 ===I was going to paste the code snippet here but I'm getting frustrated with the formatting. Why is it so difficult to nicely format code on a Wiki? [http://pastebin.com/ZZV9KRJN Here] it is.
1
edit

Navigation menu