TudyBert

2,199 bytes added, 14:29, 19 April 2013
=== Assignment 3 ===
After making sure memory access was coalesced and replacing the second counter loop with threads from a 2D grid of 2D thread blocks, I achieved significant speedups in the program. All it took was launching the kernel with an optimized 2D grid of blocks, each containing a 2D array of threads. For Assignment 2 I had a grid with one thread for each column in the image, which meant each thread ran three nested for loops to do the calculations needed for enlarging. Working out the math for the correct array indices proved tricky. Although I knew exactly what to do in concept, the two extra nested for loops threw me off. For a long time the image was being enlarged correctly, but its physical dimensions weren't increasing. Once I had that figured out, the image was enlarging but not to the new dimensions. After some tracing and trial and error I found the right formula for calculating the indices.

Here's the final, optimized enlarge kernel:

<pre>
// Kernel name and parameter types are assumed: work is the source image, result the enlarged one
__global__ void enlarge(const int* work, int* result, int factor)
{
    // One thread per source pixel: jdx is its column, idx its row
    int jdx = blockIdx.x * blockDim.x + threadIdx.x;
    int idx = blockIdx.y * blockDim.y + threadIdx.y;

    // Column-major source index (the stride equals the image height only when the launch is square)
    int k = idx + jdx * blockDim.x * gridDim.x;

    // Kept in a register: a single __shared__ int would be overwritten by every thread in the block
    int pixel = work[k];

    // Top-left corner of the factor x factor region this pixel fills in the result
    int enlargeRow = idx * factor;
    int enlargeCol = jdx * factor;

    for (int c = enlargeRow; c < enlargeRow + factor; c++)
    {
        for (int d = enlargeCol; d < enlargeCol + factor; d++)
        {
            result[c + d * blockDim.x * gridDim.x * factor] = pixel;
        }
    }
}
</pre>

I enjoyed parallelizing this program and really wish I could have figured out the CERN project. To make myself feel better, I also parallelized the rotate image method. I was going to paste the code snippet here, but I got frustrated with the formatting. Why is it so difficult to format code nicely on a wiki? [http://pastebin.com/ZZV9KRJN Here] it is.
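The index formula that took the most trial and error can be checked without a GPU. Below is a minimal CPU-only C++ sketch, not the code from this project: it runs the enlarge kernel body once for every (block, thread) pair that a 2D grid of 2D blocks would create, passing blockIdx, threadIdx, blockDim and gridDim in by hand. Dim2, enlargeThread, enlargeOnCpu and enlargeIsCorrect are hypothetical names, and the sketch assumes column-major int images. It uses blockDim.y * gridDim.y (the number of source rows) as the column stride, so it also handles non-square images; the kernel above uses blockDim.x * gridDim.x, which gives the same value when the launch is square.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for CUDA's dim3/uint3 built-ins (x and y only).
struct Dim2 { int x, y; };

// Enlarge kernel body for one thread, with the CUDA built-ins passed explicitly.
// Assumed layout: column-major, so pixel (row, col) is stored at row + col * height.
void enlargeThread(Dim2 blockIdx, Dim2 threadIdx, Dim2 blockDim, Dim2 gridDim,
                   const std::vector<int>& work, std::vector<int>& result, int factor)
{
    int jdx = blockIdx.x * blockDim.x + threadIdx.x;  // source column
    int idx = blockIdx.y * blockDim.y + threadIdx.y;  // source row
    int height = blockDim.y * gridDim.y;              // source rows covered by the launch
    int pixel = work[idx + jdx * height];             // private copy, like a register on the GPU

    int enlargeRow = idx * factor;
    int enlargeCol = jdx * factor;
    for (int c = enlargeRow; c < enlargeRow + factor; c++)
        for (int d = enlargeCol; d < enlargeCol + factor; d++)
            result[c + d * height * factor] = pixel;  // enlarged image has height * factor rows
}

// Runs every (block, thread) pair that a gridDim x blockDim launch would create, one after another.
std::vector<int> enlargeOnCpu(const std::vector<int>& work, Dim2 gridDim, Dim2 blockDim, int factor)
{
    std::size_t width = static_cast<std::size_t>(blockDim.x) * gridDim.x;
    std::size_t height = static_cast<std::size_t>(blockDim.y) * gridDim.y;
    std::vector<int> result(width * height * factor * factor, -1);
    for (int by = 0; by < gridDim.y; by++)
        for (int bx = 0; bx < gridDim.x; bx++)
            for (int ty = 0; ty < blockDim.y; ty++)
                for (int tx = 0; tx < blockDim.x; tx++)
                    enlargeThread({bx, by}, {tx, ty}, blockDim, gridDim, work, result, factor);
    return result;
}

// True if every enlarged pixel equals the source pixel it was copied from.
bool enlargeIsCorrect(Dim2 gridDim, Dim2 blockDim, int factor)
{
    int width = blockDim.x * gridDim.x;
    int height = blockDim.y * gridDim.y;
    std::vector<int> work(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < work.size(); i++)
        work[i] = static_cast<int>(i);  // distinct value per source pixel

    std::vector<int> big = enlargeOnCpu(work, gridDim, blockDim, factor);
    int bigHeight = height * factor;
    for (int col = 0; col < width * factor; col++)
        for (int row = 0; row < bigHeight; row++)
            if (big[row + col * bigHeight] != work[row / factor + (col / factor) * height])
                return false;
    return true;
}
```

For instance, enlargeIsCorrect({3, 2}, {2, 2}, 2) checks a 6-column by 4-row image enlarged by a factor of 2. Because the blocks and threads run sequentially here, this only validates the indexing; it says nothing about races or speed on a real GPU.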