Here's the final, optimized enlarge method:
<pre>
// Global coordinates of this thread, i.e. of the source pixel it handles.
int jdx = blockIdx.x * blockDim.x + threadIdx.x;
int idx = blockIdx.y * blockDim.y + threadIdx.y;
int k = idx + jdx * blockDim.x * gridDim.x;
int enlargeRow, enlargeCol;

// Each thread reads its own source pixel. This is a per-thread local rather
// than a __shared__ variable: a single shared int would be overwritten by
// every thread in the block, so all of them would end up writing one value.
int pixel = work[k];

// Top-left corner of the factor x factor square this pixel expands into.
enlargeRow = idx * factor;
enlargeCol = jdx * factor;

// No __syncthreads() is needed: threads exchange no data and each one
// writes to its own disjoint square of the result.
for (int c = enlargeRow; c < (enlargeRow + factor); c++) {
    for (int d = enlargeCol; d < (enlargeCol + factor); d++) {
        result[c + d * blockDim.x * gridDim.x * factor] = pixel;
    }
}
</pre>

I was going to paste the code snippet here, but I'm getting frustrated with the formatting. Why is it so difficult to nicely format code on a Wiki? [http://pastebin.com/ZZV9KRJN Here] it is.

I enjoyed parallelizing this program and really wish I could have figured out the CERN project. To make myself feel better, I also parallelized the rotate image method. Here's the final optimized code for rotating an image around its centre:

<pre>
__global__ void cudaRotateImage(int *result, const int *work, int ni, int nj, float rads) {
    int r0, c0;  // centre of rotation
    int r1, c1;  // rotated (destination) coordinates

    // Global coordinates of this thread, i.e. of the source pixel it handles.
    int jdx = blockIdx.x * blockDim.x + threadIdx.x;
    int idx = blockIdx.y * blockDim.y + threadIdx.y;
    int k = idx + jdx * blockDim.x * gridDim.x;

    r0 = ni / 2;
    c0 = nj / 2;

    // Rotate (idx, jdx) about (r0, c0) by rads radians, truncating to int.
    r1 = (int) (r0 + ((idx - r0) * cos(rads)) - ((jdx - c0) * sin(rads)));
    c1 = (int) (c0 + ((idx - r0) * sin(rads)) + ((jdx - c0) * cos(rads)));

    // Copy the pixel only if its rotated position lies inside the image.
    if (!(r1 >= ni || r1 < 0 || c1 >= nj || c1 < 0)) {
        result[c1 * nj + r1] = work[k];
    }
}
</pre>
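
For reference, the index arithmetic in <code>cudaRotateImage</code> appears to be the usual 2D rotation about the image centre. Written out, with <math>\theta</math> standing for <code>rads</code>, <math>(i, j)</math> for <code>(idx, jdx)</code> and <math>(r_0, c_0)</math> for <code>(ni / 2, nj / 2)</code> (these symbols are shorthand introduced here, not names from the code):

<math>
\begin{aligned}
r_1 &= r_0 + (i - r_0)\cos\theta - (j - c_0)\sin\theta \\
c_1 &= c_0 + (i - r_0)\sin\theta + (j - c_0)\cos\theta
\end{aligned}
</math>

Both results are truncated to integers before use. As a sanity check, <math>\theta = 0</math> gives <math>r_1 = i</math> and <math>c_1 = j</math>, so every pixel stays where it is. Note that this maps each source pixel forward to a destination, so for other angles the truncation can send two source pixels to the same destination and leave some destination pixels unwritten.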