1
edit
Changes
TudyBert
,→Drive_God
Here's the final, optimized enlarge method:
<span style='color:#7f0055; font-weight:bold; '>int</span> jdx = blockIdx.x * blockDim.x + threadIdx.x;
<span style='color:#7f0055; font-weight:bold; '>int</span> idx = blockIdx.y * blockDim.y + threadIdx.y;
<span style='color:#7f0055; font-weight:bold; '>int</span> k = idx + jdx * blockDim.x * gridDim.x;
<span style='color:#7f0055; font-weight:bold; '>int</span> enlargeRow, enlargeCol;
__shared__ <span style='color:#7f0055; font-weight:bold; '>int</span> pixel;
pixel = work[k];
enlargeRow = idx * factor;
enlargeCol = jdx * factor;
__syncthreads();
<span style='color:#7f0055; font-weight:bold; '>for</span>(<span style='color:#7f0055; font-weight:bold; '>int</span> c = enlargeRow; c < (enlargeRow + factor); c++)
{
<span style='color:#7f0055; font-weight:bold; '>for</span>(<span style='color:#7f0055; font-weight:bold; '>int</span> d = enlargeCol; d < (enlargeCol + factor); d++)
{
result[c + d * blockDim.x * gridDim.x * factor] = pixel;
__syncthreads();
}
}
I enjoyed parallelizing this program and really wish I could have figured out the CERN project. To make myself feel better I also parallelized the rotate image method.
Here's the final optimized code for rotating an image around its centre:
<div>
__global__ <span style='color:#7f0055; font-weight:bold; '>void</span> cudaRotateImage(<span style='color:#7f0055; font-weight:bold; '>int</span> *result, <span style='color:#7f0055; font-weight:bold; '>const</span> <span style='color:#7f0055; font-weight:bold; '>int</span> *work, <span style='color:#7f0055; font-weight:bold; '>int</span> ni, <span style='color:#7f0055; font-weight:bold; '>int</span> nj, <span style='color:#7f0055; font-weight:bold; '>float</span> rads)
{
<span style='color:#7f0055; font-weight:bold; '>int</span> r0, c0;
<span style='color:#7f0055; font-weight:bold; '>int</span> r1, c1;
<span style='color:#7f0055; font-weight:bold; '>int</span> jdx = blockIdx.x * blockDim.x + threadIdx.x;
<span style='color:#7f0055; font-weight:bold; '>int</span> idx = blockIdx.y * blockDim.y + threadIdx.y;
<span style='color:#7f0055; font-weight:bold; '>int</span> k = idx + jdx * blockDim.x * gridDim.x;
r0 = ni / 2;
c0 = nj / 2;
r1 = (<span style='color:#7f0055; font-weight:bold; '>int</span>) (r0 + ((idx - r0) * <span style='color:#7f0055; font-weight:bold; '>cos</span>(rads)) - ((jdx - c0) * <span style='color:#7f0055; font-weight:bold; '>sin</span>(rads)));
c1 = (<span style='color:#7f0055; font-weight:bold; '>int</span>) (c0 + ((idx - r0) * <span style='color:#7f0055; font-weight:bold; '>sin</span>(rads)) + ((jdx - c0) * <span style='color:#7f0055; font-weight:bold; '>cos</span>(rads)));
<span style='color:#7f0055; font-weight:bold; '>if</span>(!(r1 >= ni || r1 < 0 || c1 >=nj || c1 < 0))
{
result[c1 * nj + r1] = work[k];
}
}
</div>