Changes

TudyBert

302 bytes added, 12:01, 19 April 2013

=== Assignment 2 ===

Here's the code for the newly parallelized method:

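 // One thread per source row: idx is the row this thread enlarges.
 int idx = blockIdx.x * blockDim.x + threadIdx.x;
 int enlargeRow, enlargeCol;
 // Per-thread local. Declared __shared__, every thread in the block
 // would write to the same variable and race on its value.
 int pixel;
 for (int j = 0; j < nj; j++) {
     pixel = work[idx * nj + j];
     enlargeRow = idx * factor;   // first output row of this pixel's block
     enlargeCol = j * factor;     // first output column of this pixel's block
     for (int c = enlargeRow; c < (enlargeRow + factor); c++) {
         for (int d = enlargeCol; d < (enlargeCol + factor); d++) {
             // Row stride is (total threads * factor); this matches a tightly
             // packed output only when the launch has exactly nj threads in total.
             result[d + c * blockDim.x * gridDim.x * factor] = pixel;
         }
     }
 }

While I did see a decrease in the time taken to run 50 loops, the decrease wasn't as significant as I had hoped. This kernel isn't optimized yet, so I'm looking forward to more impressive results as I update the code.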
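One way to check the kernel's output while optimizing is to compare it against a serial version on the host. The following is a minimal sketch in plain host-side C++, assuming a row-major source image of `ni` rows by `nj` columns and a tightly packed result that is `nj * factor` wide. The function name `enlargeReference` and the parameter `ni` are hypothetical and do not come from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference for the enlarge step (hypothetical helper, not from the original code).
// work:    source image, row-major, ni rows by nj columns
// returns: enlarged image, row-major, (ni * factor) rows by (nj * factor) columns
std::vector<int> enlargeReference(const std::vector<int>& work, int ni, int nj, int factor) {
    assert(ni > 0 && nj > 0 && factor > 0);
    assert(work.size() == static_cast<std::size_t>(ni) * nj);

    const int outWidth = nj * factor;  // assumption: output rows are tightly packed
    std::vector<int> result(static_cast<std::size_t>(ni) * factor * outWidth);

    for (int i = 0; i < ni; i++) {          // i plays the role of idx in the kernel
        for (int j = 0; j < nj; j++) {
            const int pixel = work[i * nj + j];
            const int enlargeRow = i * factor;
            const int enlargeCol = j * factor;
            // Copy the source pixel into its factor-by-factor block of the output.
            for (int c = enlargeRow; c < enlargeRow + factor; c++) {
                for (int d = enlargeCol; d < enlargeCol + factor; d++) {
                    result[d + c * outWidth] = pixel;
                }
            }
        }
    }
    return result;
}
```

Copying `result` back from the device and comparing it element by element with this output is a cheap way to confirm that a change to the kernel has not altered the image.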

=== Assignment 3 ===

Rwstanica
