Changes

TudyBert

819 bytes added, 13:13, 19 April 2013
=== Assignment 3 ===
After making sure memory access is coalesced and replacing the second counter loop with threads from a 2D grid of 2D thread blocks, I've achieved a significant speed-up in the program. All it took was launching the kernel with a 2D array of blocks, each containing a 2D array of threads. For assignment 2 I had a grid with 1 thread for each column in the image, which meant each thread was running 3 nested for loops to do the necessary calculations for enlarging.

Figuring out the math for calculating the correct index in the arrays proved to be tricky. Although I knew exactly what to do in concept, the two extra nested for loops threw me off. For a long time the image was being enlarged correctly but the physical dimensions of the image weren't increasing. Once I had that figured out, the image was enlarging but not to the new dimensions. After some tracing and trial and error I managed to find the right formula to calculate the indices. Here's the final, optimized enlarge method:
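The method listing itself does not appear in this revision text. To illustrate the kind of index math described above, here is a minimal sketch that is not the original enlarge method. It assumes a single-channel, row-major image, an integer scale factor, and nearest-neighbour copying. The names `enlargeKernel`, `enlargeOnGpu` and `check`, and the 16x16 block size, are hypothetical choices made for this example. Each thread handles one output pixel, and `threadIdx.x` maps to the column so that neighbouring threads write neighbouring addresses, which is what keeps the writes coalesced.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical kernel (assumption: 1 byte per pixel, row-major storage).
// One thread per OUTPUT pixel; no loops inside the kernel.
__global__ void enlargeKernel(const unsigned char* src, unsigned char* dst,
                              int srcW, int srcH, int scale) {
    int dstW = srcW * scale;
    int dstH = srcH * scale;
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    if (x >= dstW || y >= dstH) return;             // guard partial edge blocks
    // Integer division maps each output pixel back to its source pixel.
    dst[y * dstW + x] = src[(y / scale) * srcW + (x / scale)];
}

// Hypothetical host wrapper: copies the image to the GPU, launches a
// 2D grid of 2D blocks, and copies the enlarged image back.
std::vector<unsigned char> enlargeOnGpu(const std::vector<unsigned char>& src,
                                        int srcW, int srcH, int scale) {
    if (srcW <= 0 || srcH <= 0 || scale <= 0 ||
        src.size() != static_cast<size_t>(srcW) * srcH)
        throw std::invalid_argument("bad image dimensions or scale");

    int dstW = srcW * scale;
    int dstH = srcH * scale;
    std::vector<unsigned char> dst(static_cast<size_t>(dstW) * dstH);

    unsigned char* dSrc = nullptr;
    unsigned char* dDst = nullptr;
    check(cudaMalloc(&dSrc, src.size()));
    check(cudaMalloc(&dDst, dst.size()));
    check(cudaMemcpy(dSrc, src.data(), src.size(), cudaMemcpyHostToDevice));

    dim3 block(16, 16);  // assumed block shape, not tuned
    dim3 grid((dstW + block.x - 1) / block.x,   // round up so the grid
              (dstH + block.y - 1) / block.y);  // covers every output pixel
    enlargeKernel<<<grid, block>>>(dSrc, dDst, srcW, srcH, scale);
    check(cudaGetLastError());

    // cudaMemcpy waits for the kernel to finish before copying.
    check(cudaMemcpy(dst.data(), dDst, dst.size(), cudaMemcpyDeviceToHost));
    cudaFree(dSrc);
    cudaFree(dDst);
    return dst;
}
```

With these assumptions, calling `enlargeOnGpu({1, 2, 3, 4}, 2, 2, 2)` on a 2x2 image returns a 4x4 image whose rows are `1 1 2 2`, `1 1 2 2`, `3 3 4 4`, `3 3 4 4`. Note that the output buffer is sized from the new dimensions and the row stride on the write side is `dstW`, not `srcW`; mixing those two up is one way to get an image that is scaled but does not fill the new dimensions.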