Sirius
== Assignment 3 ==
The application stands to gain a substantial performance boost from parallel programming. Most of the computation time is spent calculating the average of every pixel, and these calculations can run concurrently, with only a single synchronization required at the end, before the image is displayed.
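To illustrate why this work parallelizes so cleanly, here is a minimal sketch in plain C++. It assumes that "average" means the mean of a pixel's red, green and blue channels. The original text does not spell this out, so treat the <code>Rgb</code> struct and the <code>average_range</code> function as hypothetical names chosen for illustration. Each output value depends only on its own input pixel, so disjoint ranges can be processed in any order, or by different workers, without affecting the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel layout, assumed for this sketch only.
struct Rgb { std::uint8_t r, g, b; };

// Writes the channel mean of pixels[begin, end) into out[begin, end).
// No iteration reads or writes data owned by another iteration, which is
// what allows the ranges to be handed to independent workers.
inline void average_range(const std::vector<Rgb>& pixels,
                          std::vector<std::uint8_t>& out,
                          std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= pixels.size() && out.size() == pixels.size());
    for (std::size_t i = begin; i < end; ++i) {
        // Integer mean of the three channels (rounds down).
        out[i] = static_cast<std::uint8_t>(
            (pixels[i].r + pixels[i].g + pixels[i].b) / 3);
    }
}
```

Processing the range <code>[0, n)</code> in one call gives the same output as processing <code>[0, n/2)</code> and <code>[n/2, n)</code> separately, in either order. The only point at which the workers must agree is after the last range has finished, which corresponds to the single synchronization mentioned above.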
=== Algorithms (Joseph Pildush) ===
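<syntaxhighlight lang="cpp">
// Query the properties of the active device
int iDevice;
cudaDeviceProp prop;
cudaGetDevice(&iDevice);
cudaGetDeviceProperties(&prop, iDevice);

// Resident threads and resident blocks per multiprocessor
int resident_threads = prop.maxThreadsPerMultiProcessor;
int resident_blocks = 8;
if (prop.major >= 3 && prop.major < 5) {
    resident_blocks = 16;
}
else if (prop.major >= 5 && prop.major <= 6) {
    resident_blocks = 32;
}

// Determine threads/block
dim3 blockDims(resident_threads / resident_blocks, 1, 1);

// Calculate grid size to cover the whole image. Round up so that a pixel
// count that is not a multiple of blockDims.x is still fully covered; the
// kernel must then skip any thread index >= pixels.
dim3 gridDims((pixels + blockDims.x - 1) / blockDims.x);
</syntaxhighlight>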
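The launch-size arithmetic can be checked on the host without a GPU. The sketch below is a hedged illustration, not part of the original code: <code>threads_per_block</code> and <code>blocks_for</code> are hypothetical helper names, and the figures in the follow-up note (2048 resident threads, 16 resident blocks, a 1000 x 1000 image) are assumed example values.

```cpp
#include <cassert>

// Hypothetical helper: block size chosen so that `resident_blocks` blocks
// together occupy all `resident_threads` threads of one multiprocessor.
inline unsigned threads_per_block(unsigned resident_threads,
                                  unsigned resident_blocks) {
    assert(resident_blocks > 0 && resident_threads >= resident_blocks);
    return resident_threads / resident_blocks;
}

// Hypothetical helper: number of blocks needed so that
// blocks * block_size >= pixels (ceiling division).
inline unsigned blocks_for(unsigned pixels, unsigned block_size) {
    assert(block_size > 0);
    return (pixels + block_size - 1) / block_size;
}
```

With the assumed values, <code>threads_per_block(2048, 16)</code> is 128. For a 1000 x 1000 image, <code>blocks_for(1000000, 128)</code> is 7813, whereas plain integer division would give 7812 blocks and leave the last 64 pixels unprocessed. When the pixel count is an exact multiple of the block size, as with 1920 x 1080 (2,073,600 pixels, 16200 blocks), both forms agree.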