
Happy Valley

331 bytes added, 08:33, 9 April 2018
Parallelized
<pre>
for (k = 2; k <= N; k = 2 * k) // Cannot be parallel!
{
    // printf("k = %d \n", k);
    for (j = k >> 1; j > 0; j = j >> 1) // Cannot be parallel!
    {
        // printf(" - j = %d \n", j);
        for (i = 0; i < N; i++) {} // Can be parallel!
    }
}
</pre>
==== Kernel ====
 
We can take the code executed in the innermost loop and put it into a CUDA kernel. The kernel body is executed by 'n' threads, one per element, where 'n' is the total number of elements to be sorted. We pass it the data allocated in device memory as well as the 'j' and 'k' indices, which indicate the current position in the sorting network.
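As a rough illustration of the idea, the following is a minimal host-side sketch in plain C, not the kernel from this page. It assumes a standard bitonic compare-and-swap step. The names <code>bitonic_step</code> and <code>bitonic_sort</code> are hypothetical, and the innermost loop stands in for the 'n' parallel threads. In an actual CUDA kernel the index <code>i</code> would typically be computed from <code>blockIdx.x * blockDim.x + threadIdx.x</code> rather than passed in.

```c
#include <assert.h>

/* Work done by one hypothetical thread `i` for a single (k, j) step of the
   sorting network. This is an assumed, textbook-style compare-and-swap. */
static void bitonic_step(int *data, unsigned i, unsigned j, unsigned k)
{
    unsigned partner = i ^ j;            /* element paired with i at this step */
    if (partner > i) {                   /* only the lower index of a pair swaps */
        int ascending = ((i & k) == 0);  /* sort direction of this subsequence */
        if ((ascending && data[i] > data[partner]) ||
            (!ascending && data[i] < data[partner])) {
            int tmp = data[i];
            data[i] = data[partner];
            data[partner] = tmp;
        }
    }
}

/* Host-side emulation of the launch sequence; n must be a power of two. */
static void bitonic_sort(int *data, unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    for (unsigned k = 2; k <= n; k *= 2)            /* sequential */
        for (unsigned j = k >> 1; j > 0; j >>= 1)   /* sequential */
            for (unsigned i = 0; i < n; i++)        /* stands in for n threads */
                bitonic_step(data, i, j, k);
}
```

Because each pair of elements is touched only by the thread with the lower index, the iterations of the innermost loop are independent of each other within one (k, j) step, which is what makes that loop safe to run in parallel.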
'''Source Code'''