=== Parallelized ===
<pre>
for (k = 2; k <= N; k = 2 * k) // Cannot be parallel!
{
{
}
</pre>
==== Kernel ====
We can take the code executed in the innermost loop and put it into a CUDA kernel. The kernel is launched with 'n' threads, where 'n' is the total number of elements to be sorted. We pass the data allocated in device memory as well as the 'j' and 'k' indices, which indicate the current position in the sorting network.
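As a rough illustration of this description, the following is a minimal sketch, not the code from this page. It assumes a bitonic sorting network over integers with a power-of-two element count. The names <code>bitonicStep</code>, <code>data</code>, <code>partner</code> and the launch configuration are hypothetical choices made for the example. The host keeps the two outer loops sequential and launches one kernel per ('j', 'k') pair, with one thread per element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one compare-and-swap step of a bitonic sorting network.
// Each thread handles the element at index i; j and k locate the step in the network.
__global__ void bitonicStep(int *data, int n, int j, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                    // guard threads past the end of the array

    int partner = i ^ j;                   // index of the element compared with i
    if (partner > i) {                     // only the lower index of a pair does the swap
        bool ascending = ((i & k) == 0);   // sort direction of this subsequence
        int a = data[i];
        int b = data[partner];
        if ((ascending && a > b) || (!ascending && a < b)) {
            data[i] = b;
            data[partner] = a;
        }
    }
}

int main()
{
    const int n = 16;                      // assumption: n is a power of two
    int h[n] = {9, 3, 14, 0, 7, 12, 1, 15, 5, 10, 2, 13, 8, 4, 11, 6};

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;               // illustrative block size
    const int blocks = (n + threads - 1) / threads;

    // The two outer loops stay on the host because each step depends on the previous one.
    for (int k = 2; k <= n; k *= 2)
        for (int j = k >> 1; j > 0; j >>= 1)
            bitonicStep<<<blocks, threads>>>(d, n, j, k);

    // cudaMemcpy on the default stream waits for the queued kernels to finish.
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i) printf("%d ", h[i]);
    printf("\n");
    return 0;
}
```

Kernels launched on the same stream run in order, so no extra synchronization is needed between steps in this sketch. Error checking on the CUDA calls is omitted for brevity and would be needed in real code.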
'''Source Code'''