==Assignment2==
Let's see how this algorithm works to sort.
[[File:BitonicSort1.png]]
First, array is unsorted array.Second box is stage 1, and the orange boxes are sorted in ascending order and green boxes are sorted in descending order.Third box is stage 2 and it has 2 rounds. In first round, there are 2 orange and green boxes that contains 4 elements. In this case, index 0 and 2 and 1 and 3 will be compared and sorted in ascending if orange box and in descending if green box. Then in round 2, index 0 and 1 , 2 and 3 ... will be compared and sorted. The third and fourth boxes are same.
second box is stage 1, and orange boxes are sorted in ascending order and green boxes are sorted in descending order.
Third box is stage 2 and it has 2 rounds. In first round, there are 2 orange and green boxes that contains 4 elements. in this case, index 0 and 2 and 1 and 3 will be compared and sorted in ascending if orange box and in descending if green box. then in round 2, index 0 and 1 , 2 and 3 ... will be compared and sorted. Third box and Fourth box are same.
This is launch statement.
We use nested for loop. i is stage and it increases until log2(n). For example, let say n is 16, so i is 1 to 4. Another for loop represents round and round cannot be bigger than stage.
[[File:bitonicLaunch.png]]
This is launch statement. we use nested for loop. i is stage and it increases until log2(n). for example, let say n is 16, so i is 1 to 4. Another for loop represents round and round cannot be bigger than stage.
This is bitonic kernel. There are several variables.
[[File:BitonicKernel.jpg]]
This is bitonic kernel. There are several variables.
'''Gap''' expresses gap between 2 elements that will be compared. for For example, if stage is 2 and round is 1, gap will be 2. because index 0 will be compared with index 2 and index 1 will be compared with index 3. if stage is 2 and round is 2, gap will be 1 and index 0 will be compared with index 1 so on.
'''Group''' shows where each threads are involved in whether ascending or descending order. for example, thread 0 in stage 1 will be 0 because (0 / ( 1 << (1-1))) % 2 = 0 and thread 1 will be 1 because (1 / (1 << (1-1))) % 2 = 1. so in stage 1, thread will have 0 1 0 1 0 1 .... so each threads will sort in ascending and in descending in order. if stage is 2, thread 0 will be 0 and thread 1 will also be 0 because ( 1 / (1 << 1)) % 2 = 0. so each threads in stage 2 will have 0 0 1 1 0 0 1 1 ....
'''idx''' shows which elements in array will be picked by threads. for example, thread 0 in stage 1 will have (0 / 1) * 1 * 2 + 0 % 1 = 0, and thread 1 will have (1 / 1) * 1 * 2 + 1 % 1 = 2, and thread 2 will have (2 / 1) * 1 * 2 + 2 % 1 = 4 ... 6, 8, 10. because in stage 1, index 0 will be compared with 1, 2 compared with 3, 4 compared with 5 .... another example, if thread 0 in round 1 in stage 2 will have ( 0 / 2)*2*2 + 0%2 = 0, thread 1 will have ( 1/2)*2*2 + 1%2 = 1, thread 2 will have ( 2/2)*2*2 + 2%2 = 4, thread 3 will have (3/2) * 2 * 2 + 3%2 = 5. So threads will have 0, 1, 4, 5, 8, 9 .... because in round 1 in stage 2, index 0 will be compared with 2 and 1 compared with 3, 4 compared with 6 and 5 compared with 7.
[[File:BitonicGraph'''idx''' shows which elements in array will be picked by threads. For example, thread 0 in stage 1 will have (0 / 1) * 1 * 2 + 0 % 1 = 0, and thread 1 will have (1 / 1) * 1 * 2 + 1 % 1 = 2, and thread 2 will have (2 / 1) * 1 * 2 + 2 % 1 = 4 ... 6, 8, 10. because in stage 1, index 0 will be compared with 1, 2 compared with 3, 4 compared with 5 .... another example, if thread 0 in round 1 in stage 2 will have ( 0 / 2)*2*2 + 0%2 = 0, thread 1 will have ( 1/2)*2*2 + 1%2 = 1, thread 2 will have ( 2/2)*2*2 + 2%2 = 4, thread 3 will have (3/2) * 2 * 2 + 3%2 = 5. So threads will have 0, 1, 4, 5, 8, 9 .... because in round 1 in stage 2, index 0 will be compared with 2 and 1 compared with 3, 4 compared with 6 and 5 compared with 7. N is 65536, 2^16. The red box shows Bitonic using GPU, the blue box shows quick using CPU, green box shows bitonic using CPU.jpg]]
[[File:65536.png]]
N is 655361048576, 2^16. Red box shows Bitonic using GPU, blue box shows quick using CPU, green box shows bitonic using CPU20.
[[File:1048576.png]]
N is 16777216, 2^24
[[File:16777216.png]]
Below is the comparison between bitonic sort using GPU, quick sort using CPU, bitonic sort using CPU.
[[File:BitonicGraph.jpg]]