=== Comparison ===
The table and graph below compare the parallel and serial versions. Three times are reported, all in seconds: CPU execution time, GPU execution time including all memory operations, and kernel execution time alone. Note that n is the exponent: bitonic sort requires the input array size to be 2^n. The CPU is faster when the size is below 2^18; at 2^16 the CPU runs about 72% faster than the GPU. Beyond that point, the CUDA code starts to outperform the serial code. The gap in execution time becomes dramatic once n (the exponent) exceeds 26.
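For readers who want to reproduce the CPU column, the following is a minimal sketch of how a serial bitonic sort could be timed for a given exponent n. It is not the code used to produce the measurements on this page: the function names (`is_power_of_two`, `bitonic_sort_serial`, `time_serial_sort`), the fixed random seed, and the value range are all assumptions made for illustration. The GPU and kernel times would need separate timing around the memory copies and the kernel launch, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// True when size is an exact power of two (2^n), which bitonic sort requires.
bool is_power_of_two(std::size_t size) {
    return size != 0 && (size & (size - 1)) == 0;
}

// Serial, in-place bitonic sort in ascending order.
// Precondition: a.size() is 2^n for some n >= 0.
void bitonic_sort_serial(std::vector<int>& a) {
    const std::size_t size = a.size();
    assert(is_power_of_two(size));
    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= size; k <<= 1) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            for (std::size_t i = 0; i < size; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {  // handle each pair exactly once
                    const bool ascending = (i & k) == 0;
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}

// Hypothetical helper: sorts 2^n pseudo-random ints and returns the elapsed
// wall-clock time in seconds. Only the sort itself is timed, not data setup.
double time_serial_sort(unsigned n, std::vector<int>* sorted_out = nullptr) {
    std::vector<int> data(std::size_t{1} << n);
    std::mt19937 rng(42);  // fixed seed (an assumption) so runs are repeatable
    for (int& x : data) {
        x = static_cast<int>(rng() % 1000000);
    }
    const auto start = std::chrono::steady_clock::now();
    bitonic_sort_serial(data);
    const auto stop = std::chrono::steady_clock::now();
    if (sorted_out != nullptr) {
        *sorted_out = std::move(data);
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_serial_sort(16)` would time one serial run at the 2^16 size discussed above. A single run is noisy, so averaging several runs per exponent is probably a better basis for comparison.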
'''Data Table'''