After programming these kernels, we noticed an improvement in performance.
==== Conclusion ====
By comparing the run-times of the serial KmeansPlusPlus and the parallelized version, we can see that the performance of the program has improved slightly.
The performance improvement is not significant for smaller numbers of clusters and iterations, but you can see that the performance has improved for the higher test cases. This program can be further improved by off-loading some more operations from the CPU to the GPU, but this will require more time and research.

[[File:GraphAssignment2.png|550px]]
=== Assignment 3 ===