Changes
ntpb
As you can see, most of the time is spent in the 3rd and 4th blocks, which is where I will begin optimization.
Since npoints is 800 in total, and those points are divided among separate CPU threads, no single launch needs more than the maximum of 1024 threads per block.
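To make the arithmetic concrete, here is a minimal host-side sketch of the launch-size calculation. The helper names `blocksNeeded` and `threadsForChunk` and the example of 4 CPU threads are my own assumptions for illustration, not taken from the project code; only the values 800 and 1024 come from the text above.

```cpp
// Limit cited in the text: at most 1024 threads per block.
constexpr int kMaxThreadsPerBlock = 1024;

// Hypothetical helper: number of blocks needed to cover n points
// when each block holds ntpb threads (ceiling division).
constexpr int blocksNeeded(int n, int ntpb) {
    return (n + ntpb - 1) / ntpb;
}

// Hypothetical helper: threads per block for a chunk of n points,
// capped at the per-block limit.
constexpr int threadsForChunk(int n) {
    return n < kMaxThreadsPerBlock ? n : kMaxThreadsPerBlock;
}

// All 800 points fit in a single block, even without splitting.
static_assert(blocksNeeded(800, kMaxThreadsPerBlock) == 1,
              "800 points fit in one 1024-thread block");

// Assumed example: 4 CPU threads, each handling 800 / 4 = 200 points.
static_assert(threadsForChunk(800 / 4) == 200,
              "each chunk stays well under the limit");
```

Under these assumptions, every chunk maps to one block, so the block size can simply be set to the chunk size and the grid size to 1.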