=== Assignment 3 ===
Here is a comparison between the naive and optimized kernels:
[[File:Examplekernel2.jpg]]
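Evidently, the new version gives some performance boost. However, each call to atomicAdd performs a read-modify-write on its target address in global memory, and concurrent calls to the same address are serialized: a thread must wait until the previous thread's old value has been read, added to the passed value, and written back. This contention keeps the speedup smaller than might otherwise be expected.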
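To illustrate the cost of atomicAdd contention, here is a minimal sketch. It is not the kernel from this assignment: the kernels <code>sumNaive</code> and <code>sumBlockReduced</code> and the host helper <code>runSum</code> are hypothetical names chosen for the example, and the task (summing an integer array) is assumed. The first kernel issues one atomicAdd per element on a single address; the second reduces within each block in shared memory and issues only one atomicAdd per block, which is one common way to lower contention.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example: every thread updates the same global address,
// so all n atomicAdd calls are serialized on that address.
__global__ void sumNaive(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Hypothetical example: each block first reduces its elements in shared
// memory, then thread 0 issues a single atomicAdd for the whole block.
// Assumes blockDim.x is a power of two.
__global__ void sumBlockReduced(const int* in, int* out, int n) {
    extern __shared__ int partial[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, partial[0]);
}

// Host helper (hypothetical): fills an array with ones and sums it on the GPU.
int runSum(bool blockReduced, int n) {
    int *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = 1;
    *out = 0;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (blockReduced)
        sumBlockReduced<<<blocks, threads, threads * sizeof(int)>>>(in, out, n);
    else
        sumNaive<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();
    int result = *out;
    cudaFree(in);
    cudaFree(out);
    return result;
}

int main() {
    const int n = 1 << 20;
    int naive = runSum(false, n);
    int reduced = runSum(true, n);
    printf("naive: %d, block-reduced: %d\n", naive, reduced);
    return 0;
}
```

Both versions should produce the same sum. With 256 threads per block, the second kernel issues roughly 1/256 as many atomicAdd calls on the shared address. How much that helps in practice depends on the GPU and on the access pattern of the real kernel, so it should be measured rather than assumed.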