=== Assignment 3 ===
For Assignment 3, we looked for ways to improve performance and found two.
First, the while loop contained six memory copy calls; we reduced these to one by swapping device address pointers instead of copying data between buffers.
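
The pointer-switching idea can be sketched as follows. This is a minimal illustration, not our actual assignment code: the kernel <code>step</code>, the helper <code>iterate</code>, and the buffer names <code>d_cur</code> and <code>d_next</code> are hypothetical, and the placement of the one remaining copy after the loop is an assumption. Error checking is omitted for brevity and <code>n</code> is assumed to be greater than zero.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical per-iteration kernel: reads d_in and writes d_out.
__global__ void step(const float* d_in, float* d_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_out[i] = d_in[i] + 1.0f;
}

// Runs `iters` iterations on the device and returns the final values.
// Assumes h is non-empty.
std::vector<float> iterate(std::vector<float> h, int iters) {
    int n = static_cast<int>(h.size());
    size_t bytes = n * sizeof(float);

    float* d_cur = nullptr;
    float* d_next = nullptr;
    cudaMalloc(&d_cur, bytes);
    cudaMalloc(&d_next, bytes);
    cudaMemcpy(d_cur, h.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    int it = 0;
    while (it < iters) {
        step<<<blocks, threads>>>(d_cur, d_next, n);
        // The output buffer becomes the next input: swap the two
        // device pointers instead of copying the data.
        std::swap(d_cur, d_next);
        ++it;
    }

    // Single copy back to the host once the loop has finished.
    cudaMemcpy(h.data(), d_cur, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_cur);
    cudaFree(d_next);
    return h;
}
```

Because the swap only exchanges two host-side pointer values, its cost does not depend on the size of the buffers.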
Second, we found that when the number of samples n is less than 1024, the kernel can use shared memory.
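
A minimal sketch of the shared-memory case is shown below. It assumes one thread per sample in a single block (a block is limited to 1024 threads on current CUDA hardware), so every sample can be staged in shared memory once and then read repeatedly by all threads. The kernel <code>sum_abs_diff</code>, the helper <code>run_shared</code>, and the computation itself are hypothetical stand-ins for our kernel; error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int MAX_SAMPLES = 1024;

// Hypothetical kernel: for each sample i, sums |x[i] - x[j]| over all j.
// Launched with a single block of n threads, where n < MAX_SAMPLES.
__global__ void sum_abs_diff(const float* d_x, float* d_out, int n) {
    __shared__ float s_x[MAX_SAMPLES];

    int i = threadIdx.x;
    if (i < n) s_x[i] = d_x[i];   // one global-memory read per sample
    __syncthreads();              // wait until all samples are loaded

    if (i < n) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) {
            acc += fabsf(s_x[i] - s_x[j]);  // repeated reads hit shared memory
        }
        d_out[i] = acc;
    }
}

// Host wrapper. Returns an empty vector when n is outside the range the
// shared-memory kernel handles; real code would fall back to a
// global-memory kernel in that case.
std::vector<float> run_shared(const std::vector<float>& h_x) {
    int n = static_cast<int>(h_x.size());
    if (n <= 0 || n >= MAX_SAMPLES) return {};

    size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    sum_abs_diff<<<1, n>>>(d_x, d_out, n);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_out);
    return h_out;
}
```

In this sketch each sample is read from global memory once and from shared memory n times, which is where the benefit would come from when the kernel reuses samples across threads.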