Sudo

666 bytes added, 18:38, 10 December 2015
=== Assignment 3 ===
 
 
I improved on my previous application by cleaning up some of the logic and optimizing it. I noticed that even on cards with a compute capability of 3.0, the launch did not accept a grid whose x dimension was larger than 65535, so I had to rewrite my code to stay within that limit. Within the kernel itself, there was an opportunity to pre-fetch values into registers, which reduces latency when operating on those values. Shared memory was not needed because the threads do not share any data.
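As an illustration of the two changes described above, here is a minimal sketch and not the assignment's actual kernel. It assumes a simple element-wise operation; the kernel name `scaleAdd`, the arithmetic, the array size and the block size are all hypothetical. The block count is folded into a 2D grid so that the x dimension never exceeds 65535, and each operand is read once into a local variable (which the compiler normally keeps in a register) before it is used.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: out[i] = a[i] * b[i] + a[i].
__global__ void scaleAdd(const float* a, const float* b, float* out, int n) {
    // Flatten the 2D grid of 1D blocks back into one linear element index.
    int block = blockIdx.y * gridDim.x + blockIdx.x;
    int i = block * blockDim.x + threadIdx.x;
    if (i < n) {
        // Pre-fetch: load each operand from global memory once, then reuse it.
        float x = a[i];
        float y = b[i];
        out[i] = x * y + x;
    }
}

int main() {
    const int n = 1 << 24;       // hypothetical problem size
    const int threads = 128;     // hypothetical block size
    const int maxGridX = 65535;  // limit observed in the text above
    const size_t bytes = n * sizeof(float);

    // Split the required block count across x and y so that x <= 65535.
    int blocks = (n + threads - 1) / threads;
    int gx = blocks < maxGridX ? blocks : maxGridX;
    int gy = (blocks + gx - 1) / gx;
    dim3 grid(gx, gy);

    float* hA = (float*)malloc(bytes);
    float* hB = (float*)malloc(bytes);
    float* hOut = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    scaleAdd<<<grid, threads>>>(dA, dB, dOut, n);
    cudaError_t err = cudaGetLastError();  // reports an invalid launch configuration
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    // Every element should equal 1 * 2 + 1 = 3.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (hOut[i] != 3.0f) ++mismatches;
    printf("grid = %d x %d, mismatches = %d\n", gx, gy, mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    free(hA); free(hB); free(hOut);
    return mismatches != 0;
}
```

The extra blocks created by rounding up the y dimension are harmless because of the `i < n` guard. A grid-stride loop would be another way to stay under the limit; the 2D grid is used here only because it keeps one element per thread.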
 
 
The following is the new kernel:
 
 
 
 
 
 
The following is a run-time comparison between my old and new kernels: