→Assignment 3
[[File:DPS915 Team7 Optimized Trig Functions.PNG]]
The initial block size was 32 x 32 = 1024 threads, which is the ''maximum number of threads per block'' on devices with compute capability 3.0. To look for a better block size, I tried the following block sizes and calculated the thread occupancy of each using the ''CUDA Occupancy Calculator'' spreadsheet:
*8 x 8: 50% thread occupancy
*16 x 16: 100% thread occupancy
*24 x 24: 84% thread occupancy
*32 x 32: 100% thread occupancy
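
The percentages above can be reproduced by hand from the per-multiprocessor limits of compute capability 3.0 (64 resident warps, i.e. 2048 threads, and 16 resident blocks). The following is only a minimal sketch of that arithmetic, not the spreadsheet's full model: it assumes registers and shared memory are not the limiting resource, and the names <code>warpsPerBlock</code>, <code>residentBlocks</code>, <code>occupancyForThreads</code> and <code>occupancyPercent</code> are hypothetical helpers introduced for illustration.

```cpp
// Theoretical occupancy for compute capability 3.0.
// ASSUMPTION: registers and shared memory do not limit the block count;
// only the warp and block limits per multiprocessor are modelled.
constexpr int kWarpSize = 32;        // threads per warp
constexpr int kMaxWarpsPerSM = 64;   // 2048 resident threads per multiprocessor
constexpr int kMaxBlocksPerSM = 16;  // resident blocks per multiprocessor

// A partially filled warp still occupies a whole warp slot, so round up.
constexpr int warpsPerBlock(int threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Blocks that fit on one multiprocessor: the smaller of the block limit
// and the number of whole blocks that fit within the warp limit.
constexpr int residentBlocks(int threadsPerBlock) {
    return (kMaxWarpsPerSM / warpsPerBlock(threadsPerBlock) < kMaxBlocksPerSM)
               ? kMaxWarpsPerSM / warpsPerBlock(threadsPerBlock)
               : kMaxBlocksPerSM;
}

// Occupancy = active warps / maximum resident warps, as a percentage.
constexpr double occupancyForThreads(int threadsPerBlock) {
    return 100.0 * residentBlocks(threadsPerBlock) *
           warpsPerBlock(threadsPerBlock) / kMaxWarpsPerSM;
}

// Convenience wrapper for a two-dimensional block.
constexpr double occupancyPercent(int blockDimX, int blockDimY) {
    return occupancyForThreads(blockDimX * blockDimY);
}
```

For example, a 24 x 24 block has 576 threads, which is 18 warps; only 3 such blocks fit within the 64-warp limit, giving 54 of 64 warps, so <code>occupancyPercent(24, 24)</code> evaluates to 84.375. An 8 x 8 block is capped by the 16-block limit instead, which leaves half of the warp slots unused. For a real kernel, the CUDA runtime function <code>cudaOccupancyMaxActiveBlocksPerMultiprocessor</code> also takes register and shared memory usage into account.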