Changes

Skynet/GPU610

3 bytes added, 16:10, 3 December 2014
Optimizations Used
'''__device__ __forceinline__''' : because the program relies heavily on loops and recursion, we force the compiler to inline the trace and mix functions, as well as some methods in the Vec3 and Sphere classes, to speed them up.
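
As a rough illustration of the qualifier, here is a minimal sketch and not the project's actual code. The <code>DEVICE_INLINE</code> macro, the reduced <code>Vec3</code> struct and the body of <code>mix</code> are assumptions made for this example; the macro only exists so that the same snippet also builds with a plain C++ compiler.

```cpp
// Sketch only: DEVICE_INLINE and this cut-down Vec3 are hypothetical,
// they are not taken from the project source.
#ifdef __CUDACC__
// Under nvcc, ask the compiler to always inline these small helpers.
#define DEVICE_INLINE __host__ __device__ __forceinline__
#else
// With a host-only compiler, fall back to an ordinary inline hint.
#define DEVICE_INLINE inline
#endif

struct Vec3 {
    float x, y, z;

    DEVICE_INLINE Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    // Dot product of two vectors.
    DEVICE_INLINE float dot(const Vec3& o) const {
        return x * o.x + y * o.y + z * o.z;
    }

    // Scale every component by s.
    DEVICE_INLINE Vec3 operator*(float s) const {
        return Vec3(x * s, y * s, z * s);
    }
};

// Linear blend between a and b; t = 0 gives a, t = 1 gives b.
DEVICE_INLINE float mix(float a, float b, float t) {
    return b * t + a * (1.0f - t);
}
```

Small, frequently called helpers like these are the usual candidates for forced inlining, since the call overhead is comparable to the work they do.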
 
'''sqrtf, tanf, fmaxf''' : where std:: functions were being used, we replaced them with the equivalents from CUDA's math library, although the gains from this were marginal.
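
The kind of substitution described above might look like the following sketch. The helper names <code>viewScale</code> and <code>halfChord</code> and the <code>DEVICE_FN</code> macro are hypothetical and chosen only for illustration. The single-precision calls <code>tanf</code>, <code>sqrtf</code> and <code>fmaxf</code> are available both in CUDA device code and in the host <code>math.h</code>.

```cpp
#include <math.h>

// Sketch only: DEVICE_FN and both helpers are hypothetical names.
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__
#else
#define DEVICE_FN
#endif

// Half-width of the image plane for a field of view given in degrees.
// Was: std::tan(...) ; now the single-precision tanf.
DEVICE_FN inline float viewScale(float fovDegrees) {
    const float kPi = 3.14159265358979f;
    return tanf(kPi * 0.5f * fovDegrees / 180.0f);
}

// Half-length of the chord a ray cuts through a sphere, given the squared
// radius and the squared distance from the sphere centre to the ray.
// Was: std::sqrt(std::max(...)) ; now sqrtf and fmaxf.
DEVICE_FN inline float halfChord(float radius2, float d2) {
    // Clamp at zero so a miss (d2 > radius2) does not produce NaN.
    return sqrtf(fmaxf(0.0f, radius2 - d2));
}
```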
 
'''shared memory''' : we implemented shared memory but quickly found that it was actually slower than sticking to global memory. We believe this is due to the number of times the array has to be copied into shared memory.
 
**We also needed to rework a few parts of the code so that it could be parallelized.
