==== Switching to shared memory ====
The Visual Profiler suggested a few ideas for optimization:
- Concurrent Kernel Execution
- Low Memcpy/Compute Overlap
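
Both suggestions are usually addressed with CUDA streams. The following is a minimal sketch, not the assignment's actual code: the kernel <code>scale</code>, the helper <code>run_streams_demo</code>, the buffer sizes, and the stream count are all hypothetical names and values chosen for illustration. It assumes a device that supports copy/compute overlap, and it uses pinned host memory because asynchronous copies from pageable memory generally do not overlap with kernel execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by a constant factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Issues copy -> kernel -> copy in two independent streams.
// Returns the number of wrong elements (0 on success) or -1 on a CUDA error.
int run_streams_demo() {
    const int kStreams = 2;              // assumed stream count
    const int n = 1 << 20;               // assumed elements per stream
    const size_t bytes = n * sizeof(float);

    float *h[kStreams];
    float *d[kStreams];
    cudaStream_t s[kStreams];

    for (int k = 0; k < kStreams; ++k) {
        // Pinned host memory so cudaMemcpyAsync can run asynchronously.
        if (cudaMallocHost(&h[k], bytes) != cudaSuccess) return -1;
        if (cudaMalloc(&d[k], bytes) != cudaSuccess) return -1;
        if (cudaStreamCreate(&s[k]) != cudaSuccess) return -1;
        for (int i = 0; i < n; ++i) h[k][i] = 1.0f;
    }

    // Work in different streams has no implied ordering, so the device
    // may overlap one stream's copies with the other stream's kernel.
    for (int k = 0; k < kStreams; ++k) {
        cudaMemcpyAsync(d[k], h[k], bytes, cudaMemcpyHostToDevice, s[k]);
        scale<<<(n + 255) / 256, 256, 0, s[k]>>>(d[k], n, 2.0f);
        cudaMemcpyAsync(h[k], d[k], bytes, cudaMemcpyDeviceToHost, s[k]);
    }

    // Wait for all streams before reading the results on the host.
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int errors = 0;
    for (int k = 0; k < kStreams; ++k) {
        for (int i = 0; i < n; ++i) {
            if (h[k][i] != 2.0f) ++errors;
        }
        cudaStreamDestroy(s[k]);
        cudaFree(d[k]);
        cudaFreeHost(h[k]);
    }
    return errors;
}

int main() {
    int errors = run_streams_demo();
    printf("mismatched elements: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

Whether the two kernels actually run at the same time depends on the device and on how many resources each kernel occupies; the profiler timeline is the place to confirm any overlap.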
Concurrent kernel execution lets CUDA programmers launch several kernels asynchronously in separate streams so that they can run on the device at the same time. Unfortunately
''' Source Code '''