IgorKrasnyanskiy
Timing for kernel with shared memory (total_iters = 4, dimension sizes 16384 by 32)
12:28
+82