S1000 = 1 / (1 - 0.9857 + 0.9857/1000) ≈ 65.42
In fact, the execution time will decrease from 2.75 seconds to 0.0450 seconds.
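The speedup figure above has the form of Amdahl's law, S_n = 1 / ((1 - P) + P/n). The following is a minimal Python sketch of that calculation, assuming P = 0.9857 is the parallelizable fraction and n = 1000 is the number of cores, as in the formula. The function and variable names are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's law: S_n = 1 / ((1 - P) + P / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Values taken from the text; the names are illustrative assumptions.
P = 0.9857         # fraction of the run time assumed to be parallelizable
N = 1000           # number of cores
T_SERIAL = 2.75    # serial execution time in seconds

speedup = amdahl_speedup(P, N)
predicted_time = T_SERIAL / speedup   # time Amdahl's law predicts on N cores
upper_bound = 1.0 / (1.0 - P)         # limit of the speedup as N grows without bound
quoted_ratio = T_SERIAL / 0.0450      # ratio of the two times quoted in the text

print(f"speedup on {N} cores: {speedup:.2f}")
print(f"predicted time:       {predicted_time:.4f} s")
print(f"upper bound:          {upper_bound:.2f}")
print(f"ratio of quoted times: {quoted_ratio:.1f}")
```

The upper bound shows why adding cores beyond a certain point helps little: the serial fraction 1 - P dominates the denominator once P/n becomes small.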
Because each iteration depends on the result of the previous iteration, the iterations themselves cannot run concurrently. This dependency has to be resolved between iterations and may limit how much of the process can be parallelized.
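The dependency described above can be illustrated with a small sketch. It assumes, purely for illustration, a one-dimensional three-point averaging stencil with fixed boundary cells; this is not necessarily the computation used in this project. Within one iteration every cell reads only from the previous buffer, so the cells are independent of each other, but iteration k+1 cannot start until iteration k has finished.

```python
def step(prev):
    """One iteration. Each output cell reads only from `prev`,
    so all cells of one iteration could be computed in parallel."""
    nxt = prev[:]  # boundary cells are copied unchanged
    for i in range(1, len(prev) - 1):
        nxt[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return nxt


def run(initial, iterations):
    """The iterations are strictly ordered: each one consumes
    the buffer produced by the one before it."""
    current = list(initial)
    for _ in range(iterations):
        current = step(current)  # swap in the new buffer for the next pass
    return current


after_one = run([0.0, 0.0, 3.0, 0.0, 0.0], 1)
after_two = run([0.0, 0.0, 3.0, 0.0, 0.0], 2)
print(after_one)
print(after_two)
```

On a GPU, one common way to respect this ordering is to keep two buffers and launch one kernel per iteration, since kernels issued to the same stream execute in order.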
A further consideration is resolving ghost cells across different SMXs, using device global memory as the transfer path.
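A rough sketch of the ghost-cell idea follows, written as plain Python rather than device code. It assumes a hypothetical one-dimensional three-point stencil, with each tile standing in for one thread block and a Python list standing in for device global memory. Each tile copies its own cells plus one ghost cell on each side from the "global" buffer written in the previous iteration, so neighbouring tiles exchange boundary values only through that buffer.

```python
def stencil(left, centre, right):
    return (left + centre + right) / 3.0


def plain_step(prev):
    """Reference version: one pass over the whole array, no tiling."""
    nxt = prev[:]
    for i in range(1, len(prev) - 1):
        nxt[i] = stencil(prev[i - 1], prev[i], prev[i + 1])
    return nxt


def tiled_step(global_prev, tile_size):
    """Same update, computed tile by tile with ghost cells."""
    n = len(global_prev)
    global_next = global_prev[:]  # fixed boundary cells stay unchanged
    for start in range(1, n - 1, tile_size):  # one pass per tile
        end = min(start + tile_size, n - 1)
        # Local copy: the tile's own cells plus one ghost cell per side,
        # read from the buffer the previous iteration wrote.
        local = global_prev[start - 1:end + 1]
        for j in range(1, len(local) - 1):
            global_next[start + j - 1] = stencil(local[j - 1], local[j], local[j + 1])
    return global_next


grid = [0.0, 9.0, 0.0, 0.0, 3.0, 0.0, 0.0]
tiled, plain = grid[:], grid[:]
for _ in range(4):
    tiled = tiled_step(tiled, tile_size=2)
    plain = plain_step(plain)

print(tiled == plain)
```

Reading ghost cells from a separate previous-iteration buffer means no tile ever sees a half-updated neighbour, which is the property that matters when blocks on different SMXs cannot share on-chip memory.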
==== Robert ====