=== Assignment 3 ===
Our kernel implementation was already a large improvement over the serial version, but profiling the Assignment 2 version showed that there was still room for improvement. <br><br>Problem:
----
The kernels were executing concurrently, but the percentage of concurrency was quite low.
<br><br>
Solution:
----
Set the thread count based on the compute capability of the CUDA device.
<br><br>
The number of threads per block is now calculated from the device's limits on resident threads and resident blocks.
<br><br>
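The threads-per-block calculation can be sketched roughly as follows. This is a minimal illustration, not our actual code: the helper name <code>threadsPerBlock</code>, its parameters, and the rounding to a multiple of the warp size are assumptions. In a CUDA program the inputs would typically come from <code>cudaGetDeviceProperties</code> (for example <code>maxThreadsPerMultiProcessor</code> and <code>maxThreadsPerBlock</code>), with the resident-block limit looked up from the device's compute capability.

```cpp
#include <algorithm>

// Warp size on current CUDA devices.
constexpr int kWarpSize = 32;

// Hypothetical helper: pick a block size so that the maximum number of
// resident blocks on one multiprocessor together use all resident threads.
int threadsPerBlock(int residentThreadsPerSM,
                    int residentBlocksPerSM,
                    int maxThreadsPerBlock) {
    // Threads each block needs if every resident block slot is filled.
    int ntpb = residentThreadsPerSM / residentBlocksPerSM;
    // Never exceed the per-block hardware limit.
    ntpb = std::min(ntpb, maxThreadsPerBlock);
    // Round down to a whole number of warps, but keep at least one warp.
    ntpb = (ntpb / kWarpSize) * kWarpSize;
    return std::max(ntpb, kWarpSize);
}
```

For a compute capability 3.x device (2048 resident threads and 16 resident blocks per multiprocessor), this gives 128 threads per block.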
The number of blocks in the grid is recalculated to account for the complexity of the image and the new number of threads per block.
<br><br>
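One possible way to recalculate the grid size is shown below. It is only a sketch under the assumption that the "complexity" of the image is its pixel count and that each thread handles one pixel; the helper name <code>blocksForImage</code> is hypothetical and is not taken from our source.

```cpp
// Hypothetical helper: number of blocks needed so that every pixel of a
// width x height image gets one thread, given the chosen block size.
int blocksForImage(int width, int height, int threadsPerBlock) {
    // Use a wide type so large images do not overflow the multiplication.
    long long pixels = static_cast<long long>(width) * height;
    // Ceiling division: a partially filled last block still counts.
    return static_cast<int>((pixels + threadsPerBlock - 1) / threadsPerBlock);
}
```

Because of the ceiling division, the last block may contain threads with no pixel to process, so the kernel would need a bounds check on its global index.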
=== Results ===
[[File:results.png]]