
Sirius

730 bytes added, 19:18, 7 April 2018
=== Assignment 3 ===
Our kernel implementation was a massive improvement over the serial version, but profiling the Assignment 2 version showed that there was still room to improve. <br><br>Problem:
----
The kernels were executing concurrently, but the percentage of concurrency was quite low.
<br><br>
Solution:
----
Set the thread count based on the compute capability of the CUDA device.
<br><br>
The number of threads per block is now calculated from the device's resident threads and resident blocks.
<br><br>
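One way to read the calculation above is sketched below in host-side C++. This is only an illustration under stated assumptions, not necessarily the code we used: the helper names <code>smLimitsFor</code> and <code>threadsPerBlockFor</code> are hypothetical, the limits table is a small subset of the per-multiprocessor values listed in the CUDA programming guide, and the fallback for unlisted devices is an assumption. In a real program the <code>major</code> and <code>minor</code> values would come from <code>cudaGetDeviceProperties</code>.

```cpp
#include <algorithm>

// Per-multiprocessor limits for one compute capability.
struct SmLimits {
    int residentThreads; // maximum resident threads per multiprocessor
    int residentBlocks;  // maximum resident blocks per multiprocessor
};

// Hypothetical helper: look up the limits by compute capability.
// Only a subset of devices is listed; anything else gets an assumed
// conservative fallback.
inline SmLimits smLimitsFor(int major, int minor) {
    if (major == 2) return {1536, 8};
    if (major == 3) return {2048, 16};
    if (major == 5 || major == 6) return {2048, 32};
    if (major == 7 && minor == 0) return {2048, 32};
    if (major == 7 && minor == 5) return {1024, 16};
    return {1024, 16}; // assumption: conservative default for unlisted devices
}

// Hypothetical helper: choose threads per block so that the maximum number
// of resident blocks together fill the resident thread limit.
inline int threadsPerBlockFor(int major, int minor) {
    const SmLimits limits = smLimitsFor(major, minor);
    const int maxThreadsPerBlock = 1024; // per-block limit on these devices
    return std::min(limits.residentThreads / limits.residentBlocks,
                    maxThreadsPerBlock);
}
```

With this approach a compute capability 3.x device would use 128 threads per block and a 5.x device would use 64, so a multiprocessor can reach its resident thread limit with the maximum number of resident blocks.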
The number of blocks in the grid is recalculated to account for the complexity of the image and the new number of threads per block.
<br><br>
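The grid calculation can be sketched as a ceiling division. This is a minimal illustration that assumes the "complexity" of the image is simply its pixel count with one thread per pixel; the helper name <code>blocksFor</code> is hypothetical, and the actual measure of complexity may differ.

```cpp
// Hypothetical helper: smallest number of blocks such that
// blocks * threadsPerBlock covers every work item (here, every pixel).
inline int blocksFor(long long workItems, int threadsPerBlock) {
    return static_cast<int>((workItems + threadsPerBlock - 1) / threadsPerBlock);
}
```

For a 1920 x 1080 image and 128 threads per block, this gives 16200 blocks. When the pixel count is not a multiple of the block size, the last block is only partly used, so the kernel needs a bounds check on the thread index.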
=== Results ===
[[File:results.png]]