→Assignment 3
----
After realizing that cudaMemcpy was taking a significant share of the run time, we focused our efforts on optimizing it.
It was difficult to find a solution because the initial copy always takes some time.<br>
We tried allocating the host buffer with cudaMallocHost instead of malloc. <br>
cudaMallocHost allocates pinned (page-locked) host memory, which stays resident in RAM and can be accessed directly by the GPU's DMA engine.
We changed one part of our code:
<syntaxhighlight lang="cpp">
</syntaxhighlight>
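A minimal sketch of this kind of change is shown here. It is not the project's actual code: the buffer names <code>h_points</code> and <code>d_points</code> and the element count <code>n</code> are hypothetical. Only the allocation and the matching free differ from the malloc version.
<syntaxhighlight lang="cpp">
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;                  // hypothetical element count
    const size_t bytes = n * sizeof(float);

    float* h_points = nullptr;                 // hypothetical host buffer
    float* d_points = nullptr;                 // hypothetical device buffer

    // Before: pageable host memory
    //   h_points = (float*)malloc(bytes);
    // After: pinned (page-locked) host memory
    cudaError_t err = cudaMallocHost((void**)&h_points, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocHost: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Fill the host buffer with placeholder data
    for (size_t i = 0; i < n; ++i) h_points[i] = 0.5f;

    err = cudaMalloc((void**)&d_points, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        cudaFreeHost(h_points);
        return 1;
    }

    // The copy call itself is unchanged; only the source buffer is now pinned
    err = cudaMemcpy(d_points, h_points, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_points);
    cudaFreeHost(h_points);  // pinned memory is released with cudaFreeHost, not free
    return err == cudaSuccess ? 0 : 1;
}
</syntaxhighlight>
Pinned memory cannot be paged out by the operating system, so allocating very large pinned buffers can reduce the memory available to the rest of the system.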
The error in the PI estimate is the absolute difference between the estimate and the known value of pi, PI = 3.1415926535.
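The error measure can be sketched as a small helper. The names <code>PI_REF</code> and <code>pi_error</code> are hypothetical and are not taken from the project code; the reference value is the one quoted above.
<syntaxhighlight lang="cpp">
#include <cmath>

// Reference value of pi as quoted in the text (hypothetical constant name)
const double PI_REF = 3.1415926535;

// Absolute error of an estimate relative to the reference value
double pi_error(double estimate) {
    return std::fabs(estimate - PI_REF);
}
</syntaxhighlight>
For example, <code>pi_error(3.14)</code> is roughly 0.0016.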
<br>
[[File:kernal-blas-optimized.png]]
[[File:Chartp3.PNG]]