Changes

Kernal Blas

463 bytes added, 22:18, 2 April 2018

=== Assignment 2 ===

In order to parallelize the code from above, we decided to use a kernel to handle the calculations.

The logic largely remains the same, but we offload the CPU calculations to the GPU. This code generates the random points within the kernel, and the calculations are also done there.
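The kernel approach described above could look roughly like the following. This is a minimal sketch and not the project's actual code: it assumes the cuRAND device API is used for the random points, and the names `countHits`, `estimatePi`, and `check`, the fixed seed, and the launch configuration are all hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each thread draws samplesPerThread points in the unit square and records
// how many fall inside the quarter circle x*x + y*y <= 1.
__global__ void countHits(unsigned long long seed, int samplesPerThread,
                          unsigned int* hits, int nThreads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nThreads) return;

    curandState state;
    curand_init(seed, tid, 0, &state);  // separate subsequence per thread

    unsigned int inside = 0;
    for (int i = 0; i < samplesPerThread; ++i) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++inside;
    }
    hits[tid] = inside;
}

// Hypothetical host wrapper: launch the kernel, copy the per-thread counts
// back, and turn the hit ratio into an estimate of pi.
double estimatePi(int nBlocks, int threadsPerBlock, int samplesPerThread) {
    assert(nBlocks > 0 && threadsPerBlock > 0 && samplesPerThread > 0);
    int nThreads = nBlocks * threadsPerBlock;

    unsigned int* dHits = nullptr;
    check(cudaMalloc(&dHits, nThreads * sizeof(unsigned int)), "cudaMalloc");

    countHits<<<nBlocks, threadsPerBlock>>>(1234ULL, samplesPerThread,
                                            dHits, nThreads);
    check(cudaGetLastError(), "kernel launch");

    std::vector<unsigned int> hHits(nThreads);
    check(cudaMemcpy(hHits.data(), dHits, nThreads * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost), "cudaMemcpy");
    check(cudaFree(dHits), "cudaFree");

    unsigned long long total = 0;  // 64-bit sum of all per-thread counts
    for (unsigned int h : hHits) total += h;
    return 4.0 * (double)total / ((double)nThreads * samplesPerThread);
}

int main() {
    std::printf("pi ~= %f\n", estimatePi(64, 256, 1000));
    return 0;
}
```

In this sketch every thread owns its own generator state and writes a single count, so no synchronization is needed inside the kernel; the final sum is done on the host after one device-to-host copy.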

Offloading to the GPU reduces the pi calculation time.

The CPU's results change drastically each time we increase the number of iterations tenfold.

However, the parallelized results seem to stay accurate throughout the iterations.

It seems as though the calculation time doesn't change much and stays consistent as the iteration count grows. Profiling the code shows that '''memcpy''' takes up most of the time spent. Even when there are only 10 iterations, the time remains at 300 milliseconds. Once the iteration count passes 25 million, we have a bit of a memory leak, which leads to inaccurate results. In order to optimize the code, we must find a way to reduce the time memcpy takes.
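One possible way to shrink the copy, sketched here under the assumption that the current code transfers a per-thread or per-point array back to the host, is to accumulate the hit count on the device and copy back a single 64-bit value. The sketch also times the kernel and the copy separately with CUDA events, which may help confirm where the 300 milliseconds actually go. The names `countHitsAtomic`, `estimatePiAtomic`, `PiTiming`, and `check` are hypothetical, as are the seed and launch configuration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each thread counts its own hits, then adds them to one global counter,
// so the host only needs to copy back 8 bytes.
__global__ void countHitsAtomic(unsigned long long seed, int samplesPerThread,
                                unsigned long long* totalHits) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);

    unsigned long long inside = 0;
    for (int i = 0; i < samplesPerThread; ++i) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++inside;
    }
    atomicAdd(totalHits, inside);  // one atomic update per thread
}

struct PiTiming {
    double pi;       // estimate of pi
    float kernelMs;  // time from launch until the kernel finished
    float copyMs;    // time for the device-to-host copy
};

PiTiming estimatePiAtomic(int nBlocks, int threadsPerBlock, int samplesPerThread) {
    assert(nBlocks > 0 && threadsPerBlock > 0 && samplesPerThread > 0);

    unsigned long long* dTotal = nullptr;
    check(cudaMalloc(&dTotal, sizeof(unsigned long long)), "cudaMalloc");
    check(cudaMemset(dTotal, 0, sizeof(unsigned long long)), "cudaMemset");

    cudaEvent_t start, afterKernel, afterCopy;
    check(cudaEventCreate(&start), "cudaEventCreate");
    check(cudaEventCreate(&afterKernel), "cudaEventCreate");
    check(cudaEventCreate(&afterCopy), "cudaEventCreate");

    check(cudaEventRecord(start), "cudaEventRecord");
    countHitsAtomic<<<nBlocks, threadsPerBlock>>>(1234ULL, samplesPerThread, dTotal);
    check(cudaGetLastError(), "kernel launch");
    check(cudaEventRecord(afterKernel), "cudaEventRecord");

    unsigned long long hTotal = 0;
    check(cudaMemcpy(&hTotal, dTotal, sizeof(hTotal), cudaMemcpyDeviceToHost),
          "cudaMemcpy");
    check(cudaEventRecord(afterCopy), "cudaEventRecord");
    check(cudaEventSynchronize(afterCopy), "cudaEventSynchronize");

    PiTiming t;
    check(cudaEventElapsedTime(&t.kernelMs, start, afterKernel), "elapsed");
    check(cudaEventElapsedTime(&t.copyMs, afterKernel, afterCopy), "elapsed");
    double samples = (double)nBlocks * threadsPerBlock * samplesPerThread;
    t.pi = 4.0 * (double)hTotal / samples;

    check(cudaEventDestroy(start), "cudaEventDestroy");
    check(cudaEventDestroy(afterKernel), "cudaEventDestroy");
    check(cudaEventDestroy(afterCopy), "cudaEventDestroy");
    check(cudaFree(dTotal), "cudaFree");
    return t;
}

int main() {
    PiTiming t = estimatePiAtomic(64, 256, 1000);
    std::printf("pi ~= %f, kernel %.3f ms, copy %.3f ms\n",
                t.pi, t.kernelMs, t.copyMs);
    return 0;
}
```

Because `cudaMemcpy` waits for earlier work on the same stream to finish, a host-side timer wrapped around the copy can also absorb the kernel's run time; recording an event between the launch and the copy separates the two. Whether this removes the bottleneck in the actual project would need to be confirmed by profiling again.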

=== Assignment 3 ===

Jpham14

96

edits
