=== Assignment 2 ===
Our initial idea was to use our neural network code for Assignment 2. However, the algorithm itself was not very accurate (2 out of 10 correct predictions even after 10,000 training iterations), so we decided to parallelize merge sort instead. We soon realized that, because merge sort runs in O(n log n) time, offloading its computation to the GPU would not be very effective. We therefore settled on the cosine transform library described below.
====Cosine Transformation (A Discrete Cosine Transform for Real Data)====
This [https://www.youtube.com/watch?v=tW3Hc0Wrgl0 video] gives a helpful explanation of the discrete cosine transform formula.
----
Here is the [https://people.sc.fsu.edu/~jburkardt/cpp_src/cosine_transform/cosine_transform.html source code] used.
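For reference, our reading of the linked source is that '''cosine_transform_data''' maps n real inputs d<sub>j</sub> to n outputs c<sub>i</sub> as shown below (the symbol names are ours, not the library's). Each of the n outputs is a sum over all n inputs, which is where the quadratic cost comes from.

:<math>c_i = \sqrt{\frac{2}{n}} \sum_{j=0}^{n-1} d_j \cos\left(\frac{\pi\, i\,(2j+1)}{2n}\right), \qquad i = 0, 1, \ldots, n-1</math>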
|}
As is evident, the algorithm is currently O(n<sup>2</sup>): for each of the n output elements it loops over all n input elements. Replacing the outer for loop with thread indices on the GPU could potentially improve performance.
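To make the O(n<sup>2</sup>) structure concrete, here is a minimal sequential sketch modeled on our reading of '''cosine_transform_data'''. The function name cosine_transform_cpu is hypothetical, and the angle is computed in floating point (rather than as the integer product i*(2j+1)) to avoid integer overflow for large n.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sequential reference: for every output index i we sum over
// every input index j, so the total work grows as n * n (O(n^2)).
std::vector<double> cosine_transform_cpu(const std::vector<double>& d) {
    const int n = static_cast<int>(d.size());
    const double pi = 3.141592653589793;
    std::vector<double> c(n, 0.0);
    for (int i = 0; i < n; ++i) {          // outer loop: one output element
        for (int j = 0; j < n; ++j) {      // inner loop: all n inputs
            double angle = pi * i * (2.0 * j + 1.0) / (2.0 * n);
            c[i] += std::cos(angle) * d[j];
        }
        c[i] *= std::sqrt(2.0 / n);        // scaling factor sqrt(2/n)
    }
    return c;
}
```

The outer loop over i is the part that a GPU version can hand to separate threads, since each c[i] depends only on the inputs and not on any other output.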
To increase the efficiency of the program, we transformed the '''cosine_transform_data''' function into a kernel named '''cosTransformKernel''', which offloads the compute-intensive part of the program to the GPU.
ms.count() << " millisecs" << std::endl;
}
__global__ void cosTransformKernel(double *a, double *b, int n){
double angle;
[[File:kernel1.png]]
Even though the kernel still contains a for loop, the execution time decreased drastically. That is because each thread is now responsible for calculating only one element of the final cosine-transformed output vector, so the n iterations of the original outer loop run in parallel.
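A minimal CUDA sketch of the one-thread-per-element idea follows; it is not our exact kernel. It reuses the name '''cosTransformKernel''' and its (double *a, double *b, int n) signature, assumes a is the input and b the output, and adds a hypothetical host wrapper cosine_transform_gpu. Error checking of the CUDA API calls is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes one output element b[i]. The loop over j remains,
// so each thread does O(n) work, but the n outputs are computed in parallel.
// Assumption: a is the input array, b the output array, both of length n.
__global__ void cosTransformKernel(double *a, double *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // surplus threads in the last block do nothing
    const double pi = 3.141592653589793;
    double sum = 0.0;
    for (int j = 0; j < n; ++j) {
        double angle = pi * i * (2.0 * j + 1.0) / (2.0 * n);
        sum += cos(angle) * a[j];
    }
    b[i] = sum * sqrt(2.0 / n);  // scaling factor sqrt(2/n)
}

// Hypothetical host wrapper: copy input to the device, launch, copy back.
std::vector<double> cosine_transform_gpu(const std::vector<double>& h_a) {
    int n = static_cast<int>(h_a.size());
    std::vector<double> h_b(n);
    if (n == 0) return h_b;
    size_t bytes = n * sizeof(double);
    double *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    const int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    cosTransformKernel<<<blocks, threadsPerBlock>>>(d_a, d_b, n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return h_b;
}
```

The block size of 256 is an arbitrary choice here. Rounding the block count up guarantees at least one thread per element, which is why the kernel needs the i >= n guard.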
=== Assignment 3 ===