113
edits
Changes
A-Team
,→Initial implementation
During assignment 2, we tried a simple kernel that took the shape of a dot product, what this achieved was nothing special, actually as predicted at the end of assignment 1, continuously calling cudaMalloc and cudaMemCpy had severe consequences on time.
====Initial implementation====
//version 1 dot product
__global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) {
}
This is the final iteration, we will outline the take aways bellow.