A-Team

1,627 bytes added, 00:50, 8 April 2019
}
===Dynamic Parallelism===
Dynamic Parallelism in CUDA lets a kernel launch new nested (child) kernels and synchronize with them directly from the device. For our use case, it also lets more of the work stay on the device, so data can be processed quickly without constant cudaMemcpy() or cudaMalloc() calls from the host.
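To make the idea concrete, here is a minimal sketch of a parent kernel launching a child kernel. It is not taken from our project: the names <code>parent</code>, <code>child_scale</code>, and the array size are hypothetical and chosen only for illustration. It relies on the CUDA guarantee that a parent grid is not complete until its child grids have finished, so the host-side synchronization also covers the child's work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: scales each element of a device array in place.
__global__ void child_scale(float* d_data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= factor;
}

// Hypothetical parent kernel: one thread launches the child grid from the device.
__global__ void parent(float* d_data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        child_scale<<<blocks, threads>>>(d_data, n, 2.0f);
    }
}

int main() {
    const int n = 256;
    float h_data[n];
    for (int i = 0; i < n; i++) h_data[i] = 1.0f;

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Single host-side launch; the nested launch happens on the device.
    parent<<<1, 1>>>(d_data, n);

    // The parent grid only completes after its child grid has completed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    printf("h_data[0] = %f\n", h_data[0]);
    return 0;
}
```

Code like this generally needs relocatable device code and the device runtime library. On the command line that looks like <code>nvcc -arch=sm_XX -rdc=true example.cu -o example -lcudadevrt</code>, where <code>sm_XX</code> stands for a compute capability of 3.5 or higher and <code>example.cu</code> is a placeholder file name.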
===Later===
{| class="wikitable mw-collapsible mw-collapsed"
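! Train_kernel: parent calls child kernel( ... )
|-
|
<syntaxhighlight lang="cpp">
// Forward declaration so train() can launch kernel_dot before its definition.
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p);

// Parent kernel. k_difference() and k_transpose() are helpers not shown here.
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y,
                      float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat,
                      float* d_dW3, float* d_dW2, float* d_dW1,
                      float* d_dz2, float* d_dz1, float* d_t) {
    int BATCH_SIZE = 256;
    float lr = 0.01f / BATCH_SIZE;

    // backpropagation
    d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);

    // Launch the child kernel from the device (dynamic parallelism).
    kernel_dot<<<(2560 + 128) / 64, 64>>>(d_dyhat, k_transpose(d_W3, 64, 10),
                                          BATCH_SIZE, 10, 64, d_dz2);

    // Wait for the child grid to finish before the parent continues.
    cudaDeviceSynchronize();
}

// Child kernel: row-major matrix multiplication, d_p (ni x nj) = d_a (ni x nk) * d_b (nk x nj).
// Note: with the 1D launch above, blockIdx.y and threadIdx.y are 0, so j is always 0.
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    // matrix multiplication
    if (i < ni && j < nj) {
        float sum = 0.0f;
        for (int k = 0; k < nk; k++)
            sum += d_a[i * nk + k] * d_b[k * nj + j];
        d_p[i * nj + j] = sum;
    }
}
</syntaxhighlight>
|}
===Final Iteration===
{| class="wikitable mw-collapsible mw-collapsed"
! GPU code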
|-
|
</syntaxhighlight>
|}
===Final Profile===
This final profile covers only 20 iterations, because errors occurred beyond that point, likely due to our naive implementation and poor coding practices.
[[File:nnfinalprofile.jpg]]
===Compiling===
Follow the articles below to set up Visual Studio for dynamic parallelism; they are also recommended reading:
* http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf
* http://ramblingsofagamedevstudent.blogspot.com/2014/03/set-up-visual-studio-2012-for-cuda.html
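Dynamic parallelism requires a GPU with compute capability 3.5 or higher, so it may be worth checking the device before changing any project settings. The sketch below is one way to do that with the CUDA runtime API. Checking device 0 is an assumption; adjust the index on a machine with several GPUs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: inspect the first CUDA device
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Dynamic parallelism needs compute capability 3.5 or higher.
    bool supported = (prop.major > 3) || (prop.major == 3 && prop.minor >= 5);

    printf("%s: compute capability %d.%d, dynamic parallelism %s\n",
           prop.name, prop.major, prop.minor,
           supported ? "supported" : "not supported");

    return supported ? 0 : 1;
}
```

This check uses no device-side launches, so it builds with a plain <code>nvcc</code> invocation and needs none of the dynamic parallelism settings.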
=== Assignment 3 ===
====What we would do differently:====
There are many things we would change. A major one is to take on a more manageable task, one with proper documentation and clear reasoning behind the chosen values.