New Kernel
The computed thread index will range from 0 to NX like a for loop. This matches the original Navier-Stokes output.
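As a concrete illustration of how the computed index reaches every value the old loop counter did, here is a minimal self-contained sketch; the kernel name indexDemo, the block size, and nx = 41 are assumptions, not taken from the report:

```cuda
#include <cstdio>

// Minimal sketch; kernel name, block size, and nx value are assumptions.
__global__ void indexDemo(int nx)
{
    // Each thread computes a global index, so together the threads cover the
    // same 0..nx-1 range the original for loop iterated over.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nx)                                // guard threads past the end of the range
        printf("thread handles i = %d\n", i);
}

int main()
{
    const int nx = 41;                         // assumed grid size
    const int block = 128;                     // assumed threads per block
    const int grid = (nx + block - 1) / block; // round up so i reaches nx-1
    indexDemo<<<grid, block>>>(nx);
    cudaDeviceSynchronize();
    return 0;
}
```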
After this, a separate kernel was created with the following code:

```cuda
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;

// The original code had the following statement:
// u[m * nx + it] = un[m * nx + it - 1] - c*dt / dx*(un[m * nx + it - 1] - un[(m - 1) * nx + it - 1]);
// Rather than having each thread recompute c*dt / dx, which would be an additional
// two instructions per thread, it is stored once in a variable.
float total = c*dt / dx;

if (i < nx && j < nx) {
    // The original code, as can be seen below, basically copies array u into array un,
    // so the threads are arranged to do the same.
    un[j * nx + i] = u[j * nx + i];
    __syncthreads();

    if (i != 0) {
        // This part was a bit trickier. As seen in the original code below, array u is
        // written at [0,0] [0,1] [0,2] etc., while the value copied comes from array
        // un's [1,1] [1,2] [1,3] etc. range; the trick is the -1 offset at the end.
        // Because (it) starts at the value 1 in the original for loop, an if condition
        // was added so that threads with index 0 do not perform the operation, while
        // element 0 can still be reached through the -1 offset.
        u[i] = un[1 * nx + i - 1];
        __syncthreads();
    }
}
```

Compared to the original code:

```cuda
for (int it = 1; it <= nx - 1; it++) {
    for (int k = 0; k <= nx - 1; k++) {
        un[k * nx + it - 1] = u[k * nx + it - 1];
    }
    for (int m = 1; m <= nx - 1; m++) {
        u[0 * nx + it] = un[1 * nx + it - 1];
        u[m * nx + it] = un[m * nx + it - 1] - c*dt / dx*(un[m * nx + it - 1] - un[(m - 1) * nx + it - 1]);
    }
}
```
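The excerpt above leaves out the host-side setup and the full update that uses total. As a hedged sketch only, not the report's code, one way to fit the pieces together is to keep the sequential it loop on the CPU, since column it depends on column it-1, and launch the two inner loops as kernels parallelized over the row index. The names copyColumn, updateColumn, convect, d_u, and d_un are assumptions:

```cuda
#include <cuda_runtime.h>

// Hedged sketch, not the report's exact code: the it loop stays on the host
// and each inner loop of the original becomes one kernel launch.

__global__ void copyColumn(float *un, const float *u, int nx, int it)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < nx)
        un[k * nx + it - 1] = u[k * nx + it - 1];   // the original k loop
}

__global__ void updateColumn(float *u, const float *un, int nx, float total, int it)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m == 0)
        u[0 * nx + it] = un[1 * nx + it - 1];       // boundary row; it starts at 1
    else if (m < nx)
        u[m * nx + it] = un[m * nx + it - 1]
                       - total * (un[m * nx + it - 1] - un[(m - 1) * nx + it - 1]);
}

void convect(float *d_u, float *d_un, int nx, float c, float dt, float dx)
{
    float total = c * dt / dx;                      // hoisted once, as in the report
    int block = 128;                                // assumed block size
    int grid = (nx + block - 1) / block;
    for (int it = 1; it <= nx - 1; it++) {
        copyColumn<<<grid, block>>>(d_un, d_u, nx, it);
        updateColumn<<<grid, block>>>(d_u, d_un, nx, total, it);
    }
    cudaDeviceSynchronize();
}
```

Because launches on the same stream run in order, the copy for a given it is complete before the update that reads it; this also sidesteps the fact that __syncthreads() only synchronizes threads within a single block.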