Changes (BetaT): New Kernel
// The original code contained the following statement:
//   u[m * nx + it] = un[m * nx + it - 1] - c*dt / dx*(un[m * nx + it - 1] - un[(m - 1) * nx + it - 1]);
// The factor c*dt/dx is the same for every cell. Rather than having each thread evaluate it
// inside the update (a multiply and a divide, i.e. two extra instructions per thread),
// I store it in a variable.
float total = c * dt / dx;
{
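    // As the original code below shows, it essentially copies array u into array un
    // (the previous time level that the update reads from). I arranged the threads
    // to do the same: each thread copies one element.
    un[j * nx + i] = u[j * nx + i];
    // Barrier: waits until every thread in this block has finished its copy.
    // __syncthreads() does not synchronize threads in different blocks.
    __syncthreads();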
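    if (i != 0)
    {
        u[i] = un[1 * nx + i - 1];
        // Caution: __syncthreads() must be reached by every thread of the block.
        // Inside this branch it is skipped by the thread with i == 0, which is
        // undefined behavior (the block can hang). The barrier belongs after this if-block.
        __syncthreads();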