BetaT
New Kernel
'''int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
// The original code had the following statement:
// u[m * nx + it] = un[m * nx + it - 1] - c * dt / dx * (un[m * nx + it - 1] - un[(m - 1) * nx + it - 1]);
// Rather than having each thread evaluate c * dt / dx inline in that statement, which costs an additional 2 instructions per thread, I have stored it in a variable.
float total = c * dt / dx;
if (i < nx && j < nx)
{
    // Copy the current field into un before it is read.
    un[j * nx + i] = u[j * nx + i];
    __syncthreads();
    // Column 0 is skipped because the statement above reads column it - 1.
    if (i != 0)
    {
        __syncthreads();
    }
}'''
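The kernel above calls __syncthreads() inside conditionals, and that barrier only synchronizes the threads of a single block, so a thread may read an un value that a neighbouring block has not copied yet. One common way around both issues is to split the copy and the update into two kernel launches, since launches on the same stream run in order. The following is a minimal sketch under stated assumptions, not the author's kernel: the names copyField, upwindStep, stepOnce and coef are hypothetical, the update is taken from the statement quoted in the comment above, and row 0 is assumed to be a fixed boundary because that statement reads row j - 1.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Pass 1: snapshot u into un, one element per thread.
__global__ void copyField(const float* u, float* un, int nx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < nx && j < nx)
        un[j * nx + i] = u[j * nx + i];
}

// Pass 2: apply the update quoted in the comment above.
// coef is c * dt / dx, computed once on the host (hypothetical parameter).
__global__ void upwindStep(float* u, const float* un, int nx, float coef)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    // Assumption: row 0 and column 0 are boundaries and stay untouched,
    // because the stencil reads (j - 1, i - 1).
    if (i > 0 && i < nx && j > 0 && j < nx) {
        float left   = un[j * nx + i - 1];
        float upLeft = un[(j - 1) * nx + i - 1];
        u[j * nx + i] = left - coef * (left - upLeft);
    }
}

// Host helper (hypothetical): runs one step on an nx * nx field and
// returns the updated field. Error checking is omitted for brevity.
std::vector<float> stepOnce(std::vector<float> h, int nx, float coef)
{
    size_t bytes = static_cast<size_t>(nx) * nx * sizeof(float);
    float* u = nullptr;
    float* un = nullptr;
    cudaMalloc(&u, bytes);
    cudaMalloc(&un, bytes);
    cudaMemcpy(u, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (nx + block.y - 1) / block.y);
    // Two launches on the default stream: the second starts only after
    // the first has finished, so every un value is ready when it is read.
    copyField<<<grid, block>>>(u, un, nx);
    upwindStep<<<grid, block>>>(u, un, nx, coef);

    cudaMemcpy(h.data(), u, bytes, cudaMemcpyDeviceToHost);
    cudaFree(u);
    cudaFree(un);
    return h;
}
```

Calling stepOnce(std::vector<float>(64 * 64, 1.0f), 64, 0.1f) should return the field unchanged, since a uniform field has no difference between neighbouring cells. Passing coef as an argument also means c * dt / dx is evaluated once on the host rather than once per thread.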
Compared to the original code...