Changes

BetaT

342 bytes removed, 23:54, 7 April 2017

→‎Parallelizing with 2 Kernels

I have removed the 2 inner for loops but kept the outer loop.

The program takes 2 arrays. Let us say the X's represent the arrays below

__global__ void Calculate (float* u, float* un,int nx, int c, float dx, float dt)

{

int j = blockIdx.x * blockDim.x + threadIdx.x;

int i = blockIdx.y * blockDim.y + threadIdx.y;

// removes from instructions because no need to do this NX amount of times

float total = c*dt / dx;

if (i < nx && j < nx)

{

for (int it = 1; it <= nx- 1; it++)

{

if (i != 0 || i < nx )

{

un[i * nx + it-1] = u[i * nx + it-1];

__syncthreads();

u[it] = un[1 * nx + it - 1];

__syncthreads();

u[i * nx + it ] = un[i * nx + it- 1] - c*dt / dx* (un[i * nx + it - 1] - un[(i - 1) * nx + it - 1]);

__syncthreads();

}

Array 1 Array 2

oxxxx oxxxx

2nd: Array 1 will set the values in its ~~first row~~ [0,1] (marked by 2) index to the values in Array 2's ~~1st column and 1st~~ [1,0] (marked by )index.

Array 1 Array 2

oxxxx oxxxx

3rd: ~~Finally~~ Next Array 1 will calculate its ~~2nd~~ next column (marked by the 3) by performing a calculation as shown above on ~~2 columns in~~ Array 2 ~~which will be represented by the Thread Identifier as the X dimension and the Y dimension being represented~~ 's first column (marked by ~~the for loop iterator~~3 ).

Array 1 Array 2

o3xxx ~~33xxx~~ 3xxxx

o3xxx ~~33xxx~~ ~~o3xxx 33xxx~~ ~~o3xxx 33xxx~~ ~~o3xxx 33xxx~~ ~~__global__ void Calculate (float* u, float* un,int nx, int c, float dx, float dt)~~ { ~~int j = blockIdx.x * blockDim.x + threadIdx.x;~~ ~~int i = blockIdx.y * blockDim.y + threadIdx.y;~~ ~~// removes from instructions because no need to do this NX amount of times~~ ~~float total = c*dt / dx;~~ ~~if (i < nx && j < nx)~~ { ~~for (int it = 1; it <= nx- 1; it++)~~ { ~~if (i != 0 || i < nx )~~ { ~~un[i * nx + it-1] = u[i * nx + it-1];~~ ~~__syncthreads();~~ ~~u[it] = un[1 * nx + it - 1];~~ ~~__syncthreads();~~ ~~u[i * nx + it ] = un[i * nx + it- 1] - c*dt / dx* (un[i * nx + it - 1] - un[(i - 1) * nx + it - 1]);~~ ~~__syncthreads();~~ } } }3xxxx

~~I will also parallelize the Initialize function below:~~o3xxx 3xxxx

o3xxx 3xxxx

~~void init(float* u, int nx)~~ { ~~/*Initialize all values to 0, switching from vector to~~ This process will loop until it has reached end of the array. ~~Vector defaults to 0, array defaults to null*/~~ ~~for (int i = 0; i < nx; i++)~~ { ~~for (int k = 0; k < nx; k++)~~ ~~u[i * nx + k] = 0;~~ } }

Jadach1

212

edits

Changes

BetaT

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools