GPU610/Team DAG

== Progress ==
=== Assignment 1 ===
Project selection was discussed with Chris Szalwinski. The local working environment and hardware are being configured for working with the CERN project source code.
A profile of the Drive_God_lin program using only 1 core/thread on the CPU (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.
(Each sample counts as 0.01 seconds.)
 
{| class="wikitable" border="1"
|}
Drive_God_lin.c - Already contains an OpenMP pragma for parallelization. This should be converted to a CUDA kernel so that the work runs across many GPU cores instead of the available CPU threads (or the specified OMP_NUM_THREADS = 1). Because the test run used one thread, the program executed serially, producing a profile in which the largest percentages of time are spent in 3-5 of the Fortran methods.
The first priority is to examine how the parallel pragma in Drive_God_lin.c divides the task up into more CPU threads (forking), and whether that process, or smaller steps of it, can be rewritten to be called by CUDA threads.
<pre>
while (readDrivingTerms(drivingTermsFile, &turns, dataFilePath, sizeof(dataFilePath))) {

    ... /* loop containing code to parse data-file terms from the DrivingTermsFilePath; */
        /* includes file I/O */

#pragma omp parallel for private(i, horizontalBpmCounter, verticalBpmCounter, kk, maxamp, calculatednattunex, calculatednattuney)
    for (i = pickstart; i < maxcounthv; ++i) {

        ...

        /* call to the sussix4drivenoise Fortran program code */
    }
}
</pre>
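As a rough, hedged sketch of the conversion discussed above (not the project's actual implementation), the iteration space of the OpenMP parallel for loop could be mapped onto a grid of CUDA threads, one thread per BPM index i. The kernel name bpmKernel, the launch helper, the results buffer, and the block size of 256 are all illustrative assumptions:

<pre>
/* Hypothetical sketch: replace "#pragma omp parallel for ... for (i = pickstart; i < maxcounthv; ++i)"
   with a CUDA kernel launch in which each thread processes one BPM index i. */
__global__ void bpmKernel(const double *matrix, double *results,
                          int pickstart, int maxcounthv)
{
    int i = pickstart + blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= maxcounthv)
        return;

    /* Per-iteration work goes here. Every variable that the OpenMP version
       declared private (kk, the BPM counters, the calculated tunes, ...)
       becomes a local variable of the thread. */
}

void launchBpmKernel(const double *d_matrix, double *d_results,
                     int pickstart, int maxcounthv)
{
    int n = maxcounthv - pickstart;
    int threadsPerBlock = 256;                                /* illustrative choice */
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    bpmKernel<<<blocks, threadsPerBlock>>>(d_matrix, d_results,
                                           pickstart, maxcounthv);
    cudaDeviceSynchronize();                                  /* wait for the grid */
}
</pre>

Note that the per-iteration call into the external Fortran routine cannot run inside a device kernel as-is, so either that analysis code would also need a CUDA port or the loop would have to be split so that only the device-friendly steps move to the GPU.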
OpenMP provides combined parallel work-sharing directives that are essentially conveniences, including:

* PARALLEL DO / parallel for
* PARALLEL SECTIONS
An example using the PARALLEL DO / parallel for combined directive is shown below.
<pre>
#pragma omp parallel for \
    shared(a,b,c,chunk) private(i) \
    schedule(static,chunk)
for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
</pre>
The private(i) clause gives each thread created for this parallel execution its own copy of the loop variable i, while a, b, c, and chunk are shared by all of the threads.
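For comparison, a minimal PARALLEL SECTIONS example (not taken from the project code; the arrays a, b, c, d, the size n, and the index i are assumed to exist) gives each independent block of work to a different thread:

<pre>
#pragma omp parallel sections shared(a, b, c, d, n) private(i)
{
    #pragma omp section
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];     /* one thread computes the element-wise sums */

    #pragma omp section
    for (i = 0; i < n; i++)
        d[i] = a[i] * b[i];     /* another thread computes the element-wise products */
}
</pre>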
 
 
The important loop prior to the Fortran call is shown below:
<pre>
for (kk = 0; kk < MAXTURNS; ++kk) {
    doubleToSend[kk] = matrix[horizontalBpmCounter][kk];
    doubleToSend[kk + MAXTURNS] = matrix[verticalBpmCounter][kk];
    doubleToSend[kk + 2 * MAXTURNS] = 0.0;
    doubleToSend[kk + 3 * MAXTURNS] = 0.0;
}

/* This calls the external Fortran code (tbach) */
sussix4drivenoise_(&doubleToSend[0], &tune[0], &amplitude[0], &phase[0],
                   &allfreqsx[0], &allampsx[0], &allfreqsy[0], &allampsy[0],
                   sussixInputFilePath);
</pre>
 
This fills the doubleToSend[] array from the data buffers previously read from the data files, before handing it to the Fortran analysis routine.
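As a hedged illustration of how this packing step alone could be expressed as a CUDA kernel (the kernel name and the flattened row pointers matrixH/matrixV are assumptions, not code from the project), one thread per turn index kk could fill all four segments of the send buffer:

<pre>
/* Hypothetical sketch: matrixH and matrixV stand for the rows
   matrix[horizontalBpmCounter] and matrix[verticalBpmCounter] copied to the device. */
__global__ void packDoubleToSend(const double *matrixH, const double *matrixV,
                                 double *doubleToSend, int maxturns)
{
    int kk = blockIdx.x * blockDim.x + threadIdx.x;
    if (kk >= maxturns)
        return;

    doubleToSend[kk]                = matrixH[kk];   /* horizontal BPM data  */
    doubleToSend[kk + maxturns]     = matrixV[kk];   /* vertical BPM data    */
    doubleToSend[kk + 2 * maxturns] = 0.0;           /* zero-filled segments */
    doubleToSend[kk + 3 * maxturns] = 0.0;
}
</pre>

The copy itself is small and memory-bound, so on its own it is unlikely to pay for the host-to-device transfer; the benefit would come from keeping the matrix data resident on the GPU and porting the downstream analysis as well.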
