GPU610/Team DAG

== Progress ==
=== Assignment 1 ===
Project selection was discussed with Chris Szalwinski. The local working environment and hardware are being configured for working with the CERN project source code.
A profile of the Drive_God_lin program using only 1 core/thread on the CPU (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.
(Each sample counts as 0.01 seconds.)
 
{| class="wikitable" border="1"
|}
Drive_God_lin.c - Already contains an OpenMP pragma for parallelization. This should be converted to a CUDA kernel so that the work runs across many GPU cores instead of the available CPU threads (or the specified OMP_NUM_THREADS = 1). Because the test run used one thread, the program executed serially, producing a profile in which the largest percentages of time are spent in 3-5 of the Fortran methods.
The first priority is to examine how the parallel pragma in Drive_God_lin.c divides the task up into more CPU threads (forking), and whether that process, or smaller steps of it, can be rewritten to be called by CUDA threads.
<pre>
while (readDrivingTerms(drivingTermsFile, &turns, dataFilePath, sizeof(dataFilePath))) {

    ... /* loop containing code to parse data-file terms from the DrivingTermsFilePath; */
        /* includes file I/O */

#pragma omp parallel for private(i, horizontalBpmCounter, verticalBpmCounter, kk, maxamp, calculatednattunex, calculatednattuney)
    for (i = pickstart; i < maxcounthv; ++i) {

        ...

        /* call to the sussix4drivenoise Fortran program code */
    }
}
</pre>
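As a rough, hedged sketch of the conversion discussed above (not the project's actual implementation), the iteration space of the OpenMP parallel for loop could be mapped onto a grid of CUDA threads, one thread per BPM index i. The kernel name bpmKernel, the launch helper, the results buffer, and the block size of 256 are all illustrative assumptions:

<pre>
/* Hypothetical sketch: replace "#pragma omp parallel for ... for (i = pickstart; i < maxcounthv; ++i)"
   with a CUDA kernel launch in which each thread processes one BPM index i. */
__global__ void bpmKernel(const double *matrix, double *results,
                          int pickstart, int maxcounthv)
{
    int i = pickstart + blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= maxcounthv)
        return;

    /* Per-iteration work goes here. Every variable that the OpenMP version
       declared private (kk, the BPM counters, the calculated tunes, ...)
       becomes a local variable of the thread. */
}

void launchBpmKernel(const double *d_matrix, double *d_results,
                     int pickstart, int maxcounthv)
{
    int n = maxcounthv - pickstart;
    int threadsPerBlock = 256;                                /* illustrative choice */
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    bpmKernel<<<blocks, threadsPerBlock>>>(d_matrix, d_results,
                                           pickstart, maxcounthv);
    cudaDeviceSynchronize();                                  /* wait for the grid */
}
</pre>

Note that the per-iteration call into the external Fortran routine cannot run inside a device kernel as-is, so either that analysis code would also need a CUDA port or the loop would have to be split so that only the device-friendly steps move to the GPU.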
OpenMP provides combined parallel work-sharing directives that are essentially conveniences, including:

* PARALLEL DO / parallel for
* PARALLEL SECTIONS
An example using the PARALLEL DO / parallel for combined directive is shown below.
<pre>
#pragma omp parallel for \
    shared(a,b,c,chunk) private(i) \
    schedule(static,chunk)
for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
</pre>
The private(i) clause gives each thread created for this parallel execution its own copy of the loop variable i, while a, b, c, and chunk are shared by all of the threads.
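For comparison, a minimal PARALLEL SECTIONS example (not taken from the project code; the arrays a, b, c, d, the size n, and the index i are assumed to exist) gives each independent block of work to a different thread:

<pre>
#pragma omp parallel sections shared(a, b, c, d, n) private(i)
{
    #pragma omp section
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];     /* one thread computes the element-wise sums */

    #pragma omp section
    for (i = 0; i < n; i++)
        d[i] = a[i] * b[i];     /* another thread computes the element-wise products */
}
</pre>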
 
 
The important loop prior to the Fortran call is shown below:
<pre>
for (kk = 0; kk < MAXTURNS; ++kk) {
    doubleToSend[kk] = matrix[horizontalBpmCounter][kk];
    doubleToSend[kk + MAXTURNS] = matrix[verticalBpmCounter][kk];
    doubleToSend[kk + 2 * MAXTURNS] = 0.0;
    doubleToSend[kk + 3 * MAXTURNS] = 0.0;
}

/* This calls the external Fortran code (tbach) */
sussix4drivenoise_(&doubleToSend[0], &tune[0], &amplitude[0], &phase[0],
                   &allfreqsx[0], &allampsx[0], &allfreqsy[0], &allampsy[0],
                   sussixInputFilePath);
</pre>
 
This fills the doubleToSend[] array from the data buffers previously read from the data files, before handing it to the Fortran analysis routine.
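As a hedged illustration of how this packing step alone could be expressed as a CUDA kernel (the kernel name and the flattened row pointers matrixH/matrixV are assumptions, not code from the project), one thread per turn index kk could fill all four segments of the send buffer:

<pre>
/* Hypothetical sketch: matrixH and matrixV stand for the rows
   matrix[horizontalBpmCounter] and matrix[verticalBpmCounter] copied to the device. */
__global__ void packDoubleToSend(const double *matrixH, const double *matrixV,
                                 double *doubleToSend, int maxturns)
{
    int kk = blockIdx.x * blockDim.x + threadIdx.x;
    if (kk >= maxturns)
        return;

    doubleToSend[kk]                = matrixH[kk];   /* horizontal BPM data  */
    doubleToSend[kk + maxturns]     = matrixV[kk];   /* vertical BPM data    */
    doubleToSend[kk + 2 * maxturns] = 0.0;           /* zero-filled segments */
    doubleToSend[kk + 3 * maxturns] = 0.0;
}
</pre>

The copy itself is small and memory-bound, so on its own it is unlikely to pay for the host-to-device transfer; the benefit would come from keeping the matrix data resident on the GPU and porting the downstream analysis as well.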
