== Progress ==
=== Assignment 1 ===
A profile of the Drive_God_lin program restricted to 1 core/thread on the CPU (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.
(Each sample counts as 0.01 seconds.)
{| class="wikitable" border="1"
|}
Drive_God_lin.c already contains an OpenMP pragma for parallelization. This should be converted to a CUDA kernel so the work runs across many cores instead of the available CPU threads (or the number specified by OMP_NUM_THREADS, here 1). Because the test run used a single thread, the program executed serially. The resulting profile shows the largest percentages of time being spent in 3 to 5 of the Fortran routines.
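As a quick check that a profiling run really is serial, the thread limit can be queried at run time. The following is a minimal sketch, assuming the standard OpenMP runtime call omp_get_max_threads(); the helper names max_parallel_threads and report_threading are hypothetical and do not appear in Drive_God_lin.c.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of Drive_God_lin.c): upper bound on the
 * number of threads the next parallel region may use. When the file is
 * compiled without OpenMP support the pragmas are ignored, so it is 1. */
int max_parallel_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Print whether the upcoming parallel regions will run serially. */
void report_threading(void) {
    int n = max_parallel_threads();
    printf("max threads: %d (%s)\n", n, n == 1 ? "serial" : "parallel");
}
```

Calling report_threading() at the start of a profiling run would confirm that the single-thread setting took effect before the gprof numbers are interpreted.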
The first priority is to examine how the parallel pragma in the Drive_God_lin.c program divides the task among CPU threads (forking), and whether that process, or smaller steps of it, can be rewritten to be executed by CUDA threads.
 while (readDrivingTerms(drivingTermsFile, &turns, dataFilePath, sizeof(dataFilePath))) {
     ... /* loop containing code to parse datafile terms from the DrivingTermsFilePath; includes file I/O */
     #pragma omp parallel for private(i, horizontalBpmCounter, verticalBpmCounter, kk, maxamp, calculatednattunex, calculatednattuney)
     for (i = pickstart; i < maxcounthv; ++i) {
         ...
         /* call to sussix4noise Fortran program code */
     }
 }
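To make the question of how the iterations are divided more concrete, here is a minimal sketch in plain C. It assumes a near-equal contiguous split of the range [pickstart, maxcounthv) among threads, which is one possible static division (the exact split is left to the OpenMP implementation), and it contrasts that with the one-iteration-per-thread index arithmetic commonly used in CUDA kernels. The function names static_chunk and cuda_style_index are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: contiguous range [*lo, *hi) of loop iterations that
 * thread `tid` of `nthreads` would take when [start, end) is split into
 * near-equal blocks. Real OpenMP runtimes may divide the range differently. */
void static_chunk(int start, int end, int nthreads, int tid, int *lo, int *hi) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads && end >= start);
    int total = end - start;
    int base = total / nthreads;   /* iterations every thread gets */
    int extra = total % nthreads;  /* the first `extra` threads get one more */
    *lo = start + tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* CUDA-style mapping: one loop iteration per thread. Returns the loop index
 * for a given (block, thread) pair, or -1 if that thread has no work. */
int cuda_style_index(int start, int end, int block_idx, int block_dim, int thread_idx) {
    int i = start + block_idx * block_dim + thread_idx;
    return i < end ? i : -1;
}
```

With a handful of CPU threads each thread processes a block of BPM indices; with the CUDA-style mapping each thread would handle a single index, and threads whose index falls past the end of the range simply do nothing.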
OpenMP provides three directives that are merely conveniences:
PARALLEL DO / parallel for
PARALLEL SECTIONS
PARALLEL WORKSHARE (Fortran only)
An example using the PARALLEL DO / parallel for combined directive is shown below.
 #pragma omp parallel for shared(a,b,c,chunk) private(i) schedule(static,chunk)
 for (i = 0; i < n; i++)
     c[i] = a[i] + b[i];
The pragma in Drive_God_lin.c has a private list for its variables and no shared clause. The private list means that each thread created for the parallel execution gets its own copy of each listed variable; variables that are not listed are shared by default.
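The effect of a private list can be illustrated with a small example that is unrelated to the project code. This is a sketch only: the function row_sums and its arguments are hypothetical. The scratch variables j and acc are private, so threads cannot overwrite each other's partial sums, while data and out are shared by default.

```c
#include <stddef.h>

/* Sum each row of a row-major rows x cols array into out[].
 * `data` and `out` are shared by default; `i`, `j` and `acc` are private,
 * so every thread works on its own copies. Without OpenMP support the
 * pragma is ignored and the function runs as ordinary serial C. */
void row_sums(const double *data, int rows, int cols, double *out) {
    int i, j;
    double acc;
#pragma omp parallel for private(i, j, acc)
    for (i = 0; i < rows; ++i) {
        acc = 0.0;
        for (j = 0; j < cols; ++j) {
            acc += data[(size_t)i * cols + j];
        }
        out[i] = acc; /* each iteration writes a distinct element */
    }
}
```

If acc were shared instead, several threads would accumulate into the same variable at once and the row sums would be unreliable.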
The important loop before the Fortran call is shown below:
 for (kk = 0; kk < MAXTURNS; ++kk) {
     doubleToSend[kk] = matrix[horizontalBpmCounter][kk];
     doubleToSend[kk + MAXTURNS] = matrix[verticalBpmCounter][kk];
     doubleToSend[kk + 2 * MAXTURNS] = 0.0;
     doubleToSend[kk + 3 * MAXTURNS] = 0.0;
 }
 /* This calls the external Fortran code (tbach) */
 sussix4drivenoise_(&doubleToSend[0], &tune[0], &amplitude[0], &phase[0], &allfreqsx[0], &allampsx[0], &allfreqsy[0], &allampsy[0], sussixInputFilePath);
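The loop fills the array 'doubleToSend[]' from the data buffers previously read from the data files: the first MAXTURNS entries hold the horizontal BPM data, the next MAXTURNS entries hold the vertical BPM data, and the remaining two blocks of MAXTURNS entries are set to zero. The filled array is then passed to the Fortran routine.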
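The layout of 'doubleToSend[]' can be checked in isolation with a stand-alone version of the packing loop. This is a sketch under assumptions: the function name pack_bpm_turns is hypothetical, and its horizontal and vertical arguments stand in for matrix[horizontalBpmCounter] and matrix[verticalBpmCounter].

```c
#include <assert.h>

/* Hypothetical stand-alone version of the packing loop.
 * `horizontal` and `vertical` each hold `maxturns` samples;
 * `out` must have room for 4 * maxturns doubles. */
void pack_bpm_turns(const double *horizontal, const double *vertical,
                    int maxturns, double *out) {
    assert(maxturns > 0);
    for (int kk = 0; kk < maxturns; ++kk) {
        out[kk] = horizontal[kk];           /* block 0: horizontal BPM data */
        out[kk + maxturns] = vertical[kk];  /* block 1: vertical BPM data */
        out[kk + 2 * maxturns] = 0.0;       /* block 2: zeroed */
        out[kk + 3 * maxturns] = 0.0;       /* block 3: zeroed */
    }
}
```

Each value of kk writes four distinct elements and reads nothing written by another iteration, so the iterations of this loop are independent of one another.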