=== Assignment 1 ===
A profile of the Drive_God_lin program restricted to a single core/thread on the CPU (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.
All 4 of these procedures are part of the Fortran library included for executing the analysis. Some portions of the 'main' function in Drive_God_lin.c also contain a parallel OpenMP pragma. These could be tuned to speed up initialization of the data arrays, but may not improve the reading of the data file.
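The single-thread setup described above can be sketched in C as follows. This is a minimal illustration, not code from Drive_God_lin.c: the helper names <code>init_array</code> and <code>force_single_thread</code> are hypothetical, and the loop only stands in for the kind of array initialization an OpenMP pragma might wrap.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* only available when compiled with OpenMP support */
#endif

/* Hypothetical helper (not from Drive_God_lin.c): zero-fill a data array.
 * The loop is split across threads when OpenMP is enabled; without OpenMP
 * the pragma is ignored and the loop runs serially. */
static void init_array(double *data, size_t n)
{
    long i;
    assert(data != NULL);
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        data[i] = 0.0;
    }
}

/* Hypothetical helper: request one thread so that later parallel regions
 * run serially, as in the profiling run. Returns the thread count the next
 * parallel region would use (always 1 when built without OpenMP). */
static int force_single_thread(void)
{
#ifdef _OPENMP
    omp_set_num_threads(1);
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Calling <code>force_single_thread()</code> before any parallel region has the same effect as setting the <code>OMP_NUM_THREADS=1</code> environment variable before launching the program. With GCC, building with <code>-pg</code> and running the resulting binary produces the <code>gmon.out</code> file that gprof reads to generate a flat profile.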
(Each sample counts as 0.01 seconds.)
| 30.45 || 16.73 || 6.38 || 524 || 12.18 || 12.18 || ordres_
|-
| 103.84 1853 || 19.64 238 || 0.27 74 || 314400 || 0.01 00 || 0.01 cfft_04 || tunelasr_
=== Assignment 2 ===
=== Assignment 3 ===