Difference between revisions of "GPU610/Team DAG"
Revision as of 12:11, 4 March 2013
Team DAG
Team Members
- Chris Schreiber, Team Lead
Progress
Assignment 1
Project selection discussed with Chris Szalwinski. Configuring local working environment and hardware for working with the CERN project source code.
A profile of the Drive_God_lin program restricted to a single CPU core/thread (forcing serialized execution of all OpenMP pragmas in the C and Fortran code) showed 4 primary targets to rewrite as CUDA kernels.
All 4 of these procedures are part of the Fortran library included for executing the analysis. Some portions of the 'main' function in Drive_God_lin.c also contain a parallel OpenMP pragma; these could be tuned to speed up initialization of the data arrays, but may not improve the reading of the data file.
Methods most likely to offer parallel improvements via CUDA kernels (top 5 based on the flat profile):
Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  47.66      9.99     9.99   314400     0.03     0.03  zfunr_
  30.45     16.37     6.38      524    12.18    12.18  ordres_
  10.84     18.64     2.27   314400     0.01     0.01  cfft_
   3.53     19.38     0.74   314400     0.00     0.04  tunelasr_
   3.34     20.08     0.70     1048     0.67    13.42  spectrum_
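To get a rough sense of what these percentages could mean for the overall run time, the self-time column can be fed into Amdahl's law. The sketch below is only an estimate under stated assumptions: the percentages are copied from the flat profile above, the kernel speedup factor `s` is a hypothetical input rather than a measured value, and host-to-device transfer costs are ignored. The function and type names are illustrative and do not come from the project source.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the flat profile: routine name and its self-time share. */
typedef struct {
    const char *name;
    double self_percent;
} ProfileEntry;

/* Self-time percentages copied from the flat profile above. */
static const ProfileEntry kProfile[] = {
    {"zfunr_", 47.66},
    {"ordres_", 30.45},
    {"cfft_", 10.84},
    {"tunelasr_", 3.53},
    {"spectrum_", 3.34},
};
static const size_t kProfileCount = sizeof kProfile / sizeof kProfile[0];

/* Amdahl's law: overall speedup when a fraction p of the run time
 * (0 <= p <= 1) is accelerated by a factor s (s > 0). */
double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

/* Fraction of total run time covered by the first n profile rows. */
double covered_fraction(size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n && i < kProfileCount; ++i) {
        sum += kProfile[i].self_percent;
    }
    return sum / 100.0;
}

/* Print the estimated overall speedup if the top n routines were each
 * accelerated by the hypothetical factor s, for n = 1..5. */
void print_estimates(double s) {
    for (size_t n = 1; n <= kProfileCount; ++n) {
        double p = covered_fraction(n);
        printf("top %zu (through %s): p = %.4f, overall speedup = %.2fx\n",
               n, kProfile[n - 1].name, p, amdahl_speedup(p, s));
    }
}
```

Calling `print_estimates(10.0)` from a `main` function, for example, prints one line per row of the table. Because the unaccelerated remainder stays serial, the overall figure is capped at 1 / (1 - p) no matter how fast the kernels become, which is why the routines with the largest self-time share are the natural first targets.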