GPU610/SSD
Revision as of 11:52, 9 February 2013
Drive_God_Lin
We are going to be working on a CERN project called Drive_God_Lin (more info soon...)
Team Members
Progress
Assignment 1
Sezar:
I decided to profile the CERN (http://home.web.cern.ch/) project Drive_God_Lin. After profiling it with gprof on the provided test data, I learned the following:
(summary of gprof output)
 Each sample counts as 0.01 seconds.
   %   cumulative   self               self     total
  time   seconds   seconds    calls   ms/call  ms/call  name
  39.98     26.68    26.68      524     50.92    50.92  ordres_
  37.42     51.65    24.97   314400      0.08     0.08  zfunr_
  12.69     60.12     8.47   314400      0.03     0.03  cfft_
  ...
As you can see, most of the program's time is spent in the ordres() and zfunr() subroutines, which together account for about 77% of the sampled run time; both are located in the Fortran portion of the program (there is a C portion as well).
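To get a rough feel for what these percentages allow, Amdahl's law can be applied to the two hot subroutines. The sketch below is only an estimate: it assumes the gprof percentages from this one run are representative, that only ordres() and zfunr() get faster, and that the rest of the program is unchanged. The function names amdahl_speedup and amdahl_limit are made up for this illustration and are not part of Drive_God_Lin.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the run time is
   accelerated by a factor s and the remaining (1 - p) is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound as s grows without limit: only the untouched part remains. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With p = 0.3998 + 0.3742 = 0.774, amdahl_limit(0.774) is roughly 4.4, and amdahl_speedup(0.774, 10.0) is roughly 3.3. So even a perfect GPU port of these two subroutines alone would cap the whole-program speedup at about 4.4x, unless cfft() and the remaining code are also improved.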
Furthermore, this program is already parallelized using OpenMP, which suggests it may be further parallelized using CUDA.
In these two subroutines (especially in ordres()) there are a lot of nested loops, if statements, gotos, and even some sort of search algorithm (in ordres() only) that I'm certain could be notably improved using CUDA.
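As an illustration of why existing OpenMP parallelism is an encouraging sign, here is a minimal C sketch of the kind of loop that OpenMP handles well: every iteration touches only its own element, so the iterations can run in any order. The function scale_above_threshold and its parameters are hypothetical stand-ins and are not taken from ordres() or zfunr().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element work: values above the threshold are scaled,
   the others are copied through unchanged. */
void scale_above_threshold(const double *in, double *out, size_t n,
                           double threshold, double factor)
{
    assert(in != NULL && out != NULL);

    /* If compiled without OpenMP support, the pragma is ignored and the
       loop simply runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        /* Iteration i reads in[i] and writes out[i] only, so there is no
           dependency between iterations. */
        if (in[i] > threshold)
            out[i] = in[i] * factor;
        else
            out[i] = in[i];
    }
}
```

A loop of this shape could map to a CUDA kernel with one thread per value of i. The if statement is worth noticing: on a GPU, threads in the same warp that take different branches are executed one branch at a time, so branch-heavy and goto-heavy code like that in ordres() would probably need restructuring before it benefits.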
Dylan:
Stephanie:
For assignment one I profiled a closest pair algorithm. The source code can be found here:
http://rosettacode.org/wiki/Closest-pair_problem/C
I was able to run the code successfully on Matrix, and I believe it is a good candidate for parallelization. The closest() function generally takes up the most time, and I believe that most of its repetitive work could be done by the GPU.
A useful reference site to help explain closest pair algorithms:
http://www.algorithmist.com/index.php/Closest_Pair
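For reference, the simplest form of the closest pair problem is the brute-force check of every pair, sketched below in C. This is not the closest() function from the Rosetta Code page; the names point_t and closest_pair_brute are made up for this illustration. It is shown because each pair's distance is computed independently of every other pair, which is the kind of work a GPU can spread across many threads.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } point_t;

/* Brute-force closest pair, O(n^2): returns the squared distance between
   the two closest points and stores their indices in *a and *b (a < b).
   Requires n >= 2. */
double closest_pair_brute(const point_t *pts, size_t n, size_t *a, size_t *b)
{
    assert(pts != NULL && a != NULL && b != NULL && n >= 2);

    double dx = pts[0].x - pts[1].x;
    double dy = pts[0].y - pts[1].y;
    double best = dx * dx + dy * dy;
    *a = 0;
    *b = 1;

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            dx = pts[i].x - pts[j].x;
            dy = pts[i].y - pts[j].y;
            double d2 = dx * dx + dy * dy;  /* squared distance avoids sqrt */
            if (d2 < best) {
                best = d2;
                *a = i;
                *b = j;
            }
        }
    }
    return best;
}
```

For the points (0,0), (5,5), (1,1), (5,6), the function returns 1.0 with indices 1 and 3. On a GPU, the pair distances could be computed in parallel and then reduced to a minimum; whether that beats a divide-and-conquer version on the CPU would have to be measured.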