Difference between revisions of "GPU610/SSD"


Revision as of 11:49, 9 February 2013



Drive_God_Lin

We are going to be working on a CERN (http://home.web.cern.ch/) project called Drive_God_Lin (more info soon...)

Team Members

  1. Sezar Gantous
  2. Stephanie Bourque
  3. Dylan Segna

Progress

Assignment 1

Sezar: I decided to profile the CERN project Drive_God_Lin. After running gprof on the project with the test data provided, I learned the following:

Summary of the gprof flat profile:

Each sample counts as 0.01 seconds.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 39.98     26.68    26.68      524    50.92    50.92  ordres_
 37.42     51.65    24.97   314400     0.08     0.08  zfunr_
 12.69     60.12     8.47   314400     0.03     0.03  cfft_
 ...

As you can see, most of the program's time (39.98% + 37.42%, or about 77% of the samples) is spent in the ordres() and zfunr() subroutines; both are located in the Fortran portion of the program (there is a C portion as well).

Furthermore, this program is already parallelized using OpenMP, which suggests it may be further parallelized using CUDA. These two subroutines (especially ordres()) contain a lot of nested loops, if statements, gotos, and even some sort of search algorithm (in ordres() only) that I'm certain could be notably improved using CUDA.
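
To get a rough feel for how much the whole program could gain, the profile percentages can be plugged into Amdahl's law. The sketch below is only an estimate: it assumes the gprof fractions carry over to other runs and that ordres() and zfunr() (about 0.774 of the run time combined) can be accelerated in full, neither of which has been verified. The function name amdahl_speedup is made up for this illustration.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction of the run time
 * (parallel_fraction, between 0 and 1) is made `factor` times faster
 * and the rest of the program is left unchanged.
 * Hypothetical helper, not part of Drive_God_Lin. */
double amdahl_speedup(double parallel_fraction, double factor)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(factor > 0.0);

    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / factor);
}
```

For example, amdahl_speedup(0.774, 10.0) gives roughly 3.3, and even an arbitrarily large factor cannot push the result past 1 / (1 - 0.774), which is about 4.4. Any larger overall gain would also require speeding up cfft() and the remaining code.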



Dylan:


Stephanie:

For assignment one I profiled a closest pair algorithm. The source code can be found here:

http://rosettacode.org/wiki/Closest-pair_problem/C

I was able to run the code successfully on Matrix, and I believe it is a good candidate for parallelization. The closest() function generally takes up the most time, and I believe most of its routine, repetitive work could be done by the GPU.
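
As an illustration of the kind of work that might map well to a GPU, here is a minimal brute-force closest-pair sketch in C. It is not the Rosetta Code implementation: point_t, dist2 and closest_pair_brute are names invented for this example. The point is that every pairwise distance is independent of the others, so each (i, j) pair could in principle be handed to its own GPU thread, followed by a minimum reduction.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical 2D point type for this sketch. */
typedef struct { double x, y; } point_t;

/* Squared distance between two points. Each call depends only on its
 * two arguments, which is what makes this step data-parallel. */
static double dist2(point_t a, point_t b)
{
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Checks every pair once (O(n^2)) and returns the smallest squared
 * distance; the indices of the closest pair are stored in *best_i
 * and *best_j. Requires at least two points. */
double closest_pair_brute(const point_t *pts, size_t n,
                          size_t *best_i, size_t *best_j)
{
    assert(pts != NULL && best_i != NULL && best_j != NULL && n >= 2);

    double best = DBL_MAX;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            double d = dist2(pts[i], pts[j]);
            if (d < best) {
                best = d;
                *best_i = i;
                *best_j = j;
            }
        }
    }
    return best;
}
```

Working with squared distances avoids a sqrt() call per pair; the square root only needs to be taken once on the final result. Whether a GPU version of this beats the divide-and-conquer closest() on the CPU would have to be measured.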

A useful reference site to help explain closest pair algorithms:

http://www.algorithmist.com/index.php/Closest_Pair


Assignment 2

Assignment 3