Difference between revisions of "Thunderbird"

From CDOT Wiki
Jump to: navigation, search
(Thunderbird)
(1. Optimize)
 
(12 intermediate revisions by 2 users not shown)
Line 2: Line 2:
 
== Team Members ==
 
== Team Members ==
 
# [mailto:ksanghun@myseneca.ca?subject=GPU610 Sanghun Kim]
 
# [mailto:ksanghun@myseneca.ca?subject=GPU610 Sanghun Kim]
# [mailto:hpark77@myseneca.ca?subject=GPU610 Heetai Park]
 
 
# [mailto:wlee64@myseneca.ca?subject=GPU610 Wonho Lee]
 
# [mailto:wlee64@myseneca.ca?subject=GPU610 Wonho Lee]
  [mailto:ksanghun@myseneca.ca;hpark77@myseneca.ca;wlee64@myseneca.ca?subject=GPU610 eMail All]
+
  [mailto:ksanghun@myseneca.ca;wlee64@myseneca.ca?subject=GPU610 eMail All]
  
 
== Progress ==
 
== Progress ==
 
=== Assignment 1 ===
 
=== Assignment 1 ===
 
==== Profiling: LZW algorithm ====
 
==== Profiling: Ray-Tracing algorithm ====
 
==== Analysis ====
 
===== Ray-Tracing Algorithm =====
 
----
 
=== Assignment 2 ===
 
==== Parallelize ====
 
==== Perfomance ====
 
----
 
 
=== Assignment 3 ===
 
==== Optimization ====
 
global vs constant
 
==== Perfomance ====
 
 
=== Conclusion ===
 
 
 
  
 
==== Profiling: LZW algorithm ====
 
==== Profiling: LZW algorithm ====
Line 125: Line 105:
 
     0.00      4.08    0.00        1    0.00    0.00  ~_Hashtable()
 
     0.00      4.08    0.00        1    0.00    0.00  ~_Hashtable()
  
 +
==== Profiling: Ray-tracing algorithm ====
 +
Source Code: https://github.com/ksanghun/CUDA_raytrace/blob/master/GPUAssaginemt/cputest.cpp
 +
 +
[[File:Profiling_Raytrace.png]]
 +
 +
 +
==== '''Ray-Tracing Algorithm''' ====
 +
 +
[[File:rt_1.png]]
 +
 +
==== '''Ray-sphere Intersection''' ====
 +
 +
[[File:rt_2.png]]
 +
 +
==== '''Trace''' ====
 +
 +
[[File:rt_3.png]]
 +
 +
==== '''Floating-Point Considerations''' ====
 +
 +
[[File:Raytrace_floatingerror.PNG ‎]]
 
----
 
----
 +
 +
=== Assignment 2 ===
 +
==== 1. Parallelize ====
 +
- render()
 +
 +
 +
[[File:Render_CvsP2.png]]
  
  
==== Profiling: Ray-tracing algorithm ====
+
- main()
Source Code: https://www.scratchapixel.com/lessons/3d-basic-rendering/introduction-to-ray-tracing/ray-tracing-practical-example
+
 
[[File:Profile.JPG]]
+
 
 +
[[File:main_CvsP2.png]]
 +
 
 +
==== 2. Performance ====
 +
[[File:Data_CvsP.PNG]]
 +
 
 +
 
 +
[[File:Graph_CvsP.PNG]]
 +
 
 +
----
 +
 
 +
=== Assignment 3 ===
 +
==== 1. Optimize ====
 +
- Global to constant memory
  
==== Profiling: bubble vs quick algorithm ====
 
It's a simple version of sorting algorithm - Source code
 
  
  void BubbleSort(int arr[], int size) {
+
[[File:PvsO2.png]]
        int tmp;                          /*used for swapping*/
 
        int i;
 
        int j;
 
        for (i = 0; i<size - 1; i++) {
 
                for (j = 0; j<size - 1 - i; j++) {
 
                        if (arr[j + 1] < arr[j]) {          /* compare the two neighbors */
 
                                tmp = arr[j];                  /* swap a[j] and a[j+1]      */
 
                                arr[j] = arr[j + 1];
 
                                arr[j + 1] = tmp;
 
                        }
 
                }
 
        }
 
  }
 
  void InsertionSort(int arr[], int left, int right) {
 
        int curr;
 
        int i, j;
 
        for (i = left + 1; i <= right; i++) {
 
                curr = arr[i];
 
                for (j = i; j>0 && arr[j - 1] > curr; j--) {
 
                        arr[j] = arr[j - 1];
 
                }
 
                arr[j] = curr;
 
        }
 
  }
 
  void QuickSort(int arr[], int left, int right) {
 
        if (right - left <= 3) {
 
                InsertionSort(arr, left, right);
 
        }
 
        else {
 
                int pivotpt = (left + right) / 2;  //index of the pivot
 
                int i = left;
 
                int j = right - 1;
 
                swap(arr[pivotpt], arr[right]);
 
                pivotpt = right;
 
                int pivot = arr[pivotpt];
 
                while (i<j) {
 
                        while (i< right - 1 && arr[i]<pivot) i++;
 
                        while (j > 0 && arr[j]>pivot) j--;
 
                        if (i<j)
 
                                swap(arr[i++], arr[j--]);
 
                }
 
                if (i == j && arr[i] < arr[pivotpt])
 
                        i++;
 
                swap(arr[i], arr[pivotpt]);
 
                QuickSort(arr, left, i - 1);
 
                QuickSort(arr, i + 1, right);
 
        }
 
  }
 
  void QuickSort(int arr[], int size) {
 
        QuickSort(arr, 0, size - 1);
 
  }
 
  
Using compiler settings (gcc version 5.2.0):
+
==== 2. Performance ====
  g++ -c -O2 -g -pg -std=c++14 a1.cpp
+
[[File:Data_PvsO.PNG]]
  
Sorting for 50000
 
Flat profile:
 
Each sample counts as 0.01 seconds.
 
  %  cumulative  self              self    total         
 
time  seconds  seconds    calls  ns/call  ns/call  name   
 
99.77      4.42    4.42                            BubbleSort(int*, int)
 
  0.23      4.43    0.01                            QuickSort(int*, int, int)
 
  0.00      4.43    0.00    16651    0.00    0.00  InsertionSort(int*, int, int)
 
  0.00      4.43    0.00        1    0.00    0.00  _GLOBAL__sub_I__Z10BubbleSortPii
 
  
 +
[[File:Graph_PvsO.PNG]]
  
Sorting for 100000
 
Flat profile:
 
Each sample counts as 0.01 seconds.
 
  %  cumulative  self              self    total         
 
time  seconds  seconds    calls  ns/call  ns/call  name   
 
99.89    17.84    17.84                            BubbleSort(int*, int)
 
  0.06    17.85    0.01    33355  299.81  299.81  InsertionSort(int*, int, int)
 
  0.06    17.86    0.01                            QuickSort(int*, int, int)
 
  0.00    17.86    0.00        1    0.00    0.00  _GLOBAL__sub_I__Z10BubbleSortPii
 
  
 +
==== 3. GPU Occupancy ====
 +
[[File:rt_5.png]]
 +
----
  
Sorting for 200000
+
=== Conclusion ===
Flat profile:
+
==== 1. Output ====
Each sample counts as 0.01 seconds.
+
Video: https://youtu.be/3wV-ObHWZhg
  %  cumulative  self              self    total         
+
==== 2. Performance ====
time  seconds  seconds    calls  ns/call  ns/call  name   
+
[[File:Graph_CvsPvsO.PNG]]
99.97    71.90    71.90                            BubbleSort(int*, int)
 
  0.01    71.91    0.01    66645  150.05  150.05  InsertionSort(int*, int, int)
 
  0.01    71.92    0.01                            QuickSort(int*, int, int)
 
  0.00    71.92    0.00        1    0.00    0.00  _GLOBAL__sub_I__Z10BubbleSortPii
 

Latest revision as of 08:53, 10 April 2017

Thunderbird

Team Members

  1. Sanghun Kim
  2. Wonho Lee
eMail All

Progress

Assignment 1

Profiling: LZW algorithm

It's a simple version of LZW compression algorithm with 12 bit codes.

 void compress(string input, int size, string filename) {
   unordered_map<string, int> compress_dictionary(MAX_DEF);
     //Dictionary initializing with ASCII
     for ( int unsigned i = 0 ; i < 256 ; i++ ){
       compress_dictionary[string(1,i)] = i;
     }
     string current_string;
     unsigned int code;
     unsigned int next_code = 256;
     //Output file for compressed data
     ofstream outputFile;
     outputFile.open(filename + ".lzw");
 
     for(char& c: input){
     current_string = current_string + c;
     if ( compress_dictionary.find(current_string) ==compress_dictionary.end() ){
             if (next_code <= MAX_DEF)
                 compress_dictionary.insert(make_pair(current_string, next_code++));
             current_string.erase(current_string.size()-1);
             outputFile << convert_int_to_bin(compress_dictionary[current_string]);
             current_string = c;
         }   
     }   
     if (current_string.size())
             outputFile << convert_int_to_bin(compress_dictionary[current_string]);
     outputFile.close();
 }

Using compiler settings (gcc version 5.2.0):

 g++ -c -O2 -g -pg -std=c++14 lzw.cpp

10 MB text file

 wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 10.txt
 real	0m4.302s
 user	0m3.072s
 sys	0m0.632s
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  ns/call  ns/call  name    
  45.83      0.55     0.55                             compress(string, int, string) 
  36.67      0.99     0.44 14983735    29.37    29.37  _M_find_before_node(unsigned int, string const&, unsigned int) const 
   7.50      1.08     0.09 10489603     8.58     8.58  show_usage() 
   5.83      1.15     0.07  4493878    15.58    44.94  operator[](string const&)  
   4.17      1.20     0.05                             _Z22convert_char_to_stringB5cxx11PKci  
   0.00      1.20     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)  
   0.00      1.20     0.00     3841     0.00    29.37  _ZNSt10_HashtableINSt7  
   0.00      1.20     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
   0.00      1.20     0.00        1     0.00     0.00  ~_Hashtable()

20 MB text file

 wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 20.txt
 real	0m8.924s
 user	0m6.504s
 sys	0m2.008s
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  ns/call  ns/call  name    
  49.33      1.47     1.47                             compress(string, int, string)
  34.56      2.50     1.03 29962271    34.38    34.38  _M_find_before_node(unsigned int, string const&, unsigned int) const
   7.05      2.71     0.21  8986654    23.37    57.74  operator[](string const&)
   6.71      2.91     0.20 20975363     9.53     9.53  show_usage()
   2.35      2.98     0.07                             _Z22convert_char_to_stringB5cxx11PKci
   0.00      2.98     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)
   0.00      2.98     0.00     3841     0.00    34.38  _ZNSt10_HashtableINSt7
   0.00      2.98     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
   0.00      2.98     0.00        1     0.00     0.00  ~_Hashtable()

30 MB text file

 wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 30.txt
 real	0m13.637s
 user	0m9.665s
 sys	0m2.984s
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total           
  time   seconds   seconds    calls  ns/call  ns/call  name    
  45.59      1.86     1.86                             compress(string, int, string)
  37.25      3.38     1.52 44940806    33.82    33.82  _M_find_before_node(unsigned int, string const&, unsigned int) const
   7.60      3.69     0.31 13479429    23.00    56.82  operator[](string const&)
   6.62      3.96     0.27 31461123     8.58     8.58  show_usage()
   2.94      4.08     0.12                             _Z22convert_char_to_stringB5cxx11PKci
   0.00      4.08     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)
   0.00      4.08     0.00     3841     0.00    33.82  _ZNSt10_HashtableINSt7
   0.00      4.08     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
   0.00      4.08     0.00        1     0.00     0.00  ~_Hashtable()

Profiling: Ray-tracing algorithm

Source Code: https://github.com/ksanghun/CUDA_raytrace/blob/master/GPUAssaginemt/cputest.cpp

Profiling Raytrace.png


Ray-Tracing Algorithm

Rt 1.png

Ray-sphere Intersection

Rt 2.png

Trace

Rt 3.png

Floating-Point Considerations

Raytrace floatingerror.PNG


Assignment 2

1. Parallelize

- render()


Render CvsP2.png


- main()


Main CvsP2.png

2. Performance

Data CvsP.PNG


Graph CvsP.PNG


Assignment 3

1. Optimize

- Global to constant memory


PvsO2.png

2. Performance

Data PvsO.PNG


Graph PvsO.PNG


3. GPU Occupancy

Rt 5.png


Conclusion

1. Output

Video: https://youtu.be/3wV-ObHWZhg

2. Performance

Graph CvsPvsO.PNG