Difference between revisions of "Thunderbird"
(→Profiling: bubble vs quick algorithm) |
(→1. Optimize) |
||
(14 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
== Team Members == | == Team Members == | ||
# [mailto:ksanghun@myseneca.ca?subject=GPU610 Sanghun Kim] | # [mailto:ksanghun@myseneca.ca?subject=GPU610 Sanghun Kim] | ||
− | |||
# [mailto:wlee64@myseneca.ca?subject=GPU610 Wonho Lee] | # [mailto:wlee64@myseneca.ca?subject=GPU610 Wonho Lee] | ||
− | + | [mailto:ksanghun@myseneca.ca;wlee64@myseneca.ca?subject=GPU610 eMail All] | |
== Progress == | == Progress == | ||
=== Assignment 1 === | === Assignment 1 === | ||
+ | |||
==== Profiling: LZW algorithm ==== | ==== Profiling: LZW algorithm ==== | ||
It's a simple version of LZW compression algorithm with 12 bit codes. | It's a simple version of LZW compression algorithm with 12 bit codes. | ||
Line 105: | Line 105: | ||
0.00 4.08 0.00 1 0.00 0.00 ~_Hashtable() | 0.00 4.08 0.00 1 0.00 0.00 ~_Hashtable() | ||
+ | ==== Profiling: Ray-tracing algorithm ==== | ||
+ | Source Code: https://github.com/ksanghun/CUDA_raytrace/blob/master/GPUAssaginemt/cputest.cpp | ||
+ | |||
+ | [[File:Profiling_Raytrace.png]] | ||
+ | |||
+ | |||
+ | ==== '''Ray-Tracing Algorithm''' ==== | ||
+ | |||
+ | [[File:rt_1.png]] | ||
+ | |||
+ | ==== '''Ray-sphere Intersection''' ==== | ||
+ | |||
+ | [[File:rt_2.png]] | ||
+ | |||
+ | ==== '''Trace''' ==== | ||
+ | |||
+ | [[File:rt_3.png]] | ||
+ | |||
+ | ==== '''Floating-Point Considerations''' ==== | ||
+ | |||
+ | [[File:Raytrace_floatingerror.PNG ]] | ||
---- | ---- | ||
+ | === Assignment 2 === | ||
+ | ==== 1. Parallelize ==== | ||
+ | - render() | ||
+ | |||
+ | |||
+ | [[File:Render_CvsP2.png]] | ||
+ | |||
+ | |||
+ | - main() | ||
+ | |||
+ | |||
+ | [[File:main_CvsP2.png]] | ||
+ | |||
+ | ==== 2. Performance ==== | ||
+ | [[File:Data_CvsP.PNG]] | ||
+ | |||
+ | |||
+ | [[File:Graph_CvsP.PNG]] | ||
+ | |||
+ | ---- | ||
− | ==== | + | === Assignment 3 === |
− | + | ==== 1. Optimize ==== | |
− | + | - Global to constant memory | |
− | |||
− | |||
− | + | [[File:PvsO2.png]] | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | ==== 2. Performance ==== | |
− | + | [[File:Data_PvsO.PNG]] | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | [[File:Graph_PvsO.PNG]] | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | ==== 3. GPU Occupancy ==== | ||
+ | [[File:rt_5.png]] | ||
+ | ---- | ||
− | + | === Conclusion === | |
− | + | ==== 1. Output ==== | |
− | + | Video: https://youtu.be/3wV-ObHWZhg | |
− | + | ==== 2. Performance ==== | |
− | + | [[File:Graph_CvsPvsO.PNG]] | |
− | |||
− | |||
− | |||
− |
Latest revision as of 08:53, 10 April 2017
Contents
Thunderbird
Team Members
eMail All
Progress
Assignment 1
Profiling: LZW algorithm
It's a simple version of LZW compression algorithm with 12 bit codes.
void compress(string input, int size, string filename) { unordered_map<string, int> compress_dictionary(MAX_DEF); //Dictionary initializing with ASCII for ( int unsigned i = 0 ; i < 256 ; i++ ){ compress_dictionary[string(1,i)] = i; } string current_string; unsigned int code; unsigned int next_code = 256; //Output file for compressed data ofstream outputFile; outputFile.open(filename + ".lzw"); for(char& c: input){ current_string = current_string + c; if ( compress_dictionary.find(current_string) ==compress_dictionary.end() ){ if (next_code <= MAX_DEF) compress_dictionary.insert(make_pair(current_string, next_code++)); current_string.erase(current_string.size()-1); outputFile << convert_int_to_bin(compress_dictionary[current_string]); current_string = c; } } if (current_string.size()) outputFile << convert_int_to_bin(compress_dictionary[current_string]); outputFile.close(); }
Using compiler settings (gcc version 5.2.0):
g++ -c -O2 -g -pg -std=c++14 lzw.cpp
10 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 10.txt real 0m4.302s user 0m3.072s sys 0m0.632s
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 45.83 0.55 0.55 compress(string, int, string) 36.67 0.99 0.44 14983735 29.37 29.37 _M_find_before_node(unsigned int, string const&, unsigned int) const 7.50 1.08 0.09 10489603 8.58 8.58 show_usage() 5.83 1.15 0.07 4493878 15.58 44.94 operator[](string const&) 4.17 1.20 0.05 _Z22convert_char_to_stringB5cxx11PKci 0.00 1.20 0.00 4097 0.00 0.00 _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*) 0.00 1.20 0.00 3841 0.00 29.37 _ZNSt10_HashtableINSt7 0.00 1.20 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i 0.00 1.20 0.00 1 0.00 0.00 ~_Hashtable()
20 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 20.txt real 0m8.924s user 0m6.504s sys 0m2.008s
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 49.33 1.47 1.47 compress(string, int, string) 34.56 2.50 1.03 29962271 34.38 34.38 _M_find_before_node(unsigned int, string const&, unsigned int) const 7.05 2.71 0.21 8986654 23.37 57.74 operator[](string const&) 6.71 2.91 0.20 20975363 9.53 9.53 show_usage() 2.35 2.98 0.07 _Z22convert_char_to_stringB5cxx11PKci 0.00 2.98 0.00 4097 0.00 0.00 _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*) 0.00 2.98 0.00 3841 0.00 34.38 _ZNSt10_HashtableINSt7 0.00 2.98 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i 0.00 2.98 0.00 1 0.00 0.00 ~_Hashtable()
30 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 30.txt real 0m13.637s user 0m9.665s sys 0m2.984s
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 45.59 1.86 1.86 compress(string, int, string) 37.25 3.38 1.52 44940806 33.82 33.82 _M_find_before_node(unsigned int, string const&, unsigned int) const 7.60 3.69 0.31 13479429 23.00 56.82 operator[](string const&) 6.62 3.96 0.27 31461123 8.58 8.58 show_usage() 2.94 4.08 0.12 _Z22convert_char_to_stringB5cxx11PKci 0.00 4.08 0.00 4097 0.00 0.00 _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*) 0.00 4.08 0.00 3841 0.00 33.82 _ZNSt10_HashtableINSt7 0.00 4.08 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i 0.00 4.08 0.00 1 0.00 0.00 ~_Hashtable()
Profiling: Ray-tracing algorithm
Source Code: https://github.com/ksanghun/CUDA_raytrace/blob/master/GPUAssaginemt/cputest.cpp
Ray-Tracing Algorithm
Ray-sphere Intersection
Trace
Floating-Point Considerations
Assignment 2
1. Parallelize
- render()
- main()
2. Performance
Assignment 3
1. Optimize
- Global to constant memory
2. Performance
3. GPU Occupancy
Conclusion
1. Output
Video: https://youtu.be/3wV-ObHWZhg