Thunderbird
Team Members
Progress
Assignment 1
Profiling: LZW algorithm
This is a simple version of the LZW compression algorithm that uses 12-bit codes.
void compress(string input, int size, string filename) {
    unordered_map<string, int> compress_dictionary(MAX_DEF);

    // Initialize the dictionary with the ASCII characters
    for (unsigned int i = 0; i < 256; i++) {
        compress_dictionary[string(1, i)] = i;
    }

    string current_string;
    unsigned int code;
    unsigned int next_code = 256;

    // Output file for compressed data
    ofstream outputFile;
    outputFile.open(filename + ".lzw");

    for (char& c : input) {
        current_string = current_string + c;
        if (compress_dictionary.find(current_string) == compress_dictionary.end()) {
            if (next_code <= MAX_DEF)
                compress_dictionary.insert(make_pair(current_string, next_code++));
            current_string.erase(current_string.size() - 1);
            outputFile << convert_int_to_bin(compress_dictionary[current_string]);
            current_string = c;
        }
    }

    if (current_string.size())
        outputFile << convert_int_to_bin(compress_dictionary[current_string]);
    outputFile.close();
}
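The convert_int_to_bin helper called by compress() is not shown on this page. The profile symbol _Z18convert_int_to_binB5cxx11i suggests it takes an int and returns a string. The following is only a minimal sketch of what such a helper might look like, assuming each code is written as exactly 12 characters of '0' or '1'; the constant CODE_BITS and the use of std::bitset are assumptions made for illustration, not the team's implementation.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical stand-in for the convert_int_to_bin helper used by compress().
// Assumption: every code is emitted as a fixed-width, 12-character string.
constexpr int CODE_BITS = 12;

std::string convert_int_to_bin(int code) {
    // A 12-bit code can only represent the values 0..4095.
    assert(code >= 0 && code < (1 << CODE_BITS));
    // std::bitset::to_string() yields the bits most significant first.
    return std::bitset<CODE_BITS>(static_cast<unsigned long long>(code)).to_string();
}
```

With this sketch, convert_int_to_bin(65) would yield "000001000001". Writing each code as 12 text characters is simple to debug, but it makes the output larger than a packed binary encoding would.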
Compiled with the following settings (gcc version 5.2.0):
g++ -c -O2 -g -pg -std=c++14 lzw.cpp
10 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 10.txt
real    0m4.302s
user    0m3.072s
sys     0m0.632s
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 45.83      0.55     0.55                             compress(string, int, string)
 36.67      0.99     0.44 14983735    29.37    29.37  _M_find_before_node(unsigned int, string const&, unsigned int) const
  7.50      1.08     0.09 10489603     8.58     8.58  show_usage()
  5.83      1.15     0.07  4493878    15.58    44.94  operator[](string const&)
  4.17      1.20     0.05                             _Z22convert_char_to_stringB5cxx11PKci
  0.00      1.20     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)
  0.00      1.20     0.00     3841     0.00    29.37  _ZNSt10_HashtableINSt7
  0.00      1.20     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
  0.00      1.20     0.00        1     0.00     0.00  ~_Hashtable()
20 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 20.txt
real    0m8.924s
user    0m6.504s
sys     0m2.008s
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 49.33      1.47     1.47                             compress(string, int, string)
 34.56      2.50     1.03 29962271    34.38    34.38  _M_find_before_node(unsigned int, string const&, unsigned int) const
  7.05      2.71     0.21  8986654    23.37    57.74  operator[](string const&)
  6.71      2.91     0.20 20975363     9.53     9.53  show_usage()
  2.35      2.98     0.07                             _Z22convert_char_to_stringB5cxx11PKci
  0.00      2.98     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)
  0.00      2.98     0.00     3841     0.00    34.38  _ZNSt10_HashtableINSt7
  0.00      2.98     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
  0.00      2.98     0.00        1     0.00     0.00  ~_Hashtable()
30 MB text file
wlee64@matrix:~/gpu610/assignments/a1> time lzw -c 30.txt
real    0m13.637s
user    0m9.665s
sys     0m2.984s
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 45.59      1.86     1.86                             compress(string, int, string)
 37.25      3.38     1.52 44940806    33.82    33.82  _M_find_before_node(unsigned int, string const&, unsigned int) const
  7.60      3.69     0.31 13479429    23.00    56.82  operator[](string const&)
  6.62      3.96     0.27 31461123     8.58     8.58  show_usage()
  2.94      4.08     0.12                             _Z22convert_char_to_stringB5cxx11PKci
  0.00      4.08     0.00     4097     0.00     0.00  _M_insert_unique_node(unsigned int, unsigned int, std::__detail::_Hash_node<std::pair<string const, int>, true>*)
  0.00      4.08     0.00     3841     0.00    33.82  _ZNSt10_HashtableINSt7
  0.00      4.08     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z18convert_int_to_binB5cxx11i
  0.00      4.08     0.00        1     0.00     0.00  ~_Hashtable()
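The three timed runs can be compared by normalizing the wall-clock ("real") time by input size. The sketch below shows one way to do that arithmetic; the function names seconds_per_mb and scaling_ratio are hypothetical helpers introduced here for illustration and are not part of lzw.cpp.

```cpp
#include <cassert>
#include <cmath>

// Wall-clock seconds spent per megabyte of input.
double seconds_per_mb(double real_seconds, double megabytes) {
    assert(megabytes > 0.0);
    return real_seconds / megabytes;
}

// Ratio of run B's cost per megabyte to run A's.
// A value close to 1.0 indicates roughly linear scaling with input size.
double scaling_ratio(double secs_a, double mb_a, double secs_b, double mb_b) {
    return seconds_per_mb(secs_b, mb_b) / seconds_per_mb(secs_a, mb_a);
}
```

Using the real times reported above, seconds_per_mb gives about 0.43 s/MB for the 10 MB file, 0.45 s/MB for 20 MB, and 0.45 s/MB for 30 MB, and scaling_ratio(4.302, 10, 13.637, 30) comes out near 1.06. On these measurements the run time grows approximately linearly with input size, which is consistent with the call count of _M_find_before_node roughly doubling and tripling across the three profiles.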
Profiling: Ray-tracing algorithm
Source Code: https://github.com/ksanghun/CUDA_raytrace/blob/master/GPUAssaginemt/cputest.cpp
Ray-Tracing Algorithm
Ray-sphere Intersection
Trace
Floating-Point Considerations
Assignment 2
1. Parallelize
- render()
- main()
2. Performance
Assignment 3
1. Optimize
- Global to constant memory
2. Performance
3. GPU Occupancy
Conclusion
1. Output
Video: https://youtu.be/3wV-ObHWZhg