Changes

Jump to: navigation, search

Happy Valley

224 bytes removed, 13:54, 26 February 2018
LZSS
=== Assignment 1 ===
==== LZSS ====
[https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Storer%E2%80%93Szymanski LZSS] (Lempel–Ziv–Storer–Szymanski) is a compression algorithm that belongs to LZ77 family. It attempts to replace a string of symbols with a reference to a dictionary location of the same string.
0.07 27.69 0.02 1 0.02 0.02 init_tree</pre>
<pre>/////////////////// 125MB 250MB FILE ///////////////////
[obelavina@localhost gpu610]$ cat 250M.flt
Flat profile:
|}
[[File:https://d2mxuefqeaa7sj.cloudfront.net/s_04A25BBEA5534B427FFC3152A69B8313D1EACB7FDCEF46F064943F9A797A2EBF_1519616588207_imageLzss_perf.png]]
'''Analysis'''
<pre>gprof ./lzss | gprof2dot | dot | convert - out.png</pre>
 [[File:https://d2mxuefqeaa7sj.cloudfront.net/s_04A25BBEA5534B427FFC3152A69B8313D1EACB7FDCEF46F064943F9A797A2EBF_1519616497287_outLzss_calls.png]]
The call graph demonstrates that insert_node function takes the bulk of work (83.48%). delete_node takes only 10% even though it’s called nearly as many times as insert_node. It is hard to determine if LZSS is a good candidate for GPU optimisation: data throughput can be quite large but the logic of the algorithm includes some branching which can introduce some issues while porting code to CUDA. I suspect that compress function itself will need to be parallized (rather than insert_node).
68
edits

Navigation menu