DPS915 C U D A B O Y S

From CDOT Wiki
Revision as of 14:46, 3 November 2015 by Johnathan Ragimov (talk | contribs) (Profiling)
Jump to: navigation, search

C U D A B O Y S

Team Members

  1. Manjot Sandhu, Some responsibility
  2. Johnathan Ragimov, Some other responsibility
  3. Oleg Eustace, Some other responsibility

Email All

Progress

Assignment 1

✓ Profile 0: File Encryption

Description

This piece of software takes a file and ecrypts it one of 4 ways:

  1. Byte Inversion
  2. Byte Cycle
  3. Xor Cipher
  4. RC4 Cipher


Inside the byteCipher method, exists a for loop that could use optimization:

      for (int i = 0; i < bufferSize; i++){
               // going over every byte in the file
               switch (mode) {
                       case 0: // inversion
                               buffer[i] = ~buffer[i];
                               break;
                       case 1: // cycle
                               buffer [i] = cycle (buffer [i]);
                               break;
                       case 2: // RC4
                               buffer [i] = buffer [i] ^ rc4_output();
                               break;
               }
       }

This for loop then can call one of two functions: cycle or rc4_output.

char cycle (char value) {
       int leftMask = 170;
       int rightMask = 85;
       int iLeft = value & leftMask;
       int iRight = value & rightMask;
       iLeft = iLeft >> 1;
       iRight = iRight << 1;
       return iLeft | iRight;
}
unsigned char rc4_output() {
   unsigned char temp;
   i = (i + 1) & 0xFF;
   j = (j + S[i]) & 0xFF;
   temp = S[i];
   S[i] = S[j];
   S[j] = temp;
   return S[(S[i] + S[j]) & 0xFF];
}


We need to change these two functions so they are added to the device as "device functions". We also need to convert this for loop into a kernel.

Profiling

RC4 Cipher - 283 MB mp3 File

[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/music.mp3
/home/johny/aes/music.mp3
* * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.
real    0m6.758s
user    0m3.551s
sys     0m0.068s
Flat profile:
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
84.05      1.70     1.70 296271519     0.00     0.00  rc4_output()
13.39      1.97     0.27                             byteCipher(int, std::string)
 2.73      2.02     0.06        1    55.09    55.09  rc4_init(unsigned char*, unsigned int)
 0.00      2.02     0.00        1     0.00     0.00  _GLOBAL__sub_I_S

As we can see the rc4_output and byteCipher functions take up most of the processing time.


RC4 Cipher - 636 MB iso File

[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/cent.iso
/home/johny/aes/cent.iso
 * * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.
real    0m10.293s
user    0m8.235s
sys     0m0.312s
Flat profile:
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
74.86      3.59     3.59 666894336     0.00     0.00  rc4_output()
23.21      4.70     1.11                             byteCipher(int, std::string)
 2.09      4.80     0.10        1   100.16   100.16  rc4_init(unsigned char*, unsigned int)
 0.00      4.80     0.00        1     0.00     0.00  _GLOBAL__sub_I_S

RC4 Cipher - 789 MB iso File

[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/xu.iso
/home/johny/aes/xu.iso
 * * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.
real    0m12.566s
user    0m10.170s
sys     0m0.228s
Flat profile:
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
75.51      4.40     4.40 827326464     0.00     0.00  rc4_output()
23.02      5.74     1.34                             byteCipher(int, std::string)
 1.63      5.84     0.10        1    95.15    95.15  rc4_init(unsigned char*, unsigned int)
 0.00      5.84     0.00        1     0.00     0.00  _GLOBAL__sub_I_S

✗ Profile 1: PI Approximation

  • Sample run:
[root@jr-net-cent a1]# time ./pi
3.024
operation - took - 0.0001040000 secs
3.1676000000
operation - took - 0.0002280000 secs
3.1422800000
operation - took - 0.0022700000 secs
3.1418720000
operation - took - 0.0222910000 secs
3.1412748000
operation - took - 0.2185140000 secs
3.1417290800
operation - took - 2.2039310000 secs
3.1415420600
operation - took - 21.9592080000 secs
3.1415625844
operation - took - 47.1807910000 secs
3.1415537704
real    3m33.129s 
user    3m32.925s 
sys     0m0.016s
  • gprof:
Each sample counts as 0.01 seconds.
%   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
100.52    106.93   106.93       11     9.72     9.72  calcpi(int, int*)
0.00    106.93     0.00       11     0.00     0.00  reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000l> >)
0.00    106.93     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z10reportTimePKcNSt6chrono8durationIlSt5ratioILl1ELl1000000EEEE

✗ Profile 2: Wave Form Generator

This is the program we selected to optimize. It's a great candidate because it has 2 primary functions that have a few for loops in them. One of the functions reads an Mp3 file and writes wave data to a file -- this function takes quite a bit of time to execute. The other function actually takes this data and converts it to a view-able sound wave image. Both functions would benefit greatly from the extra processing power that a GPU provides: mp3 read/decode time would be greatly reduced.

'This piece of code is too complex and requires a linux environment to run. Please see Profile 0 for the one we are currently using.'

  • Sample Run
[root@jr-net-cent7 ~]# time audiowaveform -i Steph\ DJ\ -\ Noise\ Control\ Episode\ 025\ Feat\ Jack\ Diamond\ 13th\ January\ 2014.mp3 -o test.dat -z 256 -b 8
Input file: Steph DJ - Noise Control Episode 025 Feat Jack Diamond 13th January 2014.mp3
Format: Audio MPEG layer III stream
Bit rate: 320000 kbit/s
CRC: no
Mode: normal LR stereo
Emphasis: no
Sample rate: 44100 Hz
Generating waveform data...
Samples per pixel: 256
Input channels: 2
Done: 100%
Recoverable frame level error: lost synchronization
Frames decoded: 283540 (123:26.759)
Generated 1275930 points
Writing output file: test.dat
Resolution: 8 bits
real    0m32.486s
user    0m32.409s
sys     0m0.056s
[root@jr-net-cent7 ~]# which audiowaveform
/usr/local/bin/audiowaveform
[root@jr-net-cent7 ~]# gprof -p -b /usr/local/bin/audiowaveform > final.dat


  • gprof
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
58.71      4.28     4.28    79746     0.00     0.00  WaveformGenerator::process(short const*, int)
31.00      6.54     2.26        1     2.26     7.27  Mp3AudioFileReader::run(AudioProcessor&)
 9.33      7.22     0.68 653276160     0.00     0.00  MadFixedToSshort(int)
 0.41      7.25     0.03        1     0.03     0.03  dumpInfo(std::ostream&, mad_header const&)
 0.14      7.26     0.01  7655603     0.00     0.00  short const& std::forward<short const&>(std::remove_reference<short const&>::type&)
 0.14      7.27     0.01  2551860     0.00     0.00  writeInt8(std::ostream&, signed char)
 0.14      7.28     0.01  2551860     0.00     0.00  std::vector<short, std::allocator<short> >::push_back(short const&)
 0.14      7.29     0.01  1275930     0.00     0.00  WaveformBuffer::getMinSample(int) const
 0.00      7.29     0.00  2551938     0.00     0.00  operator new(unsigned long, void*)
 0.00      7.29     0.00  2551860     0.00     0.00  void __gnu_cxx::new_allocator<short>::construct<short, short const&>(short*, short const&)
 0.00      7.29     0.00  2551860     0.00     0.00  std::vector<short, std::allocator<short> >::operator[](unsigned long) const
 0.00      7.29     0.00  2551860     0.00     0.00  std::enable_if<std::allocator_traits<std::allocator<short> >::__construct_helper<short, short const&>::value, void>::type std::allocator_traits<std::allocator<short> >::_S_construct<short, short const&>(std::allocator<short>&, short*, short const&)
 0.00      7.29     0.00  2551860     0.00     0.00  decltype (_S_construct({parm#1}, {parm#2}, (forward<short const&>)({parm#3}))) std::allocator_traits<std::allocator<short> >::construct<short, short const&>(std::allocator<short>&, short*, short const&)
 0.00      7.29     0.00  1275931     0.00     0.00  WaveformGenerator::reset()
 0.00      7.29     0.00  1275930     0.00     0.00  WaveformBuffer::appendSamples(short, short)
 0.00      7.29     0.00  1275930     0.00     0.00  WaveformBuffer::getMaxSample(int) const
 0.00      7.29     0.00    79747     0.00     0.00  AudioFileReader::showProgress(long long, long long)
 0.00      7.29     0.00     7272     0.00     0.00  BstdRead
 0.00      7.29     0.00     7271     0.00     0.00  BstdFileEofP
.....

✗Profile 3: String Processor

compiled by Oleg

  • Sample run:
ext_string_example
es + 123 = ext_string123
456 + es = 456ext_string
es * 3   = ext_stringext_stringext_string
3  * es  = ext_stringext_stringext_string
original:  abc1234?abc1234?abc1234
es - abc = 1234?1234?1234
es - 123 = abc?abc?abc
es -   ? = abc1234abc1234abc1234
ext_string == eXt_StRiNg
original:  eXt_StRiNg
lowercase: ext_string
uppercase: EXT_STRING
original:              [   ext_string   ]
remove leading space:  [ext_string   ]
remove trailing space: [ext_string]
es: abc, ijk, pqr, xyz ---> split: (abc) (ijk) (pqr) (xyz)
es: abc, ijk, pqr, xyz ---> split_n(3): (abc) (ijk) (pqr)
es: 1, -23, 456, -7890 ---> parse: (1) (-23) (456) (-7890)
es: 1.1, -23.32, 456.654, -7890.0987 ---> parsed: (1.1000000000000001) (-23.32) (456.654) (-7890.0986999999996)
non_repeated_char_example
No non-repeated chars in string.
First non repeated char: a
First non repeated char: b
No non-repeated chars in string.
First non repeated char: c
First non repeated char: 1
translation_table_example
Before: Such is this simple string sample....Wowzers!
After: S5ch 3s th3s s3mpl2 str3ng s1mpl2....W4wz2rs!
Before: Such is this simple string sample....Wowzers!
After: S5ch 3s th3s s3mpl2 str3ng s1mpl2....W4wz2rs!
Before: Such is this simple string sample....Wowzers!
After: S5ch 3s th3s s3mpl2 str3ng s1mpl2....W4wz2rs!
Before: Such is this simple string sample....Wowzers!
After: S5ch 3s th3s s3mpl2 str3ng s1mpl2....W4wz2rs!
Before: Such is this simple string sample....Wowzers!
After: S5ch 3s th3s s3mpl2 str3ng s1mpl2....W4wz2rs!
find_n_consecutive_example
Result-01: [1] Location: [0]] Length: [1]
Result-02: [22] Location: [2]] Length: [2]
Result-03: [333] Location: [5]] Length: [3]
Result-04: [4444] Location: [9]] Length: [4]
Result-05: [55555] Location: [14]] Length: [5]
Result-06: [666666] Location: [20]] Length: [6]
Result-07: [7777777] Location: [27]] Length: [7]
Result-08: [88888888] Location: [35]] Length: [8]
Result-09: [999999999] Location: [44]] Length: [9]
Result-01: [a] Location: [0]] Length: [1]
Result-02: [bB] Location: [2]] Length: [2]
Result-03: [cCc] Location: [5]] Length: [3]
Result-04: [dDdD] Location: [9]] Length: [4]
Result-05: [EeEeE] Location: [14]] Length: [5]
Result-06: [fFfFfF] Location: [20]] Length: [6]
Result-07: [gGgGgGg] Location: [27]] Length: [7]
Result-08: [HhHhHhHh] Location: [35]] Length: [8]
Result-09: [IiIiIiIiI] Location: [44]] Length: [9]
split_on_consecutive_example
1 Consecutive digits: 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 9 9 0 0 0 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
2 Consecutive digits: 22 33 44 44 55 55 66 66 66 77 77 77 99 00 11 11 22 22 33 33 33 44 44 44
3 Consecutive digits: 333 444 555 666 666 777 777 000 111 222 333 333 444 444
4 Consecutive digits: 4444 5555 6666 7777 1111 2222 3333 4444
5 Consecutive digits: 55555 66666 77777 22222 33333 44444
6 Consecutive digits: 666666 777777 333333 444444
1 Consecutive letters: A B B C C C D D D D E E E E E F F F F F F G G G G G G G H I I J J J K K K K L L L L L M M M M M M N N N N N N N
2 Consecutive letters: BB CC DD DD EE EE FF FF FF GG GG GG II JJ KK KK LL LL MM MM MM NN NN NN
3 Consecutive letters: CCC DDD EEE FFF FFF GGG GGG JJJ KKK LLL MMM MMM NNN NNN
4 Consecutive letters: DDDD EEEE FFFF GGGG KKKK LLLL MMMM NNNN
5 Consecutive letters: EEEEE FFFFF GGGGG LLLLL MMMMM NNNNN
6 Consecutive letters: FFFFFF GGGGGG MMMMMM NNNNNN
index_of_example
Index of pattern[0123456789ABC]: 0
Index of pattern[123456789ABC]: 1
Index of pattern[23456789ABC]: 2
Index of pattern[3456789ABC]: 3
Index of pattern[456789ABC]: 4
Index of pattern[56789ABC]: 5
Index of pattern[6789ABC]: 6
Index of pattern[789ABC]: 7
Index of pattern[89ABC]: 8
Index of pattern[9ABC]: 9
Index of pattern[xyz]: 4294967295
truncatedint_example
i = -1234
i = -1234
u = 1234
i = -1234
u = 1234
real    0m0.248s
user    0m0.080s
sys     0m0.024s
  • Profile:


Flat profile:
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total          
time   seconds   seconds    calls  ms/call  ms/call  name   
83.33      0.05     0.05  1000008     0.00     0.00  unsigned int boost::uniform_int<unsigned int>::generate<boost::random::detail::pass_through_engine<boost::random::detail::pass_through_engine<boost::random::mersenne_twister<unsigned int, 32, 624, 397, 31, 2567483615u, 11, 7, 2636928640u, 15, 4022730752u, 18, 3346425566u>&> > >(boost::random::detail::pass_through_engine<boost::random::detail::pass_through_engine<boost::random::mersenne_twister<unsigned int, 32, 624, 397, 31, 2567483615u, 11, 7, 2636928640u, 15, 4022730752u, 18, 3346425566u>&> >&, unsigned int, unsigned int, unsigned int)
16.67      0.06     0.01        1    10.00    60.00  strtk::generate_random_data(unsigned char*, unsigned int, unsigned int, unsigned int)
 0.00      0.06     0.00     1642     0.00     0.00  boost::random::mersenne_twister<unsigned int, 32, 624, 397, 31, 2567483615u, 11, 7, 2636928640u, 15, 4022730752u, 18, 3346425566u>::twist(int)
 0.00      0.06     0.00      979     0.00     0.00  strtk::text::is_digit(char)
 0.00      0.06     0.00      978     0.00     0.00  strtk::text::is_letter(char)

Assignment 2

Description

Removing the old CPU bottleneck:

      for (int i = 0; i < bufferSize; i++){
               // going over every byte in the file
               switch (mode) {
                       case 0: // inversion
                               buffer[i] = ~buffer[i];
                               break;
                       case 1: // cycle
                               buffer [i] = cycle (buffer [i]);
                               break;
                       case 2: // RC4
                               buffer [i] = buffer [i] ^ rc4_output();
                               break;
               }
       }

And replacing it with

...
if (mode == 0)
   getInversionBuffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);
if (mode == 1)
   getCycleBuffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);
if (mode == 2)
   getRC4Buffer << < dGrid, dBlock >> >(d_a, bufferSize, d_output);
...


Converting cycle and rc4_output functions to device functions:

/**
* Description:  Device function cycle
**/
__device__  char cycle(char value) {
  int leftMask = 170;
  int rightMask = 85;
  int iLeft = value & leftMask;
  int iRight = value & rightMask;
  iLeft = iLeft >> 1;
  iRight = iRight << 1;
  return iLeft | iRight;
}
/**
* Description:  Device function RC4
**/
__device__ unsigned char rc4_output() {
  unsigned char temp;
  unsigned char S[0x100]; // dec 256
  unsigned int i, j;
  i = (i + 1) & 0xFF;
  j = (j + S[i]) & 0xFF;
  temp = S[i];
  S[i] = S[j];
  S[j] = temp;
  return S[(S[i] + S[j]) & 0xFF];
}

Profiling

RC4 Cipher - 283 MB mp3 File

Total runtime: 1.358 seconds

Music.png


RC4 Cipher - 636 MB iso File

Total runtime: 3.87 seconds

Cent.png


RC4 Cipher - 789 MB iso File

Total runtime: 5.072 seconds

Xu.png

Assignment 3