== Progress ==
== Assignment 1 ==
=== <span style="color: green">✓ Profile 0: File Encryption</span> ===
==== Description ====
This piece of software takes a file and encrypts it in one of 4 ways:
# Byte Inversion
# Byte Cycle
# Xor Cipher
# RC4 Cipher

Inside the <code>byteCipher</code> method there is a for loop that could be optimized. Within this loop, the lines that call the <code>cycle</code> and <code>rc4_output</code> functions take the longest to execute:
<pre>
for (int i = 0; i < bufferSize; i++) { // going over every byte in the file
    switch (mode) {
    case 0: // inversion
        buffer[i] = ~buffer[i];
        break;
    case 1: // cycle
        buffer[i] = cycle(buffer[i]);
        break;
    case 2: // RC4
        buffer[i] = buffer[i] ^ rc4_output();
        break;
    }
}
</pre>

Here is what the <code>cycle</code> and <code>rc4_output</code> functions look like:
<pre>
// Swaps each pair of adjacent bits in the byte.
char cycle(char value) {
    int leftMask = 170;  // 0xAA: bits 7, 5, 3, 1
    int rightMask = 85;  // 0x55: bits 6, 4, 2, 0
    int iLeft = value & leftMask;
    int iRight = value & rightMask;
    iLeft = iLeft >> 1;
    iRight = iRight << 1;
    return iLeft | iRight;
}

// Produces the next RC4 keystream byte; i, j and S are not declared
// locally, so they are shared state carried from one call to the next.
unsigned char rc4_output() {
    unsigned char temp;
    i = (i + 1) & 0xFF;
    j = (j + S[i]) & 0xFF;
    temp = S[i];
    S[i] = S[j];
    S[j] = temp;
    return S[(S[i] + S[j]) & 0xFF];
}
</pre>

We need to change these two functions so that they run on the CUDA device as "device functions".

==== Profiling on Linux ====
The following test runs were performed on this virtual machine:
* CentOS 7
* i7-3820 @ 3.6 GHz
* 2GB DDR3
* gcc version 4.8.3

Using compiler settings:
<pre>
g++ -c -O2 -g -pg -std=c++11 encFile.cpp
</pre>

'''RC4 Cipher - 283 MB mp3 File'''
<pre>
[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/music.mp3 /home/johny/aes/music.mp3
* * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.

real 0m6.758s
user 0m3.551s
sys  0m0.068s

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 84.05      1.70     1.70 296271519     0.00     0.00  rc4_output()
 13.39      1.97     0.27                              byteCipher(int, std::string)
  2.73      2.02     0.06         1    55.09    55.09  rc4_init(unsigned char*, unsigned int)
  0.00      2.02     0.00         1     0.00     0.00  _GLOBAL__sub_I_S
</pre>

As the profile shows, <code>rc4_output</code> and <code>byteCipher</code> account for nearly all of the processing time (about 97% of the samples in this run).

'''RC4 Cipher - 636 MB iso File'''
<pre>
[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/cent.iso /home/johny/aes/cent.iso
* * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.

real 0m10.293s
user 0m8.235s
sys  0m0.312s

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 74.86      3.59     3.59 666894336     0.00     0.00  rc4_output()
 23.21      4.70     1.11                              byteCipher(int, std::string)
  2.09      4.80     0.10         1   100.16   100.16  rc4_init(unsigned char*, unsigned int)
  0.00      4.80     0.00         1     0.00     0.00  _GLOBAL__sub_I_S
</pre>

'''RC4 Cipher - 789 MB iso File'''
<pre>
[root@jr-net-cent7 aes]# time ./encFile 4 /home/johny/aes/xu.iso /home/johny/aes/xu.iso
* * * File Protector * * *
Mode 4: RC4 cipher
Please enter the RC4 key (8 chars min)
testing123
The password is: testing123
Beginning encryption
Completed: 100%
Cipher completed.
Program terminated.

real 0m12.566s
user 0m10.170s
sys  0m0.228s

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 75.51      4.40     4.40 827326464     0.00     0.00  rc4_output()
 23.02      5.74     1.34                              byteCipher(int, std::string)
  1.63      5.84     0.10         1    95.15    95.15  rc4_init(unsigned char*, unsigned int)
  0.00      5.84     0.00         1     0.00     0.00  _GLOBAL__sub_I_S
</pre>

==== Profiling on Windows ====
The following test runs were performed on this machine:
* Windows 10
* i7-4790k @ 4GHz
* 16GB DDR3
* Visual Studio 2013

'''RC4 Cipher - 283 MB mp3 File'''

[[File:winmp3.png]]

'''RC4 Cipher - 636 MB iso File'''

[[File:wincent.png]]

'''RC4 Cipher - 789 MB iso File'''

[[File:winxu.png]]

'''Byte Cycle - 283 MB mp3 File'''

[[File:winmp32.png]]

'''Byte Cycle - 636 MB iso File'''

[[File:wincent2.png]]

'''Byte Cycle - 789 MB iso File'''

[[File:winxu2.png]]

=== <span style="color: red">✗ Profile 1: PI Approximation</span> ===
* Sample run:
operation - took - 47.1807910000 secs
3.1415537704
real 3m33.129s
user 3m32.925s
0.00 106.93 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z10reportTimePKcNSt6chrono8durationIlSt5ratioILl1ELl1000000EEEE
=== <span style="color: red">✗ Profile 2: Wave Form Generator</span> ===
<s>'''This is the program we selected to optimize. It's a great candidate because it has 2 primary functions that each contain a few for loops. One of the functions reads an MP3 file and writes wave data to a file; this function takes quite a bit of time to execute. The other function takes this data and converts it to a viewable sound wave image. Both functions would benefit greatly from the extra processing power that a GPU provides: MP3 read/decode time would be greatly reduced.'''</s>
''This piece of code is too complex and requires a Linux environment to run. Please see Profile 0 for the one we are currently using.''
* Sample Run
[root@jr-net-cent7 ~]# time audiowaveform -i Steph\ DJ\ -\ Noise\ Control\ Episode\ 025\ Feat\ Jack\ Diamond\ 13th\ January\ 2014.mp3 -o test.dat -z 256 -b 8
Input file: Steph DJ - Noise Control Episode 025 Feat Jack Diamond 13th January 2014.mp3
Format: Audio MPEG layer III stream
Bit rate: 320000 kbit/s
CRC: no
Mode: normal LR stereo
Emphasis: no
Sample rate: 44100 Hz
Generating waveform data...
Samples per pixel: 256
Input channels: 2
Done: 100%
Recoverable frame level error: lost synchronization
Frames decoded: 283540 (123:26.759)
Generated 1275930 points
Writing output file: test.dat
Resolution: 8 bits
real 0m32.486s
user 0m32.409s
sys 0m0.056s
[root@jr-net-cent7 ~]# which audiowaveform
/usr/local/bin/audiowaveform
[root@jr-net-cent7 ~]# gprof -p -b /usr/local/bin/audiowaveform > final.dat
* gprof
Each sample counts as 0.01 seconds.
% cumulative self self total
0.00 7.29 0.00 7272 0.00 0.00 BstdRead
0.00 7.29 0.00 7271 0.00 0.00 BstdFileEofP