The <code>byteCipher</code> method contains a for loop that is a good candidate for optimization. Within this loop, the calls to the <code>cycle</code> and <code>rc4_output</code> functions take the longest to execute:
for (int i = 0; i < bufferSize; i++){
// going over every byte in the file
}
char cycle (char value) {
int leftMask = 170;
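Both functions need to be declared as CUDA device functions (using the <code>__device__</code> qualifier) so that they can be called from code running on the GPU. The for loop itself needs to be converted into a kernel, so that its iterations are carried out by GPU threads instead of sequentially on the CPU.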
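The conversion described above might look roughly like the following sketch. It is not the project's actual code: <code>cycleSketch</code> is a placeholder body (the full <code>cycle</code> function is not shown here), and the names <code>cipherByte</code>, <code>cipherKernel</code>, <code>cipherBuffer</code> and the <code>keystream</code> array are hypothetical. The sketch also assumes that the bytes <code>rc4_output</code> would produce are already available in an array, so that each byte can be processed independently. When built with a regular C++ compiler instead of nvcc, it falls back to the original style of for loop.

```cpp
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HOST_DEVICE __host__ __device__   // callable from CPU and GPU code
#else
#define HOST_DEVICE                       // plain C++ build: qualifiers vanish
#endif

// Placeholder per-byte transform, NOT the original cycle body:
// swaps each pair of adjacent bits using the masks 170 and 85.
HOST_DEVICE inline unsigned char cycleSketch(unsigned char value) {
    const int leftMask = 170;   // 0b10101010
    const int rightMask = 85;   // 0b01010101
    return static_cast<unsigned char>(((value & leftMask) >> 1) |
                                      ((value & rightMask) << 1));
}

// Hypothetical combination step: transform the byte, then XOR it with
// one keystream byte (assumed to be precomputed).
HOST_DEVICE inline unsigned char cipherByte(unsigned char in, unsigned char key) {
    return static_cast<unsigned char>(cycleSketch(in) ^ key);
}

#ifdef __CUDACC__
// Kernel replacing the for loop: one thread per byte of the buffer.
__global__ void cipherKernel(unsigned char* data, const unsigned char* keystream,
                             int bufferSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < bufferSize) {                 // guard threads past the end
        data[i] = cipherByte(data[i], keystream[i]);
    }
}
#endif

// Host wrapper. Assumes keystream holds at least data.size() bytes.
void cipherBuffer(std::vector<unsigned char>& data,
                  const std::vector<unsigned char>& keystream) {
    const int bufferSize = static_cast<int>(data.size());
    if (bufferSize == 0) return;
#ifdef __CUDACC__
    unsigned char* dData = nullptr;
    unsigned char* dKey = nullptr;
    cudaMalloc(&dData, bufferSize);
    cudaMalloc(&dKey, bufferSize);
    cudaMemcpy(dData, data.data(), bufferSize, cudaMemcpyHostToDevice);
    cudaMemcpy(dKey, keystream.data(), bufferSize, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (bufferSize + threads - 1) / threads;  // round up
    cipherKernel<<<blocks, threads>>>(dData, dKey, bufferSize);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(data.data(), dData, bufferSize, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dKey);
#else
    // CPU fallback: the same per-byte work as a sequential loop.
    for (int i = 0; i < bufferSize; i++) {
        data[i] = cipherByte(data[i], keystream[i]);
    }
#endif
}
```

Calling <code>cipherBuffer(data, keystream)</code> transforms <code>data</code> in place. Error checking on the CUDA calls is omitted to keep the sketch short; real code should check each return value.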
==== Profiling on Linux ====