

GPU610/DPS915 MCM Decrypt

2,958 bytes added, 18:58, 6 December 2013
Assignment 3
-Currently I am trying to get the above kernels working before handing in the assignment, as I feel that the initialization kernel alone would not be sufficient for the purpose of this assignment.
 
==== Work With Prefix Scan ====
-At the moment I am working on several simplified versions of a prefix sum (scan) algorithm that I hope will put me on the right path to completing the assignment. These algorithms were gathered from various sources, including MIT, NVIDIA, and the CUDA documentation.
<source lang="cpp">
// Sequential exclusive scan: arr1[i] holds the sum of input[0..i-1].
void scan(float* arr1, const float* input, int n) {
    if (n <= 0) return;
    arr1[0] = 0; // exclusive scan (prescan): the first element is not included, so start at 0
    for (int i = 1; i < n; ++i) {
        arr1[i] = input[i-1] + arr1[i-1];
    }
}
 
</source>
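-To make the difference between an exclusive and an inclusive scan concrete, here is a small host-side C++17 sketch that compares the loop form with the standard library's std::exclusive_scan and std::inclusive_scan from <numeric>. The helper names exclusiveScanLoop, exclusiveScanStd and inclusiveScanStd are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential exclusive scan: out[0] = 0 and out[i] = out[i-1] + in[i-1],
// so each output element is the sum of all inputs strictly before it.
std::vector<float> exclusiveScanLoop(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    if (in.empty()) return out;
    out[0] = 0.0f;
    for (std::size_t i = 1; i < in.size(); ++i)
        out[i] = out[i - 1] + in[i - 1];
    return out;
}

// Same result using the C++17 standard algorithm with initial value 0.
std::vector<float> exclusiveScanStd(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0.0f);
    return out;
}

// Inclusive scan for comparison: out[i] also includes in[i] itself.
std::vector<float> inclusiveScanStd(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}
```

-For the input {3, 1, 7, 0, 4, 1, 6, 3} the exclusive scan gives {0, 3, 4, 11, 11, 15, 16, 22}, while the inclusive scan gives {3, 4, 11, 11, 15, 16, 22, 25}: each inclusive element is the exclusive element plus the input at the same position.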
 
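-Below is a naive double-buffered parallel scan that computes the same exclusive scan as the function above, for an array small enough to fit in a single thread block (one thread per element).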
 
<source lang="cpp">
 
__global__ void scan(float *g_odata, float *g_idata, int n) {
    // Allocated on invocation: launch one block of n threads with
    // 2*n*sizeof(float) bytes of dynamic shared memory (two buffers of n floats).
    extern __shared__ float temp[];

    int thid = threadIdx.x;
    int pout = 0, pin = 1;

    // Load input into shared memory.
    // This is an exclusive scan, so shift right by one and set the first element to 0.
    temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
    __syncthreads();

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout; // swap double-buffer indices
        pin = 1 - pout;

        // Read only from the input buffer; "+=" here would add into a stale output value.
        if (thid >= offset)
            temp[pout*n + thid] = temp[pin*n + thid] + temp[pin*n + thid - offset];
        else
            temp[pout*n + thid] = temp[pin*n + thid];

        __syncthreads();
    }

    g_odata[thid] = temp[pout*n + thid]; // write output
}
</source>
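-Because shared memory and __syncthreads() are hard to step through, the kernel's buffer logic can be checked on the CPU first. The sketch below is an emulation under stated assumptions (a single block, one "thread" per element; the name emulateDoubleBufferedScan is hypothetical). It mirrors the 2*n temp layout and the pout/pin swap, and runs the threads of each step in a plain loop. That is safe only because, within a step, every thread writes to the output half and reads from the input half, which is what the barrier guarantees on the GPU.

```cpp
#include <vector>

// CPU emulation of the single-block, double-buffered exclusive scan kernel.
// Iterating thid sequentially inside a step stands in for __syncthreads(),
// since reads (pin half) and writes (pout half) never overlap.
std::vector<float> emulateDoubleBufferedScan(const std::vector<float>& g_idata) {
    const int n = static_cast<int>(g_idata.size());
    std::vector<float> temp(2 * n);  // same 2*n layout as the shared array
    int pout = 0, pin = 1;

    // Exclusive scan: shift right by one and set the first element to 0.
    for (int thid = 0; thid < n; ++thid)
        temp[pout * n + thid] = (thid > 0) ? g_idata[thid - 1] : 0.0f;

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap double-buffer halves
        pin = 1 - pout;
        for (int thid = 0; thid < n; ++thid) {
            if (thid >= offset)
                temp[pout * n + thid] = temp[pin * n + thid] + temp[pin * n + thid - offset];
            else
                temp[pout * n + thid] = temp[pin * n + thid];
        }
    }

    // Copy the final buffer out, like the kernel's last line.
    std::vector<float> g_odata(n);
    for (int thid = 0; thid < n; ++thid)
        g_odata[thid] = temp[pout * n + thid];
    return g_odata;
}
```

-For {3, 1, 7, 0, 4, 1, 6, 3} this produces {0, 3, 4, 11, 11, 15, 16, 22}, matching the sequential scan. Note that this naive scheme performs O(n log n) additions versus n - 1 for the sequential loop, trading extra work for fewer sequential steps.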
=== Assignment 3 ===
-In the end I found a fairly simple encryption program and decided to work with it so that I would at least have something to hand in. The encryption transforms each letter of the string or file independently, so every character can be handled by its own thread, which makes it a good candidate for parallelization.
-Here is a snippet of the CPU code:
 
<source lang=cpp>
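#include <fstream>

// Reads the file inp character by character and writes the result to out.
// Lowercase letters become uppercase shifted forward by key; uppercase letters
// become lowercase shifted back by key; other characters are copied unchanged.
// Assumes 0 <= key < 26 so the char arithmetic below cannot overflow.
void encrypt(char *inp,char *out,int key)
{
    std::ifstream input;
    std::ofstream output;
    char buf;
    input.open(inp);
    output.open(out);
    buf=input.get();
    while(!input.eof())
    {
        if(buf>='a'&&buf<='z') {
            buf-='a';
            buf+=key;
            buf%=26;
            buf+='A';
        }
        else if(buf>='A'&&buf<='Z') {
            buf-='A';
            buf+=26-key;
            buf%=26;
            buf+='a';
        }
        output.put(buf);
        buf=input.get();
    }
    input.close();
    output.close();
    //readText(inp);
    //readText(out);
}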
 
</source>
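-Pulling the per-character logic out of the file loop shows why this is easy to parallelize: each output character depends only on the input character at the same position. A minimal sketch, assuming 0 <= key < 26 as the original char arithmetic does; encryptChar and encryptString are hypothetical helper names, not part of the original program.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-character form of the transform in encrypt().
// Lowercase -> uppercase shifted forward by key; uppercase -> lowercase
// shifted back by key; everything else passes through unchanged.
char encryptChar(char c, int key) {
    assert(key >= 0 && key < 26);  // assumed key range
    if (c >= 'a' && c <= 'z')
        return static_cast<char>('A' + (c - 'a' + key) % 26);
    if (c >= 'A' && c <= 'Z')
        return static_cast<char>('a' + (c - 'A' + 26 - key) % 26);
    return c;
}

// Each iteration is independent of the others, so this loop maps directly
// onto one GPU thread per character.
std::string encryptString(const std::string& s, int key) {
    std::string out = s;
    for (char& c : out)
        c = encryptChar(c, key);
    return out;
}
```

-Working through the arithmetic, a lowercase letter at alphabet position x becomes the uppercase letter at (x + key) mod 26, and applying the transform again maps it back to the lowercase position (x + key + 26 - key) mod 26 = x. So, for a fixed key, running the same function twice restores the original text: encryptString("abc", 3) is "DEF", and encryptString("DEF", 3) is "abc".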
 
-I then wrote a very simple kernel that applies the same transformation in place, with one thread per character:
 
<source lang=cpp>
 
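// Simple version: a single block with exactly one thread per character.
// threadIdx.x alone is the index and there is no bounds check, so the string
// can be at most as long as the maximum number of threads per block
// (1024 on compute capability 2.0 and later).
__global__ void encrypt2(char *inp, int key) {
    int i = threadIdx.x;
    if(inp[i]>='a'&&inp[i]<='z') {
        inp[i]-='a';
        inp[i]+=key;
        inp[i]%=26;
        inp[i]+='A';
    }
    else if(inp[i]>='A'&&inp[i]<='Z') {
        inp[i]-='A';
        inp[i]+=26-key;
        inp[i]%=26;
        inp[i]+='a';
    }
}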
 
</source>
 
-and then optimized it to handle longer strings by spreading the threads across multiple blocks, so that the work is no longer confined to a single block while the other streaming multiprocessors sit idle:
 
<source lang=cpp>
 
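// Global index across all blocks. Launch with ceil(n / blockDim.x) blocks,
// where n is the string length; the bounds check stops the surplus threads
// in the last block from writing past the end of the buffer.
__global__ void encrypt2(char *inp, int key, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if(inp[i]>='a'&&inp[i]<='z') {
        inp[i]-='a';
        inp[i]+=key;
        inp[i]%=26;
        inp[i]+='A';
    }
    else if(inp[i]>='A'&&inp[i]<='Z') {
        inp[i]-='A';
        inp[i]+=26-key;
        inp[i]%=26;
        inp[i]+='a';
    }
}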
</source>
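-The launch configuration for a multi-block kernel is easiest to sanity-check on the host. The following sketch emulates the grid in plain C++, assuming a one-dimensional grid of ceil(n / blockDim) blocks and a bounds check on the global index. blocksFor and emulateLaunch are hypothetical names, and the nested loops only stand in for the GPU scheduling the blocks.

```cpp
#include <cassert>
#include <string>

// Number of blocks needed to cover n elements: ceil(n / blockDim).
int blocksFor(int n, int blockDim) {
    assert(blockDim > 0);
    return (n + blockDim - 1) / blockDim;
}

// CPU emulation of a 1D multi-block launch. The two outer loops stand in
// for the grid; each (blockIdx, threadIdx) pair handles at most one char.
std::string emulateLaunch(std::string text, int key, int blockDim) {
    const int n = static_cast<int>(text.size());
    const int gridDim = blocksFor(n, blockDim);
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            int i = blockIdx * blockDim + threadIdx;  // global index
            if (i >= n) continue;  // surplus threads in the last block do nothing
            char& c = text[i];
            if (c >= 'a' && c <= 'z')
                c = static_cast<char>('A' + (c - 'a' + key) % 26);
            else if (c >= 'A' && c <= 'Z')
                c = static_cast<char>('a' + (c - 'A' + 26 - key) % 26);
        }
    }
    return text;
}
```

-With blockDim = 2 and the 7-character string "abc XYZ", four blocks (8 threads) are used, the guard filters out the last thread, and the result is "DEF uvw". In a real launch the block size would normally be a multiple of the warp size (32), such as 256.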
