GPU610/DPS915 MCM Decrypt

3,724 bytes added, 18:58, 6 December 2013
Assignment 3
unsigned long tmp;
tmp= (unsigned long) (unsigned char) password[threadIdx.x];
*nr^= (((*nr & 63)+*add)*tmp)+ (*nr << 8);
*nr2+=(*nr2 << 8) ^ *nr;
encrypted_password[threadIdx.x] = 0;
}
 
</source>
 
-I am currently trying to get the kernels above working before handing in the assignment, since the initialization kernel alone would not be nearly sufficient for it.
 
==== Work With Prefix Scan ====
 
-At the moment I am working through several simplified versions of a prefix sum (scan) algorithm, which I hope will put me on the right path to completing the assignment. The algorithms come from sources such as MIT, NVIDIA, and the CUDA documentation.
 
-Below is a sequential exclusive scan (prescan) of an array: each output element is the sum of all input elements that come before it.
 
<source lang="cpp">
 
// arr1 receives the exclusive scan of input; both arrays hold n elements (n >= 1)
void scan(float* arr1, float* input, int n) {
    arr1[0] = 0; // exclusive scan: the first output is 0, not input[0]
    for (int i = 1; i < n; ++i) {
        arr1[i] = input[i-1] + arr1[i-1];
    }
}
 
</source>
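
-As a quick check of what an exclusive scan produces, here is a minimal sketch of the same loop over a <code>std::vector</code>. The name <code>exclusive_scan_ref</code> is hypothetical and not part of the assignment code; it is only meant as a reference to compare other versions against.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original page): sequential exclusive scan.
// out[i] holds the sum of in[0..i-1], so out[0] is always 0.
std::vector<float> exclusive_scan_ref(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = out[i - 1] + in[i - 1];
    }
    return out;
}
```

-For the input 3 1 7 0 4 1 6 3 this gives 0 3 4 11 11 15 16 22: every output is the running total of the elements to its left.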
 
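-Below is a parallel scan kernel that computes the same exclusive scan as the function above. It uses a double buffer in shared memory, so it has to be launched as a single block of n threads with 2*n floats of dynamic shared memory.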
 
<source lang="cpp">
 
__global__ void scan(float *g_odata, float *g_idata, int n) {
extern __shared__ float temp[]; // allocated on invocation
 
int thid = threadIdx.x;
int pout = 0, pin = 1;
 
// load input into shared memory.
// This is exclusive scan, so shift right by one and set first elt to 0
temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
__syncthreads();
 
for (int offset = 1; offset < n; offset *= 2) {
pout = 1 - pout; // swap double buffer indices
pin = 1 - pout;
 
if (thid >= offset)
temp[pout*n+thid] = temp[pin*n+thid] + temp[pin*n+thid - offset]; // read both operands from the input buffer
else
temp[pout*n+thid] = temp[pin*n+thid];
 
__syncthreads();
}
 
g_odata[thid] = temp[pout*n+thid]; // write output
}
</source>
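
-To convince myself of how the double buffer works, the kernel can be modelled on the CPU. This is only a sketch under assumptions: <code>simulate_double_buffered_scan</code> is a hypothetical name, the inner loop over <code>thid</code> stands in for the n threads of one block, and finishing that loop stands in for <code>__syncthreads()</code>. Each step reads both operands from the <code>pin</code> half and writes only to the <code>pout</code> half.

```cpp
#include <vector>

// Hypothetical CPU model (not from the original page) of the double-buffered scan.
std::vector<float> simulate_double_buffered_scan(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    if (n == 0) return {};
    std::vector<float> temp(2 * n, 0.0f);  // models the shared-memory double buffer
    int pout = 0, pin = 1;

    // Load step: shift right by one so the result is an exclusive scan.
    for (int thid = 0; thid < n; ++thid)
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap the roles of the two halves
        pin = 1 - pout;
        for (int thid = 0; thid < n; ++thid) {
            if (thid >= offset)
                temp[pout * n + thid] = temp[pin * n + thid] + temp[pin * n + thid - offset];
            else
                temp[pout * n + thid] = temp[pin * n + thid];
        }
    }
    // The half written last holds the final result.
    return std::vector<float>(temp.begin() + pout * n, temp.begin() + (pout + 1) * n);
}
```

-The outer loop runs about log2(n) times, which is where the parallel version gets its advantage when all n threads really do run at once. Its output can be compared element by element against the sequential function.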
=== Assignment 3 ===
-In the end I found a fairly simple encryption program and decided to work with that so I would at least have something to hand in. The encryption processes each character of the string or file independently, which makes it a good candidate for parallelization.
-Here is the CPU version of the code:
 
<source lang=cpp>
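#include <fstream>

// Encrypts the file named inp into the file named out, one character at a time.
// Lower-case letters are shifted forward by key and written as upper case;
// upper-case letters are shifted back by key and written as lower case.
// All other characters are copied unchanged. Assumes 0 <= key < 26.
void encrypt(char *inp, char *out, int key)
{
    std::ifstream input;
    std::ofstream output;
    char buf;
    input.open(inp);
    output.open(out);
    buf = input.get();
    while (!input.eof())
    {
        if (buf >= 'a' && buf <= 'z') {
            buf -= 'a';
            buf += key;
            buf %= 26;
            buf += 'A';
        }
        else if (buf >= 'A' && buf <= 'Z') {
            buf -= 'A';
            buf += 26 - key;
            buf %= 26;
            buf += 'a';
        }
        output.put(buf);
        buf = input.get();
    }
    input.close();
    output.close();
    //readText(inp);
    //readText(out);
}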
 
</source>
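
-The per-character rule is easier to test on a string in memory than on files. The sketch below is not part of the assignment code: <code>encrypt_string</code> is a hypothetical name, and it assumes the key is between 0 and 25. It applies the same rule as <code>encrypt</code> above.

```cpp
#include <cassert>
#include <string>

// Hypothetical in-memory variant (not from the original page) of encrypt().
// Assumes 0 <= key < 26.
std::string encrypt_string(std::string text, int key) {
    assert(key >= 0 && key < 26);
    for (char& c : text) {
        if (c >= 'a' && c <= 'z') {
            // shift forward by key, then flip to upper case
            c = static_cast<char>('A' + (c - 'a' + key) % 26);
        } else if (c >= 'A' && c <= 'Z') {
            // shift back by key, then flip to lower case
            c = static_cast<char>('a' + (c - 'A' + 26 - key) % 26);
        }
    }
    return text;
}
```

-Because lower-case letters shift forward and upper-case letters shift back, running the function a second time with the same key restores the original text, so the same routine also decrypts. No character depends on any other, which is why one thread per character works.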
 
-I then wrote a very simple kernel in which each thread encrypts one character:
 
<source lang=cpp>
 
__global__ void encrypt2(char *inp, int key) {
int i = threadIdx.x;
if(inp[i]>='a'&&inp[i]<='z') {
inp[i]-='a';
inp[i]+=key;
inp[i]%=26;
inp[i]+='A';
}
else if(inp[i]>='A'&&inp[i]<='Z') {
inp[i]-='A';
inp[i]+=26-key;
inp[i]%=26;
inp[i]+='a';
}
}
 
</source>
 
-and then extended it to handle larger strings by using multiple blocks, so that the work is not confined to a single block while the other streaming multiprocessors sit idle.
 
<source lang=cpp>
 
__global__ void encrypt2(char *inp, int key) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if(inp[i]>='a'&&inp[i]<='z') {
inp[i]-='a';
inp[i]+=key;
inp[i]%=26;
inp[i]+='A';
}
else if(inp[i]>='A'&&inp[i]<='Z') {
inp[i]-='A';
inp[i]+=26-key;
inp[i]%=26;
inp[i]+='a';
}
}
</source>
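
-With several blocks, the number of blocks has to be rounded up, and the last block may then contain threads whose index lies past the end of the string. The sketch below models that on the CPU. It is an assumption on my part rather than the submitted code: <code>blocks_for</code> and <code>simulate_encrypt_launch</code> are hypothetical names, the two loops stand in for <code>blockIdx.x</code> and <code>threadIdx.x</code>, and the bounds guard needs the string length, which the kernel as shown does not receive.

```cpp
#include <cassert>
#include <string>

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical CPU model (not from the original page) of a multi-block launch.
// block_dim stands in for blockDim.x. Assumes 0 <= key < 26.
std::string simulate_encrypt_launch(std::string text, int key, int block_dim) {
    const int n = static_cast<int>(text.size());
    const int grid_dim = blocks_for(n, block_dim);
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            const int i = block * block_dim + thread;  // same index formula as the kernel
            if (i >= n) continue;                      // guard: the last block may overshoot
            char& c = text[i];
            if (c >= 'a' && c <= 'z')
                c = static_cast<char>('A' + (c - 'a' + key) % 26);
            else if (c >= 'A' && c <= 'Z')
                c = static_cast<char>('a' + (c - 'A' + 26 - key) % 26);
        }
    }
    return text;
}
```

-For example, 1000 characters with 256 threads per block need 4 blocks, which gives 1024 threads, so the last 24 threads must do nothing. In a real kernel the same guard would be an early <code>return</code> when the index is not smaller than the length.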
