Changes

GPU610/DPS915 MCM Decrypt

3,724 bytes added, 18:58, 6 December 2013

→‎Assignment 3

unsigned long tmp;

tmp= (unsigned long) (unsigned char) password[idx];

*nr^= (((*nr & 63)+*add)*tmp)+ (*nr << 8);

*nr2+=(*nr2 << 8) ^ *nr;

encrypted_password[threadIdx.x] = 0;

}

</source>

-I am currently trying to get the kernels above working before handing in the assignment, because the initialization kernel on its own would not be sufficient for this assignment.

==== Work With Prefix Scan ====

-At the moment I am working through several simplified versions of a prefix sum (scan) algorithm, which I hope will put me on the right path to completing my assignment. These algorithms were gathered from sources including MIT, NVIDIA, and the CUDA documentation.

-Below is a sequential exclusive prescan (prefix sum) over an array.

void scan(float* arr1, float* input, int n) {
    arr1[0] = 0; // exclusive scan: the first output is 0, so input[0] first appears in arr1[1]
    for (int i = 1; i < n; ++i) {
        // each output is the sum of all inputs before position i
        arr1[i] = input[i-1] + arr1[i-1];
    }
}

</source>
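-To check what an exclusive scan should produce, here is a minimal sketch of the same loop using `std::vector`. The name `exclusive_scan_seq` and the sample values are my own choices for illustration and are not part of the assignment code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name not from the original page): exclusive prefix sum.
// out[i] holds the sum of in[0] .. in[i-1], so out[0] is always 0.
std::vector<float> exclusive_scan_seq(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = out[i - 1] + in[i - 1];
    }
    return out;
}
```

-For the input {3, 1, 7, 0, 4, 1, 6, 3} this returns {0, 3, 4, 11, 11, 15, 16, 22}. The last input value (3) never appears in the output, which is what makes the scan exclusive.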

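-Below is a parallel scan that computes the same exclusive prefix sum as the function above. It runs within a single block and needs 2*n floats of dynamic shared memory for the double buffer.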

__global__ void scan(float *g_odata, float *g_idata, int n) {
    extern __shared__ float temp[]; // allocated on invocation; must hold 2*n floats
    int thid = threadIdx.x;
    int pout = 0, pin = 1;

    // load input into shared memory.
    // This is exclusive scan, so shift right by one and set first elt to 0
    temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
    __syncthreads();

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout; // swap double buffer indices
        pin = 1 - pout;
        if (thid >= offset) {
            // read both operands from the input buffer, write to the output buffer
            temp[pout*n+thid] = temp[pin*n+thid] + temp[pin*n+thid - offset];
        } else {
            temp[pout*n+thid] = temp[pin*n+thid];
        }
        __syncthreads();
    }

    g_odata[thid] = temp[pout*n+thid]; // write output
}

</source>
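-The double-buffer steps are easier to follow when traced on the CPU. The sketch below is a hypothetical emulation written for this page, not the kernel itself: the inner loop plays the role of the threads in one block, and finishing that loop stands in for `__syncthreads()`. The name `scan_double_buffer` is my own. Each step reads both operands from the input half of the buffer (`pin`) and writes only to the output half (`pout`).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation (not from the original page) of a
// double-buffered exclusive scan. temp holds two buffers of n floats.
std::vector<float> scan_double_buffer(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> temp(2 * n, 0.0f);
    std::size_t pout = 0, pin = 1;

    // Exclusive scan: shift the input right by one and set element 0 to 0.
    for (std::size_t thid = 0; thid < n; ++thid) {
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;
    }

    for (std::size_t offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap double buffer indices
        pin = 1 - pout;
        // One pass over thid emulates all threads between two barriers.
        for (std::size_t thid = 0; thid < n; ++thid) {
            if (thid >= offset) {
                temp[pout * n + thid] =
                    temp[pin * n + thid] + temp[pin * n + thid - offset];
            } else {
                temp[pout * n + thid] = temp[pin * n + thid];
            }
        }
    }

    // Copy the final output buffer out of temp.
    std::vector<float> out(n);
    for (std::size_t thid = 0; thid < n; ++thid) {
        out[thid] = temp[pout * n + thid];
    }
    return out;
}
```

-For the input {3, 1, 7, 0, 4, 1, 6, 3} the buffer goes through {0, 3, 4, 8, 7, 4, 5, 7} (offset 1), {0, 3, 4, 11, 11, 12, 12, 11} (offset 2) and {0, 3, 4, 11, 11, 15, 16, 22} (offset 4), which matches the sequential result.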

=== Assignment 3 ===

-In the end I found a fairly simple encryption program and decided to work with that, so that I would at least have something to hand in. The cipher transforms each character of the string or file independently of the others, which makes it a good candidate for parallelization.

-Here is the source code snippet of the CPU code:

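#include <fstream>

// Reads the file named by inp, applies the shift cipher to every letter,
// and writes the result to the file named by out.
void encrypt(char *inp, char *out, int key)
{
    std::ifstream input;
    std::ofstream output;
    char buf;

    input.open(inp);
    output.open(out);

    buf = input.get();
    while (!input.eof())
    {
        if (buf >= 'a' && buf <= 'z') {
            // lowercase: shift forward by key, emit as uppercase
            buf -= 'a';
            buf += key;
            buf %= 26;
            buf += 'A';
        }
        else if (buf >= 'A' && buf <= 'Z') {
            // uppercase: shift back by key, emit as lowercase
            buf -= 'A';
            buf += 26 - key;
            buf %= 26;
            buf += 'a';
        }
        output.put(buf);
        buf = input.get();
    }

    input.close();
    output.close();
    //readText(inp);
    //readText(out);
}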

</source>
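-The per-character transform can be pulled out of the file loop to see what it does to a string in memory. This is a minimal sketch, assuming 0 <= key <= 26. The names `shift_char` and `shift_string` are hypothetical and do not appear in the assignment code.

```cpp
#include <string>

// Hypothetical helper (not from the original page): the same transform the
// file loop applies, for a single character. Assumes 0 <= key <= 26.
char shift_char(char c, int key) {
    if (c >= 'a' && c <= 'z') {
        // lowercase: shift forward by key, emit as uppercase
        return static_cast<char>((c - 'a' + key) % 26 + 'A');
    }
    if (c >= 'A' && c <= 'Z') {
        // uppercase: shift back by key, emit as lowercase
        return static_cast<char>((c - 'A' + 26 - key) % 26 + 'a');
    }
    return c;  // digits, spaces and punctuation pass through unchanged
}

// Applies shift_char to every character of a copy of text.
std::string shift_string(std::string text, int key) {
    for (char& c : text) {
        c = shift_char(c, key);
    }
    return text;
}
```

-With key 3, "Hello, World" becomes "eHOOR, tRUOG". Because lowercase letters shift forward and come out uppercase while uppercase letters shift back and come out lowercase, running the same transform a second time with the same key restores the original text.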

-I then created a very simple kernel in which each thread transforms one character:

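__global__ void encrypt2(char *inp, int key) {
    int i = threadIdx.x; // one thread per character, single block only
    if (inp[i] >= 'a' && inp[i] <= 'z') {
        // lowercase: shift forward by key, emit as uppercase
        inp[i] -= 'a';
        inp[i] += key;
        inp[i] %= 26;
        inp[i] += 'A';
    }
    else if (inp[i] >= 'A' && inp[i] <= 'Z') {
        // uppercase: shift back by key, emit as lowercase
        inp[i] -= 'A';
        inp[i] += 26 - key;
        inp[i] %= 26;
        inp[i] += 'a';
    }
}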

</source>

-I then changed the indexing so that it can handle larger strings. With a global index the work is spread over multiple blocks, instead of all threads running in a single block and leaving the other streaming multiprocessors idle.

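// n (the string length) is an added parameter, not in the first version:
// it guards the last block when n is not a multiple of blockDim.x.
__global__ void encrypt2(char *inp, int key, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global index across all blocks
    if (i >= n) return;
    if (inp[i] >= 'a' && inp[i] <= 'z') {
        // lowercase: shift forward by key, emit as uppercase
        inp[i] -= 'a';
        inp[i] += key;
        inp[i] %= 26;
        inp[i] += 'A';
    }
    else if (inp[i] >= 'A' && inp[i] <= 'Z') {
        // uppercase: shift back by key, emit as lowercase
        inp[i] -= 'A';
        inp[i] += 26 - key;
        inp[i] %= 26;
        inp[i] += 'a';
    }
}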

</source>
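-To show how the global index maps blocks and threads onto characters, here is a hypothetical CPU stand-in for a one-dimensional launch. It is a sketch for illustration only: `blocks_needed`, `emulate_launch` and `threads_per_block` are names I made up, and the nested loops only imitate what the GPU does in parallel. It assumes `threads_per_block` is positive and 0 <= key <= 26.

```cpp
#include <string>

// Hypothetical helper: number of blocks needed to cover n items,
// rounded up so a partially filled last block still covers the tail.
int blocks_needed(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical CPU emulation of a 1D launch: block plays the role of
// blockIdx.x, thread of threadIdx.x, threads_per_block of blockDim.x.
std::string emulate_launch(std::string data, int key, int threads_per_block) {
    const int n = static_cast<int>(data.size());
    const int blocks = blocks_needed(n, threads_per_block);

    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            const int i = block * threads_per_block + thread;  // global index
            if (i >= n) {
                continue;  // the last block may overshoot the end of the data
            }
            char& c = data[i];
            if (c >= 'a' && c <= 'z') {
                c = static_cast<char>((c - 'a' + key) % 26 + 'A');
            } else if (c >= 'A' && c <= 'Z') {
                c = static_cast<char>((c - 'A' + 26 - key) % 26 + 'a');
            }
        }
    }
    return data;
}
```

-For example, 1000 characters with 256 threads per block need 4 blocks, and the last block has 24 indices past the end of the string that the guard has to skip. Without such a guard those threads would read and write memory beyond the buffer.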

Matthew Conner Maceachern
