GPU610/DPS915 MCM Decrypt

2,958 bytes added, 18:58, 6 December 2013

→‎Assignment 3

I am currently trying to get the above kernels to work before handing in the assignment, since the initialization kernel alone would not be nearly sufficient for the purpose of this assignment.

==== Work With Prefix Scan ====

At the moment I am working through several simplified versions of a prefix sum algorithm, which I hope will put me on the right path to completing my assignment. These algorithms were gathered from sources such as MIT, NVIDIA, and the CUDA documentation. The first is a sequential exclusive scan:

<source lang="cpp">
// Sequential exclusive scan: output[i] = input[0] + ... + input[i-1]. Assumes n >= 1.
void scan(float* output, const float* input, int n) {
    output[0] = 0; // since this is an exclusive scan, we do not include the first element
    for (int i = 1; i < n; ++i) {
        output[i] = input[i-1] + output[i-1];
    }
}
</source>
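As a quick check of what an exclusive scan produces, here is a minimal sketch that restates the loop above with `std::vector`. The names `exclusive_scan_cpu` and `exclusive_scan_example` are hypothetical and not part of the assignment code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the assignment): sequential exclusive scan.
// out[i] holds the sum of in[0] .. in[i-1], so out[0] is always 0.
std::vector<float> exclusive_scan_cpu(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = out[i - 1] + in[i - 1];
    }
    return out;
}

// Worked example: {3, 1, 7, 0, 4, 1, 6, 3} scans to {0, 3, 4, 11, 11, 15, 16, 22}.
void exclusive_scan_example() {
    const std::vector<float> in = {3, 1, 7, 0, 4, 1, 6, 3};
    const std::vector<float> expected = {0, 3, 4, 11, 11, 15, 16, 22};
    assert(exclusive_scan_cpu(in) == expected);
}
```

Each output element depends on the one before it, which is why this form cannot be split across threads as written.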

Below is a parallel scan that does the same thing as the above function:

<source lang="cpp">
// Naive double-buffered exclusive scan for a single block of n threads.
// Launch with 2 * n * sizeof(float) bytes of dynamic shared memory.
__global__ void scan(float *g_odata, float *g_idata, int n) {
    extern __shared__ float temp[]; // allocated on invocation
    int thid = threadIdx.x;
    int pout = 0, pin = 1;
    // load input into shared memory.
    // This is exclusive scan, so shift right by one and set first elt to 0
    temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
    __syncthreads();
    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout; // swap double buffer indices
        pin = 1 - pout;
        if (thid >= offset)
            // read both operands from the input buffer, write to the output buffer
            temp[pout*n+thid] = temp[pin*n+thid] + temp[pin*n+thid - offset];
        else
            temp[pout*n+thid] = temp[pin*n+thid];
        __syncthreads();
    }
    g_odata[thid] = temp[pout*n+thid]; // write output
}
</source>
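To see what each pass of the parallel scan does, the kernel can be emulated on the CPU. The following is only a sketch under that assumption: the inner loop over `thid` stands in for the threads of one block, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation (not the author's code) of the double-buffered
// scan: the inner loop over thid plays the role of the n threads in one block,
// and finishing that loop plays the role of __syncthreads().
std::vector<float> emulate_double_buffered_scan(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> temp(2 * n, 0.0f);  // two buffers of n floats each
    std::size_t pout = 0, pin = 1;

    // Load shifted right by one so the result is an exclusive scan.
    for (std::size_t thid = 0; thid < n; ++thid) {
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;
    }

    for (std::size_t offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap double buffer indices
        pin = 1 - pout;
        for (std::size_t thid = 0; thid < n; ++thid) {
            if (thid >= offset) {
                // Both operands come from the buffer written in the previous pass.
                temp[pout * n + thid] =
                    temp[pin * n + thid] + temp[pin * n + thid - offset];
            } else {
                temp[pout * n + thid] = temp[pin * n + thid];
            }
        }
    }
    return std::vector<float>(temp.begin() + pout * n,
                              temp.begin() + pout * n + n);
}

// For {3, 1, 7, 0, 4, 1, 6, 3} the passes with offset 1, 2 and 4 end at
// {0, 3, 4, 11, 11, 15, 16, 22}, matching the sequential version.
void emulate_scan_example() {
    const std::vector<float> in = {3, 1, 7, 0, 4, 1, 6, 3};
    const std::vector<float> expected = {0, 3, 4, 11, 11, 15, 16, 22};
    assert(emulate_double_buffered_scan(in) == expected);
}
```

Reading from one buffer and writing to the other is what keeps a thread from picking up a value that another thread has already updated in the same pass.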

=== Assignment 3 ===

In the end I found a fairly simple encryption program and decided to work with that, so that I would at least have something to hand in. The encryption processes each letter of the string or file on its own, which makes it a good candidate for parallelization.

Here is the source code snippet of the CPU code:

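<source lang="cpp">
#include <fstream>

// Shifts each letter by key and swaps its case. Assumes 0 <= key <= 26.
void encrypt(const char *inp, const char *out, int key)
{
    std::ifstream input;
    std::ofstream output;
    char buf;
    input.open(inp);
    output.open(out);
    buf=input.get();
    while(!input.eof())
    {
        if(buf>='a'&&buf<='z') {
            // lowercase: shift forward by key, write as uppercase
            buf-='a';
            buf+=key;
            buf%=26;
            buf+='A';
        }
        else if(buf>='A'&&buf<='Z') {
            // uppercase: shift back by key, write as lowercase
            buf-='A';
            buf+=26-key;
            buf%=26;
            buf+='a';
        }
        output.put(buf);
        buf=input.get();
    }
    input.close();
    output.close();
    //readText(inp);
    //readText(out);
}
</source>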
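To make the per-letter independence explicit, the rule inside the loop can be pulled out into a function of a single character. This is a minimal sketch, assuming the text is already in memory. `encrypt_char` and `encrypt_string` are hypothetical names, not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not in the original code): the per-character rule from
// encrypt(), written as a pure function. Assumes 0 <= key <= 26.
char encrypt_char(char c, int key) {
    assert(key >= 0 && key <= 26);
    if (c >= 'a' && c <= 'z') {
        return static_cast<char>('A' + (c - 'a' + key) % 26);
    }
    if (c >= 'A' && c <= 'Z') {
        return static_cast<char>('a' + (c - 'A' + 26 - key) % 26);
    }
    return c;  // anything that is not a letter passes through unchanged
}

// Each iteration touches only text[i], so the iterations can run in any order,
// or all at once with one GPU thread per character.
std::string encrypt_string(std::string text, int key) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        text[i] = encrypt_char(text[i], key);
    }
    return text;
}
```

Because lowercase letters are shifted forward and uppercase letters are shifted back, running `encrypt_string` twice with the same key returns the original text.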

I then created a very simple kernel:

<source lang="cpp">
// Single block only: launch with exactly one thread per character of inp.
__global__ void encrypt2(char *inp, int key) {
    int i = threadIdx.x;
    if(inp[i]>='a'&&inp[i]<='z') {
        inp[i]-='a';
        inp[i]+=key;
        inp[i]%=26;
        inp[i]+='A';
    }
    else if(inp[i]>='A'&&inp[i]<='Z') {
        inp[i]-='A';
        inp[i]+=26-key;
        inp[i]%=26;
        inp[i]+='a';
    }
}
</source>

I then optimized it to handle larger strings by indexing across multiple blocks, so that the threads are not all confined to a single block while the other streaming multiprocessors sit idle:

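<source lang="cpp">
// Multi-block version. There is no bounds check, so the grid must cover
// exactly the length of inp.
__global__ void encrypt2(char *inp, int key) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global index across blocks
    if(inp[i]>='a'&&inp[i]<='z') {
        inp[i]-='a';
        inp[i]+=key;
        inp[i]%=26;
        inp[i]+='A';
    }
    else if(inp[i]>='A'&&inp[i]<='Z') {
        inp[i]-='A';
        inp[i]+=26-key;
        inp[i]%=26;
        inp[i]+='a';
    }
}
</source>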
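The kernel above takes no length parameter, so when the string length is not a multiple of the block size, the surplus threads of the last block would index past the end of the string. A common remedy is to round the block count up and guard the index. The sketch below emulates that indexing on the CPU. `blocks_for` and `emulate_grid` are hypothetical names, and passing the length `n` to the kernel is an assumption, not something the original code does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed so that blocks * threads_per_block >= n (rounds up).
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical CPU emulation of the grid: walks every (block, thread) pair,
// computes the same global index as the kernel, and counts how many threads
// touch each element. The i < n guard is what keeps the surplus threads of the
// last block from accessing memory past the end of the string.
std::vector<int> emulate_grid(std::size_t n, std::size_t threads_per_block) {
    std::vector<int> visits(n, 0);
    const std::size_t blocks = blocks_for(n, threads_per_block);
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            const std::size_t i = block * threads_per_block + thread;
            if (i < n) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

With `n = 1000` and 256 threads per block this gives 4 blocks (1024 threads). Every character is visited exactly once, and the guard leaves the last 24 threads with nothing to do.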

Matthew Conner Maceachern
