GPU610/Team AGC

1,271 bytes added, 01:40, 31 October 2014

Initiated Assignment 2 - xor_me

=== Assignment 2 ===

==== Christopher's Findings ====

===== xor_me =====

Since the majority of CPU cycles are spent in the innermost for-loop, the goal is to parallelize only that loop. It would be possible to parallelize all 8 for-loops (one per character of a password of at most 8 characters) because they perform very similar tasks. However, given the time constraint and the purpose of this assignment, I will focus on this one only. The code to be optimized for our GPU is shown below:

<pre>
for (o=32; o < 128; ++o) {
skipInits:
unsigned short x = nHash ^ hash;
lclRotateRight(x, 1);
if (32 <= x && x < 127) {
t[0] = static_cast<unsigned char>(x);
if (nKey == lclGetKey(t, 16)) {
std::cout << "Password: '" << t << "'" << std::endl;
}
hash ^= r[1];
r[1] = t[1] = o;
lclRotateLeft(r[1], 2);
hash ^= r[1];
if (o == 32) {
r[0] = '\0';
hash = lclGetHash(t, r, 16);
}
</pre>

At first, parallelizing this loop seems impossible because each iteration depends on values carried over from the previous one: hash, r[1], t[1], and o. But after doing some research, I have learned that the XOR operation is commutative and associative, meaning that the contribution of each iteration can be calculated independently in parallel and then combined, producing the same result as the sequential loop.
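To illustrate why these XOR properties help, here is a minimal C++ sketch of the idea, not the actual xor_me code. It models only the running hash update from the loop: each iteration XORs out the previous rotated character and XORs in the new one. The helper rotl15 is a hypothetical stand-in for lclRotateLeft (a 15-bit rotation is assumed here; the real helper may differ), and sequentialHashes, independentHashes, and the base parameter are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for lclRotateLeft: rotates a 15-bit value left by n bits.
// Assumption for illustration only; the real helper in xor_me may differ.
inline std::uint16_t rotl15(std::uint16_t v, unsigned n) {
    assert(n > 0 && n < 15);
    v &= 0x7FFF;
    return static_cast<std::uint16_t>(((v << n) | (v >> (15 - n))) & 0x7FFF);
}

// Sequential version: the hash is carried from one iteration to the next,
// mirroring the "hash ^= r[1]; r[1] = o; rotate; hash ^= r[1];" pattern.
std::vector<std::uint16_t> sequentialHashes(std::uint16_t base) {
    std::vector<std::uint16_t> out;
    std::uint16_t hash = base;  // hash without any contribution from the varying character
    std::uint16_t prev = 0;     // contribution currently folded into hash
    for (unsigned o = 32; o < 128; ++o) {
        hash ^= prev;           // XOR out the previous character's contribution
        prev = rotl15(static_cast<std::uint16_t>(o), 2);
        hash ^= prev;           // XOR in the new character's contribution
        out.push_back(hash);
    }
    return out;
}

// Independent version: because a ^ b ^ b == a and XOR can be reordered freely,
// each entry depends only on base and its own character. On a GPU, each
// index i could be handled by a separate thread.
std::vector<std::uint16_t> independentHashes(std::uint16_t base) {
    std::vector<std::uint16_t> out(96);
    for (unsigned i = 0; i < 96; ++i) {
        const std::uint16_t contribution = rotl15(static_cast<std::uint16_t>(32 + i), 2);
        out[i] = static_cast<std::uint16_t>(base ^ contribution);
    }
    return out;
}
```

Both functions should produce the same 96 values for any base, which is the property that would let the loop body be mapped to one GPU thread per candidate character.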

=== Assignment 3 ===

Christopher Markieta
