=== Assignment 2 ===
==== Christopher's Findings ====
===== xor_me =====
Because most CPU cycles are spent in the innermost for-loop, the goal is to parallelize only that loop. It would be possible to parallelize all 8 for-loops (the password has a maximum of 8 characters) because they perform very similar tasks. However, given the time constraint and the purpose of this assignment, I will focus only on this one. The code to be optimized for our GPU is:
<pre>
for (o = 32; o < 128; ++o) {
skipInits:
    unsigned short x = nHash ^ hash;
    lclRotateRight(x, 1);
    if (32 <= x && x < 127) {
        t[0] = static_cast<unsigned char>(x);
        if (nKey == lclGetKey(t, 16)) {
            std::cout << "Password: '" << t << "'" << std::endl;
        }
    }
    hash ^= r[1];
    r[1] = t[1] = o;
    lclRotateLeft(r[1], 2);
    hash ^= r[1];
    if (o == 32) {
        r[0] = '\0';
        hash = lclGetHash(t, r, 16);
    }
}
</pre>
At first this seems impossible because of the data dependencies on hash, r[1], t[1], and o. After doing some research, however, I learned that the XOR operation is commutative and associative, which means we can calculate each dependency's contribution in parallel first and then combine the results, in any order, to obtain the same values.
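The property above can be checked with a minimal host-side sketch. It is not the xor_me code and not a GPU kernel: <code>contribution</code>, <code>xorSequential</code>, and <code>xorPairwise</code> are hypothetical names made up for illustration, and the 16-bit rotate is only a stand-in for whatever per-character value the real hash mixes in. The point is only that XOR-ing independent values left to right, or pairwise in a tree (the order a parallel reduction would likely use), gives the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-character contribution: the character rotated left
// within 16 bits by its position. A stand-in, not the real lclRotateLeft.
inline std::uint16_t contribution(unsigned char c, unsigned pos) {
    std::uint16_t v = c;
    unsigned n = pos % 16;
    if (n == 0) return v;
    return static_cast<std::uint16_t>((v << n) | (v >> (16 - n)));
}

// Serial order: fold the values left to right, like the original loop.
inline std::uint16_t xorSequential(const std::vector<std::uint16_t>& v) {
    std::uint16_t acc = 0;
    for (std::uint16_t x : v) acc ^= x;
    return acc;
}

// Tree order: combine elements pairwise with a doubling stride. Each pass
// touches disjoint pairs, so on a GPU every pair could go to its own thread.
inline std::uint16_t xorPairwise(std::vector<std::uint16_t> v) {
    if (v.empty()) return 0;
    for (std::size_t stride = 1; stride < v.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride) {
            v[i] ^= v[i + stride];
        }
    }
    return v[0];
}
```

Because both functions return the same value for any input, the per-character contributions can be computed independently and merged afterwards, which is what makes the loop a candidate for a parallel reduction.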
=== Assignment 3 ===