Changes

Jump to: navigation, search

GPU610/OctoPig

3,695 bytes added, 12:57, 9 March 2013
Assignment 2
====Chad's Profiling Findings====
I started with [http://pointersgonewild.wordpress.com/2012/12/31/turing-drawings/ this] blog post I found and the demo that Maxime Chevalier-Boisvert (the original creator) posted as a github page [http://maximecb.github.com/Turing-Drawings/# here]. Make sure to press random a few times to see the different results. She also posted the source code [https://github.com/maximecb/Turing-Drawings here], the only issue was that it's all written in JavaScript. My biggest main task for completely the assignment was to somehow port the code into C++. The biggest challenge while porting was migrating from a loosely typed language to a strongly typed one, but I managed in the end to successfully port and compile the code. I'm sure there's a lot of cleaning up I can do with my C++ code, but this first assignment was more just a proof of concept than anything else.
I decided to scrap the randomization part of the original program for now because I want to have consistent profile results every run. I chose to focus on the seed found [http://maximecb.github.com/Turing-Drawings/#6,4,1,2,0,1,3,1,4,3,3,1,3,0,5,2,1,4,2,0,2,1,1,4,1,0,1,3,0,5,1,3,1,1,3,0,1,1,2,3,3,5,1,1,3,1,0,3,1,2,1,1,1,1,3,3,3,2,2,2,3,3,2,1,3,5,3,0,2,3,2,4,1,1 here]because who doesn't love the Matrix. I also decided to only generate 1000 images each time the program is run.
After profiling I determined that 99% of the run was being spent in two functions, 89% and 10% respectively. It's also interesting to note that the function that took 89% gets called 70000 times. The code for the two functions is below.
<source lang="cpp">
void Program::update(int numItrs) {
for (int i = 0; i < numItrs; ++i)
{
int sy = map[mapWidth * yPos + xPos];
int st = state;
 
int idx = (numStates * sy + st) * 3;
st = table[idx + 0];
sy = table[idx + 1];
int ac = table[idx + 2];
 
// Update the current state
state = st;
 
// Write the new symbol
map[mapWidth * yPos + xPos] = sy;
 
// Perform the transition action
switch (ac)
{
case ACTION_LEFT:
xPos += 1;
if (xPos >= mapWidth)
xPos -= mapWidth;
break;
 
case ACTION_RIGHT:
xPos -= 1;
if (xPos < 0)
xPos += mapWidth;
break;
 
case ACTION_UP:
yPos -= 1;
if (yPos < 0)
yPos += mapHeight;
break;
 
case ACTION_DOWN:
yPos += 1;
if (yPos >= mapHeight)
yPos -= mapHeight;
break;
 
default:
error("invalid action: " + ac);
}
 
itrCount++;
}
}
</source>
 
<source lang="cpp">
void updateRender(tmach::Program* program) {
float startTime = tmach::getTimeMillis();
int startItrc = program->itrCount;
 
// Until the update time is exhausted
do {
// Update the program
program->update(5000);
 
float curTime = tmach::getTimeMillis();
int curItrc = program->itrCount;
 
if (curItrc - startItrc >= UPDATE_ITRS ||
curTime - startTime >= UPDATE_TIME)
break;
} while (true);
 
// Produce the image data
int* map = program->map;
int map_size = program->size_map;
unsigned char data [WIDTH * HEIGHT * 3];
for (int i = 0; i < program->size_map; ++i)
{
int sy = map[i];
 
int r = colorMap[3 * sy + 0];
int g = colorMap[3 * sy + 1];
int b = colorMap[3 * sy + 2];
 
data[3 * i + 0] = r;
data[3 * i + 1] = g;
data[3 * i + 2] = b;
}
 
// Show the image data
tmach::createBMP(WIDTH, HEIGHT, data, WIDTH*HEIGHT*4, "temp.bmp");
}
</source>
 
Because ~90% of the run time is take up by one function it's really the only possible candidate for moving to the GPU and optimizing. Upon closer inspection though you can see that the update function only contains one for loop and the for loop essentially snakes through the image map 1 pixel at a time. I don't think it can be parallelized because each iteration is dependent on the one before. The image creation is reliant on the snaking effect.
'''Analysis:'''
=== Assignment 2 ===
For the second assignment we couldn't choose to do either of our topics for Assignment 1. We decided to do a variation on Conway's Game of Life called SmoothLife. We started with [http://0fps.wordpress.com/2012/11/19/conways-game-of-life-for-curved-surfaces-part-1/ this blog post]. The author goes into great detail about the math behind SmoothLife, but you don't really need to understand it. At the bottom of the post there is a JavaScript implementation of SmoothLife, [http://jsfiddle.net/mikola/aj2vq/ here], and we used that as a starting point.
 
In order to profile the code I ported it over to C++ and ran it on Matrix. The results weren't great. The two functions that consumed the most run time took up a total of 90% of the time, but each run of the function only took 0.01ms. So both of those were out because it wouldn't be worth it. The next two functions took up ~15% runtime combined. We each picked on and moved it to the GPU. We new the results weren't going to be great, but I'm sick of porting JavaScript to C++.
 
Actual source code to follow.
 
=== Assignment 3 ===
1
edit

Navigation menu