Difference between revisions of "A-Team"

=== Assignment 1 ===
 
Our group decided to profile two candidate programs, a simple neural network and a ray tracer, in order to determine which project would be the best one to build a parallel solution for.

===Neural Network===
 
======Sebastian's findings======

I found a simple [https://gist.github.com/sbugrov/7f373f0e4788f8e076b8efa2abfd227a neural network] that takes the MNIST data set and performs training on batches of the data. As a quick illustration, MNIST is a data set of handwritten digits in grayscale at 28 x 28 pixels, together with the corresponding numerical labels between 0 and 9. The purpose of this data set is to train networks so that they can recognize handwritten digits when they encounter them.
 
Our hypothesis for this solution is an acceleration of roughly 10x when dot() is parallelized. This means that our code should take somewhere in the ballpark of 102 seconds to train the network.

===Ray Tracing===
 
======Henry's findings======
  
 
======Initial Profile======

{| class="wikitable mw-collapsible mw-collapsed"
! Initial Profile (Warning: long)
|-
|     Initial Profile

Flat profile:
 
   0.00    19.10    0.00        1    0.00    0.00  Imager::Spheroid::~Spheroid()
   0.00    19.10    0.00        1    0.00    0.00  Algebra::UnitTest()
|}

----
  
 
According to the flat profile, 43.88% of the time is spent in SolveLinearEquations. Most of the remaining time is used for calculating the shapes, while 1.02% is spent in the TraceRay function.
  
 
======Call Graph======
{| class="wikitable mw-collapsible mw-collapsed"
! Call Graph
|-
|     Call graph (explanation follows)

Call graph
  
 
Most of the time (99.3%) is spent executing the SaveImage function (Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const). In the additional lodepng code that runs alongside the ray tracer, 94.4% of time is spent in the CalculateLighting function (Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const).
|}

----
  
 
=== Assignment 2 ===

During assignment 2, we tried a simple kernel that computes a dot product. The result was nothing special: as predicted at the end of assignment 1, repeatedly calling cudaMalloc and cudaMemcpy had severe consequences for run time.
====Initial implementation====
  [[File:kernel_ms1_call.jpg]]

  // version 1 dot product: one thread per output element
  __global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // row of d_p
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // column of d_p
    // matrix multiplication: d_a is ni x nk, d_b is nk x nj, d_p is ni x nj
    if (i < ni && j < nj) {
      float sum = 0.0f;
      for (int k = 0; k < nk; k++)
        sum += d_a[i * nk + k] * d_b[k * nj + j];
      d_p[i * nj + j] = sum;
    }
  }

====Naive====
Naturally, this is a naive implementation, as we call cudaMalloc in every iteration of the training loop.
 cout << "Training the model ...\n";
 for (unsigned i = 0; i < 10000; ++i) {

This actually cost us an additional 20 minutes before profiling could be completed.
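
The cost of allocating inside the loop can be illustrated on the host with a rough analogy. The sketch below is not the team's CUDA code: <code>Buffer</code>, <code>train_naive</code> and <code>train_hoisted</code> are hypothetical names, and a <code>std::vector</code> stands in for a device buffer that would be obtained with cudaMalloc. It only counts how many allocations each pattern performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device buffer; counts how often storage is allocated.
struct Buffer {
    static int allocations;
    std::vector<float> data;
    explicit Buffer(std::size_t n) : data(n) { ++allocations; }
};
int Buffer::allocations = 0;

// Naive pattern: a fresh buffer in every training iteration.
int train_naive(int iterations, std::size_t n) {
    assert(n > 0);
    Buffer::allocations = 0;
    for (int i = 0; i < iterations; ++i) {
        Buffer scratch(n);                       // analogous to cudaMalloc inside the loop
        scratch.data[0] = static_cast<float>(i); // placeholder for the real work
    }
    return Buffer::allocations;
}

// Hoisted pattern: allocate once, then reuse the same storage in every iteration.
int train_hoisted(int iterations, std::size_t n) {
    assert(n > 0);
    Buffer::allocations = 0;
    Buffer scratch(n);                           // analogous to one cudaMalloc before the loop
    for (int i = 0; i < iterations; ++i)
        scratch.data[0] = static_cast<float>(i); // placeholder for the real work
    return Buffer::allocations;
}
```

With 10000 iterations, the naive pattern performs 10000 allocations and the hoisted pattern performs one. On a GPU each allocation is typically paired with a copy as well, so the same restructuring also removes repeated transfers.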

====The next steps====
First, we had to research how the neural network actually learns: for example, why it uses the relu() function, how back-propagation works, and much more.
Some additional sites will be included.

=====After that and many coffees!=====
 __global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1) {
   int BATCH_SIZE = 256;
   float lr = .01 / BATCH_SIZE;
   kdot<<<50, 51>>>(ktranspose(d_a2, BATCH_SIZE, 64), d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
   kdot<<<80, 32>>>(d_dyhat, ktranspose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
   kreluPrime(d_a2, 128 * 64);
   for (int i = 0; i < BATCH_SIZE * 10; i++) {
     d_dz2[i] = d_dz2[i] * d_a2[i];
   }
   kdot<<<1024, 32>>>(ktranspose(d_a1, BATCH_SIZE, 128), d_dz2, 128, BATCH_SIZE, 64, d_dW2);
   kdot<<<512, 32>>>(d_dz2, ktranspose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
   kreluPrime(d_a1, BATCH_SIZE * 784);
   for (int i = 0; i < 256 * 64; i++) {
     d_dz1[i] = d_dz1[i] * d_a1[i];
   }
   kdot<<<512, 512, 32>>>(ktranspose(d_b_X, BATCH_SIZE, 784), d_dz1, 784, BATCH_SIZE, 128, d_dW1);
   // Updating the parameters
   // W3 = W3 - lr * dW3;
   for (int i = 0; i < (64 * 10); i++) {
     d_W3[i] = d_W3[i] - lr * d_dW3[i];
   }
   // W2 = W2 - lr * dW2;
   for (int i = 0; i < (128 * 64); i++) {
     d_W2[i] = d_W2[i] - lr * d_dW2[i];
   }
   // W1 = W1 - lr * dW1;
   for (int i = 0; i < (784 * 128); i++) {
     d_W1[i] = d_W1[i] - lr * d_dW1[i];
   }
 }

===Dynamic Parallelism===
Dynamic Parallelism in CUDA allows kernels to create and synchronize new nested kernels. For our use case, it also lets us keep more of the work on the device, processing data quickly without constant cudaMemcpy() or cudaMalloc() calls.

{| class="wikitable mw-collapsible mw-collapsed"
! Parent call Child kernel( ... )
|-
|
<syntaxhighlight lang="cpp">
// Parent kernel: launches the child kernel_dot from device code.
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
    int BATCH_SIZE = 256;
    float lr = 0.01 / BATCH_SIZE;
    // backpropagation
    d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
    kernel_dot <<<(2560 + 128) / 64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
    cudaDeviceSynchronize();  // wait for the child kernel to finish
}

// Child kernel: one thread per output element.
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    // matrix multiplication
    if (i < ni && j < nj) {
        float sum = 0.0f;
        for (int k = 0; k < nk; k++)
            sum += d_a[i * nk + k] * d_b[k * nj + j];
        d_p[i * nj + j] = sum;
    }
}
</syntaxhighlight>
|}

===Final Iteration===
{| class="wikitable mw-collapsible mw-collapsed"
! GPU code
|-
|
<syntaxhighlight lang="cpp">
__device__ float* k_difference(const float* m1, const float* m2, const int size) {
    /* Returns the element-wise difference between the two vectors. */
    float* difference = new float[size];
    for (int i = 0; i < size; i++) {
        difference[i] = m1[i] - m2[i];
    }
    return difference;
}

// Multiplies every element of m by the scalar f and returns a newly allocated result.
__device__ float* k_MFV(const float f, const float* m, const int size) {
    float* mult = new float[size];
    for (int i = 0; i < size; i++) {
        mult[i] = f * m[i];
    }
    return mult;
}

// Element-wise product of m1 and m2, returned in a newly allocated buffer.
__device__ float* k_MM(float* m1, float* m2, const int m2_size) {
    float* product = new float[m2_size];

    for (int i = 0; i != m2_size; ++i) {
        product[i] = m1[i] * m2[i];
    }

    return product;
}

__device__ float* k_transpose(float *m, const int C, const int R) {

    /*  Returns a transpose matrix of input matrix.
        Inputs:
            m: vector, input matrix
            C: int, number of columns in the input matrix
            R: int, number of rows in the input matrix
        Output: vector, transpose matrix mT of input matrix m
    */

    float* mT = new float[C * R];
    for (unsigned n = 0; n != C * R; n++) {
        unsigned i = n / C;
        unsigned j = n % C;
        mT[n] = m[R * j + i];
    }

    return mT;

    //for (int i = 0; i<R; ++i)
    //    for (int j = 0; j<C; ++j)
    //    {
    //        mT[j * C + i] = m[i * R + j];
    //    }

    //return mT;
}

// Sequential device-side product: d_a is ni x nj, d_b is nj x nk, d_p is ni x nk.
__device__ void dkernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
    for (int row = 0; row != ni; ++row) {
        for (int col = 0; col != nk; ++col) {
            d_p[row * nk + col] = 0.f;
            for (int k = 0; k != nj; ++k) {
                d_p[row * nk + col] += d_a[row * nj + k] * d_b[k * nk + col];
            }
        }
    }
}

//version 1 dot product
// Note: this kernel indexes d_a as ni x nk and d_b as nk x nj, unlike dkernel_dot above.
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    //matrix multiplication
    if (i < ni && j < nj) {
        float sum = 0.0f;
        for (int k = 0; k < nk; k++)
            sum += d_a[i * nk + k] * d_b[k * nj + j];
        d_p[i * nj + j] = sum;
    }
}

// Host-side helper: prints the CUDA error name and exits on failure.
void cudaCheck(cudaError_t Error) {
    if (Error != cudaSuccess) {
        cerr << cudaGetErrorName(Error) << "!";
        exit(EXIT_FAILURE);
    }
}

// In-place rectifier; in this version negative inputs are set to 0.01f.
__device__ float* k_relu(float* a, int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i] < 0) {
            a[i] = 0.01f;
        }
    }
    return a;
}

// In-place rectifier derivative: 1 for positive inputs, 0 otherwise.
__device__ float* k_reluPrime(float* a, int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i] > 0) {
            a[i] = 1.0f;
        }
        else a[i] = 0.0;
    }
    return a;
}

///activation functions __global__
__global__ void kernel_relu(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (a[i] < 0) {
            a[i] = 0.01f;
        }
    }
}

__global__ void kernel_reluPrime(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (a[i] > 0) {
            a[i] = 1.0f;
        }
        else a[i] = 0.0;
    }
}

__device__ void ksoftmax(float *input, int input_len) {
    //assert(input != NULL);
    //assert(input_len != 0);
    int i;
    float m;
    /* Find maximum value from input array */
    m = input[0];
    for (i = 1; i < input_len; i++) {
        if (input[i] > m) {
            m = input[i];
        }
    }

    /* Sum of exponentials, shifted by the maximum for numerical stability */
    float sum = 0;
    for (i = 0; i < input_len; i++) {
        sum += expf(input[i] - m);
    }

    for (i = 0; i < input_len; i++) {
        input[i] = expf(input[i] - m - log(sum));
    }
}

__device__ void k_sigmoid(float* m1, int size) {

    /*  Applies the sigmoid function f(x) = 1/(1 + e^-x) in place.
        Input: m1, a vector.
        Output: 1/(1 + e^-x) for every element of the input matrix m1.
    */
    for (unsigned i = 0; i != size; ++i) {
        m1[i] = 1 / (1 + exp(-m1[i]));
    }
}

__global__ void feed_forward(float* d_b_X, float* d_W1, float* d_W2, float* d_W3, float* d_b_Y, float* d_a1, float* d_a2, float* d_yhat, float* d_dyhat) {
    int BATCH_SIZE = 256;
    float lr = 0.01 / BATCH_SIZE;
    float* tempY = new float[256 * 64];
    //feed forward
    kernel_dot <<<256, 256>>> (d_b_X, d_W1, BATCH_SIZE, 784, 128, d_a1);
    cudaDeviceSynchronize();
    k_relu(d_a1, BATCH_SIZE * 784);
    kernel_dot <<<256, 128>>> (d_a1, d_W2, BATCH_SIZE, 128, 64, d_a2);
    cudaDeviceSynchronize();
    k_relu(d_a2, BATCH_SIZE * 128);
    kernel_dot <<<256, 64>>> (d_a2, d_W3, BATCH_SIZE, 64, 10, d_yhat);
    cudaDeviceSynchronize();
    ksoftmax(tempY, 10 * 10);
    for (int i = 0; i < 100; i++) {
        d_yhat[i] = tempY[i];
    }
    delete[] tempY;
}

__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
    cudaError_t Error;
    int BATCH_SIZE = 256;
    float lr = 0.01 / BATCH_SIZE;

    //backpropagation
    d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
    kernel_dot <<<(2560 + 128) / 64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
    cudaDeviceSynchronize();

    // manual transpose of a2 for dW3
    float* mT = new float[256 * 64 - 1];
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 64; ++j)
        {
            mT[j * 64 + i] = d_a2[i * 256 + j];
        }
    kernel_dot <<<(16384 + 256) / 64, 64>>> (mT, d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
    cudaDeviceSynchronize();
    k_reluPrime(d_a2, 256 * 64);
    for (int i = 0; i < BATCH_SIZE * 10; i++) {
        d_dz2[i] = d_dz2[i] * d_a2[i];
    }

    // manual transpose of a1 for dW2
    mT = new float[256 * 128];
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 128; ++j)
        {
            mT[j * 128 + i] = d_a1[i * 256 + j];
        }
    kernel_dot <<<64, 512>>> (mT, d_dz2, 128, BATCH_SIZE, 64, d_dW2);
    cudaDeviceSynchronize();
    kernel_dot <<<80, 32>>> (d_dz2, k_transpose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
    cudaDeviceSynchronize();
    k_reluPrime(d_a1, BATCH_SIZE * 784);
    for (int i = 0; i < 256 * 64; i++) {
        d_dz1[i] = d_dz1[i] * d_a1[i];
    }
    kernel_dot <<<784, 256>>> (d_t, d_dz1, 784, BATCH_SIZE, 128, d_dW1);
    cudaDeviceSynchronize();

    //// Updating the parameters
    ////W3 = W3 - lr * dW3;
    d_W3 = k_difference(d_W3, k_MFV(lr, d_dW3, 64 * 10), 64 * 10);
    //W2 = W2 - lr * dW2;
    d_W2 = k_difference(d_W2, k_MFV(lr, d_dW2, 128 * 64), 128 * 64);
    ////W1 = W1 - lr * dW1;
    d_W1 = k_difference(d_W1, k_MFV(lr, d_dW1, 784 * 128), 784 * 128);
    for (int i = 0; i < (784 * 128); i++) {
        d_W1[i] = d_W1[i] - lr * d_dW1[i];
    }

    //for (int i = 0; i != 10; ++i) {
    //    for (int j = 0; j != 10; ++j) {
    //        printf("%f ", d_W3[i * 10 + j]);
    //    }
    //    printf("\n");
    //}
    //printf("\n");
    //for (int i = 0; i != 10; ++i) {
    //    for (int j = 0; j != 10; ++j) {
    //        printf("%f ", d_yhat[i * 10 + j]);
    //    }
    //    printf("\n");
    //}
    //printf("\n");

    // squared-error loss for the batch
    float* dif;
    dif = k_difference(d_b_Y, d_yhat, 10 * 10);
    float loss = 0.0;
    for (unsigned k = 0; k < BATCH_SIZE * 10; ++k) {
        loss += dif[k] * dif[k];
    }
    printf("%f \n", loss / BATCH_SIZE);

    Error = cudaGetLastError();
    if (Error != cudaSuccess) {
        // %s needs a string, so print the error description rather than the enum value
        printf("\n %s \n", cudaGetErrorString(Error));
    }
}
</syntaxhighlight>
|}
===Final Profile===
This final profile covers only 20 iterations, as errors occurred beyond 20 iterations, likely due to naive coding and bad coding practice.
[[File:nnfinalprofile.jpg]]

===Compiling===
Follow the articles below to set up Visual Studio for dynamic parallelism; they are also recommended reading:

http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf

http://ramblingsofagamedevstudent.blogspot.com/2014/03/set-up-visual-studio-2012-for-cuda.html

 
=== Assignment 3 ===
====What we would do differently:====
There are many things; one of the major ones is to take on a more manageable task, one with proper documentation and reasoning behind the chosen values.

Latest revision as of 23:50, 7 April 2019

Back Propagation Acceleration

Team Members

  1. Sebastian Djurovic, Team Lead and Developer
  2. Henry Leung, Developer and Quality Control
  3. ...

Progress

Assignment 1

Our group decided to profile two candidate programs, a simple neural network and a ray tracer, in order to determine which project would be the best one to build a parallel solution for.

Neural Network

Sebastian's findings

I found a simple neural network that takes the MNIST data set and performs training on batches of the data. As a quick illustration, MNIST is a data set of handwritten digits in grayscale at 28 x 28 pixels, together with the corresponding numerical labels between 0 and 9. The purpose of this data set is to train networks so that they can recognize handwritten digits when they encounter them.

MnistExamples.png
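
To make the data layout concrete, here is a minimal C++ sketch of how one MNIST sample might be prepared for a network with 784 inputs and 10 outputs. The helper names, the scaling of pixels to [0, 1], and the one-hot label encoding are assumptions made for illustration; they are not taken from the profiled program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kImageSide = 28;                    // MNIST images are 28 x 28
constexpr int kPixels = kImageSide * kImageSide;  // 784 inputs per image
constexpr int kClasses = 10;                      // digits 0 through 9

// Scale 8-bit grayscale pixels to floats in [0, 1] (assumed normalization).
std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& raw) {
    assert(raw.size() == static_cast<std::size_t>(kPixels));
    std::vector<float> out(kPixels);
    for (int i = 0; i < kPixels; ++i)
        out[i] = raw[i] / 255.0f;
    return out;
}

// One-hot encode a digit label: ten zeros with a single 1 at index `label`.
std::array<float, kClasses> one_hot(int label) {
    assert(label >= 0 && label < kClasses);
    std::array<float, kClasses> out{};
    out[label] = 1.0f;
    return out;
}
```

A batch of 256 such samples then forms a 256 x 784 input matrix and a 256 x 10 label matrix, which is consistent with the dimensions passed to dot() in the code snippets.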

Initial Profile
Flat profile:
Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls  ns/call  ns/call  name    
97.94    982.46   982.46                             dot(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int, int)
 1.45    997.05    14.58                             transpose(float*, int, int)
 0.15    998.56     1.51                             operator-(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
 0.15   1000.06     1.50                             relu(std::vector<float, std::allocator<float> > const&)
 0.15   1001.55     1.49                             operator*(float, std::vector<float, std::allocator<float> > const&)
 0.07   1002.27     0.72 519195026     1.39     1.39  void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&)
 0.06   1002.91     0.63                             operator*(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
 0.05   1003.37     0.46                             reluPrime(std::vector<float, std::allocator<float> > const&)
 0.02   1003.62     0.25                             softmax(std::vector<float, std::allocator<float> > const&, int)
 0.01   1003.75     0.13                             operator/(std::vector<float, std::allocator<float> > const&, float)
 0.01   1003.87     0.12   442679   271.35   271.35  void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&)
 0.01   1003.96     0.09 13107321     6.87     6.87  void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&)
 0.01   1004.02     0.06                             split(std::string const&, char)
 0.01   1004.08     0.06   462000   130.00   130.00  void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&)
 0.00   1004.11     0.03                             std::vector<std::string, std::allocator<std::string> >::~vector()
 0.00   1004.12     0.01                             random_vector(int)
 0.00   1004.12     0.00        3     0.00     0.00  std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&)
 0.00   1004.12     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z5printRKSt6vectorIfSaIfEEii

Neuralnet chart.jpg

After the initial profile it is obvious that the dot product function consumes 97.94% of our run time. Additionally, the transpose function consumes 1.45%, which seems measly; however, transpose is also called during back propagation, as are two rectifiers (activation functions), relu and reluPrime, where reluPrime is the derivative of relu and acts as a binary step function.

Relu = f(x) = {0 for x <= 0, x otherwise}
ReluPrime = f(x) = {0 for x <= 0, 1 otherwise}
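
As a small illustration, the two rectifier functions can be sketched in C++ as follows, using the conventional definitions (relu clamps non-positive inputs to zero, reluPrime is 1 for positive inputs and 0 otherwise). The signatures mirror the ones in the profile, but the bodies are a sketch and may differ from the profiled program.

```cpp
#include <cstddef>
#include <vector>

// ReLU: passes positive values through and clamps everything else to zero.
std::vector<float> relu(const std::vector<float>& z) {
    std::vector<float> out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i)
        out[i] = z[i] > 0.0f ? z[i] : 0.0f;
    return out;
}

// Derivative of ReLU: 1 where the input is positive, 0 elsewhere.
std::vector<float> reluPrime(const std::vector<float>& z) {
    std::vector<float> out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i)
        out[i] = z[i] > 0.0f ? 1.0f : 0.0f;
    return out;
}
```

Each output element depends only on the matching input element, which is why these functions are cheap in the profile and straightforward to run one thread per element.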
Code Snippets
       // Back propagation
       vector<float> dyhat = (yhat - b_y);
       // dW3 = a2.T * dyhat
       vector<float> dW3 = dot(transpose( &a2[0], BATCH_SIZE, 64 ), dyhat, 64, BATCH_SIZE, 10);
       // dz2 = dyhat * W3.T * relu'(a2)
       vector<float> dz2 = dot(dyhat, transpose( &W3[0], 64, 10 ), BATCH_SIZE, 10, 64) * reluPrime(a2);
       // dW2 = a1.T * dz2
       vector<float> dW2 = dot(transpose( &a1[0], BATCH_SIZE, 128 ), dz2, 128, BATCH_SIZE, 64);
       // dz1 = dz2 * W2.T * relu'(a1)
       vector<float> dz1 = dot(dz2, transpose( &W2[0], 128, 64 ), BATCH_SIZE, 64, 128) * reluPrime(a1);
       // dW1 = X.T * dz1
       vector<float> dW1 = dot(transpose( &b_X[0], BATCH_SIZE, 784 ), dz1, 784, BATCH_SIZE, 128);


vector <float> dot (const vector <float>& m1, const vector <float>& m2, const int m1_rows, const int m1_columns, const int m2_columns) { 
   vector <float> output (m1_rows*m2_columns);
   
   for( int row = 0; row != m1_rows; ++row ) {
       for( int col = 0; col != m2_columns; ++col ) {
           output[ row * m2_columns + col ] = 0.f;
           for( int k = 0; k != m1_columns; ++k ) {
               output[ row * m2_columns + col ] += m1[ row * m1_columns + k ] * m2[ k * m2_columns + col ];
           }
       }
   }
   
   return output;
}
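
The reason dot() is a good candidate for parallelization is that every output element can be computed independently. The sketch below restates the same row-major product so that this independence is explicit; <code>dot_element</code> and <code>dot_by_element</code> are hypothetical names introduced for illustration, and the flat index <code>idx</code> plays the role a thread index would play on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One output element of the row-major product m1 (rows x inner) * m2 (inner x cols).
float dot_element(const std::vector<float>& m1, const std::vector<float>& m2,
                  int row, int col, int inner, int cols) {
    float sum = 0.0f;
    for (int k = 0; k < inner; ++k)
        sum += m1[row * inner + k] * m2[k * cols + col];
    return sum;
}

// Same result as a triple loop, written so each (row, col) pair is an independent task.
std::vector<float> dot_by_element(const std::vector<float>& m1, const std::vector<float>& m2,
                                  int rows, int inner, int cols) {
    assert(m1.size() == static_cast<std::size_t>(rows) * inner);
    assert(m2.size() == static_cast<std::size_t>(inner) * cols);
    std::vector<float> out(static_cast<std::size_t>(rows) * cols);
    for (int idx = 0; idx < rows * cols; ++idx)  // idx stands in for a thread index
        out[idx] = dot_element(m1, m2, idx / cols, idx % cols, inner, cols);
    return out;
}
```

For example, multiplying the 2 x 3 matrix {1, 2, 3, 4, 5, 6} by the 3 x 2 matrix {7, 8, 9, 10, 11, 12} gives {58, 64, 139, 154}.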
Amdahl's law

When Amdahl's law is applied, the theoretical speedup is 48.54x; however, allowing for overhead such as memory copying, our actual prediction is no more than 10x faster.

Theoretical:

s = 1/(1 - 97.94%)
= 1/(1 - 0.9794)
= 48.54

Prediction:

P = 102s
Possible complications

The main concern when parallelizing these code snippets is that memory copying is going to take up a lot of time, so despite the predicted speedup, there is no certain answer until the CUDA kernel is complete.

Hypothesis

Our hypothesis for this solution is an acceleration of roughly 10x when dot() is parallelized. This means that our code should take somewhere in the ballpark of 102 seconds to train the network.

Ray Tracing

Henry's findings

I decided to choose a ray tracing program that draws graphics such as a block, cuboid and cylinder. The shapes are rendered with shadows. The program is from http://cosinekitty.com/raytrace.

Initial Profile (Warning: long)

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
43.88      8.38     8.38 406030768     0.00     0.00  Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&)
13.98     11.05     2.67 14003920     0.00     0.00  Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 8.12     12.60     1.55 34580399     0.00     0.00  Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 7.72     14.08     1.48 66701722     0.00     0.00  Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 3.25     14.70     0.62 50859850     0.00     0.00  Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 3.12     15.29     0.60  8534998     0.00     0.00  Imager::Scene::CalculateMatte(Imager::Intersection const&) const
 2.64     15.80     0.51      594     0.00     0.00  encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int)
 1.88     16.16     0.36 118023768     0.00     0.00  Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const
 1.73     16.49     0.33       15     0.02     1.26  Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const
 1.31     16.74     0.25  3262804     0.00     0.00  Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const
 1.26     16.98     0.24  5171493     0.00     0.00  Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*)
 1.07     17.18     0.21 18609329     0.00     0.00  Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const
 1.02     17.38     0.20 18609329     0.00     0.00  Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const
 0.92     17.55     0.18 83146683     0.00     0.00  Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&)
 0.89     17.72     0.17  8573986     0.00     0.00  Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const
 0.86     17.89     0.17 22928551     0.00     0.00  Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const
 0.63     18.01     0.12  1966663     0.00     0.00  Imager::Torus::SurfaceNormal(Imager::Vector const&) const
 0.58     18.12     0.11  7514037     0.00     0.00  Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.42     18.20     0.08  5171490     0.00     0.00  Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const
 0.42     18.28     0.08  3115245     0.00     0.00  Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.34     18.34     0.07                             Imager::Scene::PolarizedReflection(double, double, double, double) const
 0.31     18.40     0.06 11368907     0.00     0.00  Imager::Sphere::Contains(Imager::Vector const&) const
 0.31     18.46     0.06  6856730     0.00     0.00  Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*)
 0.26     18.51     0.05  9484218     0.00     0.00  Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const
 0.26     18.56     0.05  2906525     0.00     0.00  Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const
 0.21     18.60     0.04 12028218     0.00     0.00  Algebra::FilterRealNumbers(int, std::complex<double> const*, double*)
 0.21     18.64     0.04  5171490     0.00     0.00  Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.21     18.68     0.04  3522280     0.00     0.00  Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const
 0.16     18.71     0.03  5171292     0.00     0.00  Algebra::cbrt(std::complex<double>, int)
 0.16     18.74     0.03  5088067     0.00     0.00  string_set(char**, char const*)
 0.16     18.77     0.03   957358     0.00     0.00  Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.16     18.80     0.03    17010     0.00     0.00  addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long)
 0.16     18.83     0.03                             frame_dummy
 0.13     18.86     0.03  3617416     0.00     0.00  Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const
 0.10     18.88     0.02 13944693     0.00     0.00  Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.10     18.90     0.02 11893060     0.00     0.00  Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.10     18.92     0.02  3538132     0.00     0.00  Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const
 0.10     18.94     0.02  1425590     0.00     0.00  Imager::Optics::ValidateReflectionColor(Imager::Color const&) const
 0.10     18.96     0.02        1     0.02     0.02  Imager::SolidObject_Reorientable::RotateZ(double)
 0.08     18.97     0.02   170828     0.00     0.00  Imager::SolidObject::Contains(Imager::Vector const&) const
 0.05     18.98     0.01  5946530     0.00     0.00  Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.05     18.99     0.01  5268088     0.00     0.00  Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.05     19.00     0.01  3682000     0.00     0.00  getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*)
 0.05     19.01     0.01  3170898     0.00     0.00  Imager::SetComplement::Contains(Imager::Vector const&) const
 0.05     19.02     0.01  2953369     0.00     0.00  addBitToStream(unsigned long*, ucvector*, unsigned char)
 0.05     19.03     0.01  2096776     0.00     0.00  Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
 0.05     19.04     0.01  1425504     0.00     0.00  Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
 0.05     19.05     0.01   627238     0.00     0.00  color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char)
 0.05     19.06     0.01    77932     0.00     0.00  Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const
 0.05     19.07     0.01     3016     0.00     0.00  sort_coins(Coin*, unsigned long)
 0.05     19.08     0.01       73     0.00     0.00  Imager::SolidObject::Translate(double, double, double)
 0.05     19.09     0.01       15     0.00     0.00  lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int)
 0.03     19.10     0.01                             Imager::SolidObject_BinaryOperator::RotateZ(double)
 0.00     19.10     0.00  2431840     0.00     0.00  Imager::SetUnion::Contains(Imager::Vector const&) const
 0.00     19.10     0.00  1447152     0.00     0.00  uivector_push_back(uivector*, unsigned int)
 0.00     19.10     0.00  1425505     0.00     0.00  Imager::Optics::SetMatteColor(Imager::Color const&)
 0.00     19.10     0.00  1395302     0.00     0.00  Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const
 0.00     19.10     0.00  1137648     0.00     0.00  Imager::ChessBoard::SquareCoordinate(double) const
 0.00     19.10     0.00   738585     0.00     0.00  ucvector_push_back(ucvector*, unsigned char)
 0.00     19.10     0.00   478679     0.00     0.00  Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 0.00     19.10     0.00   406979     0.00     0.00  lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*)
 0.00     19.10     0.00   255938     0.00     0.00  searchCodeIndex(unsigned int const*, unsigned long, unsigned long)
 0.00     19.10     0.00     3245     0.00     0.00  cleanup_coins(Coin*, unsigned long)
 0.00     19.10     0.00     2787     0.00     0.00  append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long)
 0.00     19.10     0.00      729     0.00     0.00  uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64]
 0.00     19.10     0.00      607     0.00     0.00  lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char)
 0.00     19.10     0.00      243     0.00     0.00  lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int)
 0.00     19.10     0.00      243     0.00     0.00  HuffmanTree_cleanup(HuffmanTree*)
 0.00     19.10     0.00      243     0.00     0.00  HuffmanTree_makeFromLengths2(HuffmanTree*)
 0.00     19.10     0.00      243     0.00     0.00  HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int)
 0.00     19.10     0.00      120     0.00     0.00  Imager::Dodecahedron::CheckEdge(int, int, double) const
 0.00     19.10     0.00       92     0.00     0.00  Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&)
 0.00     19.10     0.00       88     0.00     0.00  std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&)
 0.00     19.10     0.00       48     0.00     0.00  lodepng_chunk_generate_crc(unsigned char*)
 0.00     19.10     0.00       48     0.00     0.00  Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62]
 0.00     19.10     0.00       48     0.00     0.00  addUnknownChunks(ucvector*, unsigned char*, unsigned long)
 0.00     19.10     0.00       45     0.00     0.00  lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*)
 0.00     19.10     0.00       45     0.00     0.00  lodepng_info_cleanup(LodePNGInfo*)
 0.00     19.10     0.00       45     0.00     0.00  LodePNGText_cleanup(LodePNGInfo*)
 0.00     19.10     0.00       45     0.00     0.00  lodepng_add32bitInt(ucvector*, unsigned int)
 0.00     19.10     0.00       45     0.00     0.00  LodePNGIText_cleanup(LodePNGInfo*)
 0.00     19.10     0.00       30     0.00     0.00  lodepng_info_init(LodePNGInfo*)
 0.00     19.10     0.00       30     0.00     0.00  checkColorValidity(LodePNGColorType, unsigned int)
 0.00     19.10     0.00       29     0.00     0.00  std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&)
 0.00     19.10     0.00       24     0.00     0.00  Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double)
 0.00     19.10     0.00       22     0.00     0.00  Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>)
 0.00     19.10     0.00       21     0.00     0.00  Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&)
 0.00     19.10     0.00       20     0.00     0.00  Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double)
 0.00     19.10     0.00       20     0.00     0.00  std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&)
 0.00     19.10     0.00       20     0.00     0.00  std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&)
 0.00     19.10     0.00       18     0.00     0.00  Imager::SolidObject_Reorientable::RotateX(double)
 0.00     19.10     0.00       18     0.00     0.00  Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double)
 0.00     19.10     0.00       17     0.00     0.00  std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&)
 0.00     19.10     0.00       15     0.00     0.04  lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*)
 0.00     19.10     0.00       15     0.00     0.02  lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
 0.00     19.10     0.00       15     0.00     0.00  lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*)
 0.00     19.10     0.00       15     0.00     0.00  lodepng_state_init(LodePNGState*)
 0.00     19.10     0.00       15     0.00     0.04  lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
 0.00     19.10     0.00       15     0.00     0.00  lodepng_state_cleanup(LodePNGState*)
 0.00     19.10     0.00       15     0.00     0.03  lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
 0.00     19.10     0.00       15     0.00     0.00  lodepng_can_have_alpha(LodePNGColorMode const*)
 0.00     19.10     0.00       15     0.00     0.00  lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*)
 0.00     19.10     0.00       15     0.00     0.00  zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
 0.00     19.10     0.00       15     0.00     0.00  update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61]
 0.00     19.10     0.00       15     0.00     0.00  preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*)
 0.00     19.10     0.00       15     0.00     0.00  Imager::SolidObject_Reorientable::RotateY(double)
 0.00     19.10     0.00       15     0.00     0.00  Imager::Scene::ClearSolidObjectList()
 0.00     19.10     0.00       15     0.00     0.00  Imager::Scene::~Scene()
 0.00     19.10     0.00       15     0.00     0.04  lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
 0.00     19.10     0.00       15     0.00     0.00  lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int)
 0.00     19.10     0.00       15     0.00     0.04  lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
 0.00     19.10     0.00       15     0.00     0.00  lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&)
 0.00     19.10     0.00       15     0.00     0.00  void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag)
 0.00     19.10     0.00       14     0.00     0.00  Imager::SolidObject_BinaryOperator::Translate(double, double, double)
 0.00     19.10     0.00       13     0.00     0.00  Imager::Sphere::~Sphere()
 0.00     19.10     0.00       10     0.00     0.00  Imager::SolidObject_BinaryOperator::RotateY(double)
 0.00     19.10     0.00        9     0.00     0.00  Imager::SolidObject_BinaryOperator::RotateX(double)
 0.00     19.10     0.00        8     0.00     0.00  Imager::SetComplement::Translate(double, double, double)
 0.00     19.10     0.00        7     0.00     0.00  Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*)
 0.00     19.10     0.00        6     0.00     0.00  Imager::Optics::SetOpacity(double)
 0.00     19.10     0.00        5     0.00     0.00  Imager::Torus::~Torus()
 0.00     19.10     0.00        5     0.00     0.00  Imager::Sphere::RotateY(double)
 0.00     19.10     0.00        4     0.00     0.00  Imager::Cuboid::~Cuboid()
 0.00     19.10     0.00        4     0.00     0.00  Imager::SetUnion::~SetUnion()
 0.00     19.10     0.00        3     0.00     0.00  Imager::TriangleMesh::RotateX(double)
 0.00     19.10     0.00        3     0.00     0.00  Imager::TriangleMesh::RotateY(double)
 0.00     19.10     0.00        3     0.00     0.00  Imager::SetComplement::RotateY(double)
 0.00     19.10     0.00        3     0.00     0.00  Imager::SetComplement::~SetComplement()
 0.00     19.10     0.00        3     0.00     0.00  Imager::Sphere::RotateX(double)
 0.00     19.10     0.00        3     0.00     0.00  Imager::ThinRing::~ThinRing()
 0.00     19.10     0.00        3     0.00     0.00  Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>)
 0.00     19.10     0.00        2     0.00     1.27  TorusTest(char const*, double)
 0.00     19.10     0.00        2     0.00     0.00  Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&)
 0.00     19.10     0.00        2     0.00     0.00  Imager::Dodecahedron::~Dodecahedron()
 0.00     19.10     0.00        2     0.00     0.00  Imager::SetComplement::RotateX(double)
 0.00     19.10     0.00        2     0.00     0.00  Imager::SetDifference::~SetDifference()
 0.00     19.10     0.00        2     0.00     0.00  Imager::SetIntersection::~SetIntersection()
 0.00     19.10     0.00        2     0.00     0.00  Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*)
 0.00     19.10     0.00        2     0.00     0.00  Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>)
 0.00     19.10     0.00        2     0.00     0.00  Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>)
 0.00     19.10     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z9BlockTestv
 0.00     19.10     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv
 0.00     19.10     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN6Imager6IndentERSoi
 0.00     19.10     0.00        1     0.00     0.00  _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_
 0.00     19.10     0.00        1     0.00     1.26  CuboidTest()
 0.00     19.10     0.00        1     0.00     1.27  SaturnTest()
 0.00     19.10     0.00        1     0.00     1.27  BitDonutTest()
 0.00     19.10     0.00        1     0.00     1.26  CylinderTest()
 0.00     19.10     0.00        1     0.00     1.26  SpheroidTest()
 0.00     19.10     0.00        1     0.00     1.26  PolyhedraTest()
 0.00     19.10     0.00        1     0.00     1.28  ChessBoardTest()
 0.00     19.10     0.00        1     0.00     1.27  SetDifferenceTest()
 0.00     19.10     0.00        1     0.00     1.26  MultipleSphereTest()
 0.00     19.10     0.00        1     0.00     1.27  SetIntersectionTest()
 0.00     19.10     0.00        1     0.00     1.26  DodecahedronOverlapTest()
 0.00     19.10     0.00        1     0.00     1.27  BlockTest()
 0.00     19.10     0.00        1     0.00     0.00  Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&)
 0.00     19.10     0.00        1     0.00     0.00  Imager::ChessBoard::~ChessBoard()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&)
 0.00     19.10     0.00        1     0.00     0.00  Imager::Icosahedron::~Icosahedron()
 0.00     19.10     0.00        1     0.00     0.00  Imager::ConcreteBlock::~ConcreteBlock()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Optics::SetGlossColor(Imager::Color const&)
 0.00     19.10     0.00        1     0.00     0.00  Imager::Planet::~Planet()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Saturn::CreateRingSystem()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Saturn::~Saturn()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Cylinder::~Cylinder()
 0.00     19.10     0.00        1     0.00     0.00  Imager::Spheroid::~Spheroid()
 0.00     19.10     0.00        1     0.00     0.00  Algebra::UnitTest()

According to the flat profile, 43.88% of the run time is spent in Algebra::SolveLinearEquations. Most of the remaining time goes to the shape calculations (largely the various AppendAllIntersections routines), while Imager::Scene::TraceRay itself accounts for only 1.02%.

[[File:Raytraceblock.png]]

Call Graph
Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.05% of 19.10 seconds

index % time self children called name

               0.02    1.24       1/15          SphereTest() [26]
               0.02    1.24       1/15          CuboidTest() [22]
               0.02    1.24       1/15          SetDifferenceTest() [21]
               0.02    1.24       1/15          CylinderTest() [23]
               0.02    1.24       1/15          SpheroidTest() [24]
               0.02    1.24       1/15          SetIntersectionTest() [19]
               0.02    1.24       1/15          MultipleSphereTest() [25]
               0.02    1.24       1/15          PolyhedraTest() [27]
               0.02    1.24       1/15          BitDonutTest() [20]
               0.02    1.24       1/15          SaturnTest() [18]
               0.02    1.24       1/15          DodecahedronOverlapTest() [28]
               0.02    1.24       1/15          BlockTest() [17]
               0.02    1.24       1/15          ChessBoardTest() [16]
               0.04    2.48       2/15          TorusTest(char const*, double) [12]

[1] 99.3 0.33 18.64 15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]

               0.67   17.36 12440000/12440000     Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]
               0.00    0.62      15/15          lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33]
               0.00    0.00      15/15          lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int) [147]

[2] 94.4 0.67 17.36 12440000+20912644 <cycle 4 as a whole> [2]

               0.17    9.87 8573986             Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]
               0.20    7.37 18609329             Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]
               0.25    0.12 3262804             Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]
               0.05    0.00 2906525             Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58]

                                                <spontaneous>

[3] 92.9 0.00 17.73 UnitTests() [3]

               0.00    2.53       2/2           TorusTest(char const*, double) [12]
               0.00    1.28       1/1           ChessBoardTest() [16]
               0.00    1.27       1/1           BlockTest() [17]
               0.00    1.27       1/1           SaturnTest() [18]
               0.00    1.27       1/1           SetIntersectionTest() [19]
               0.00    1.27       1/1           BitDonutTest() [20]
               0.00    1.27       1/1           SetDifferenceTest() [21]
               0.00    1.26       1/1           CylinderTest() [23]
               0.00    1.26       1/1           CuboidTest() [22]
               0.00    1.26       1/1           SpheroidTest() [24]
               0.00    1.26       1/1           MultipleSphereTest() [25]
               0.00    1.26       1/1           DodecahedronOverlapTest() [28]
               0.00    1.26       1/1           PolyhedraTest() [27]
               0.00    0.00       1/1           Algebra::UnitTest() [91]

               0.03    0.10  170828/14003920     Imager::SolidObject::Contains(Imager::Vector const&) const [48]
               0.04    0.12  205218/14003920     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               1.10    3.46 5760000/14003920     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]
               1.50    4.72 7867874/14003920     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

[4] 58.0 2.67 8.40 14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]

               8.38    0.00 406030768/406030768     Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [8]
               0.02    0.00 3538132/3538132     Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const [70]
               0.00    0.00       8/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

                            8573986             Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]

[5] 52.6 0.17 9.87 8573986 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

               0.60    9.17 8534998/8534998     Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6]
               0.04    0.04 3522280/3522280     Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]
               0.03    0.00 3617416/3617416     Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const [66]
               0.00    0.00 1395302/1395302     Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const [105]
                            3262804             Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]
                            2906525             Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58]

               0.60    9.17 8534998/8534998     Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[6] 51.1 0.60 9.17 8534998 Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6]

               0.17    9.00 22928551/22928551     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

               0.17    9.00 22928551/22928551     Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6]

[7] 48.0 0.17 9.00 22928551 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

               1.50    4.72 7867874/14003920     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
               0.19    0.83 15801057/50859850     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]
               0.84    0.00 38169893/66701722     Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]
               0.00    0.30 2698530/5946530     Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30]
               0.00    0.30 2698530/11893060     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.00    0.16 2486753/13944693     Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29]
               0.14    0.00 64537354/83146683     Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47]

               8.38    0.00 406030768/406030768     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]

[8] 43.9 8.38 0.00 406030768 Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [8]


                            2906525             Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58]
                            3262804             Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]
               0.67   17.36 12440000/12440000     Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]

[9] 39.6 0.20 7.37 18609329 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]

               0.21    7.13 18609329/18609329     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]
               0.04    0.00 18609329/83146683     Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47]
                            8573986             Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

               0.21    7.13 18609329/18609329     Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]

[10] 38.4 0.21 7.13 18609329 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]

               1.10    3.46 5760000/14003920     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
               0.15    0.64 12094646/50859850     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]
               0.57    0.00 25863441/66701722     Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]
               0.01    0.47 7280621/13944693     Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29]
               0.01    0.37 3248000/5946530     Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30]
               0.01    0.36 3248000/11893060     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

               0.05    0.22 4177319/50859850     Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35]
               0.06    0.25 4842135/50859850     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.15    0.64 12094646/50859850     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]
               0.17    0.73 13944693/50859850     Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29]
               0.19    0.83 15801057/50859850     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

[11] 17.2 0.62 2.67 50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

               1.55    0.33 34580399/34580399     Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13]
               0.04    0.49 5171490/5171490     Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]
               0.08    0.04 3115245/3115245     Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]
               0.11    0.00 7514037/7514037     Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51]
               0.00    0.04  478679/478679      Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]

               0.00    2.53       2/2           UnitTests() [3]

[12] 13.3 0.00 2.53 2 TorusTest(char const*, double) [12]

               0.04    2.48       2/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       2/6           Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       2/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       4/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       4/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       4/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       2/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       2/15          Imager::Scene::~Scene() [146]
               0.00    0.00       2/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               1.55    0.33 34580399/34580399     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

[13] 9.9 1.55 0.33 34580399 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13]

               0.33    0.00 108617482/118023768     Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [42]
               0.00    0.00       9/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.02    0.00 1090769/66701722     Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35]
               0.03    0.00 1577619/66701722     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.57    0.00 25863441/66701722     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]
               0.84    0.00 38169893/66701722     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

[14] 7.7 1.48 0.00 66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]

               0.00    0.00      24/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.00    0.30 2698530/11893060     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]
               0.01    0.36 3248000/11893060     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]
               0.01    0.66 5946530/11893060     Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30]

[15] 7.0 0.02 1.32 11893060 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

               0.01    0.57 5268088/5268088     Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35]
               0.06    0.25 4842135/50859850     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]
               0.04    0.12  205218/14003920     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
               0.02    0.14  170828/170828      Imager::SolidObject::Contains(Imager::Vector const&) const [48]
               0.01    0.04 3170898/3170898     Imager::SetComplement::Contains(Imager::Vector const&) const [57]
               0.03    0.00 1577619/66701722     Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]
               0.01    0.01 2068572/9484218     Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
               0.01    0.00 1497631/11368907     Imager::Sphere::Contains(Imager::Vector const&) const [55]
               0.00    0.00      23/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.00    1.28       1/1           UnitTests() [3]

[16] 6.7 0.00 1.28 1 ChessBoardTest() [16]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.02    0.00       1/1           Imager::SolidObject_Reorientable::RotateZ(double) [69]
               0.00    0.00       1/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       3/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       3/6           Imager::Optics::SetOpacity(double) [152]
               0.00    0.00       3/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       3/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/1           Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [172]
               0.00    0.00       1/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]

               0.00    1.27       1/1           UnitTests() [3]

[17] 6.6 0.00 1.27 1 BlockTest() [17]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/6           Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       1/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       1/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       2/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       1/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/1425505     Imager::Optics::SetMatteColor(Imager::Color const&) [72]
               0.00    0.00       1/1           Imager::Optics::SetGlossColor(Imager::Color const&) [97]
               0.00    0.00       3/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       2/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/6           Imager::Optics::SetOpacity(double) [152]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]

               0.00    1.27       1/1           UnitTests() [3]

[18] 6.6 0.00 1.27 1 SaturnTest() [18]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/6           Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       1/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       1/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       1/1           Imager::Saturn::CreateRingSystem() [92]
               0.00    0.00       1/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               0.00    1.27       1/1           UnitTests() [3]

[19] 6.6 0.00 1.27 1 SetIntersectionTest() [19]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/6           Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       1/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       1/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       2/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]

               0.00    1.27       1/1           UnitTests() [3]

[20] 6.6 0.00 1.27 1 BitDonutTest() [20]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/6           Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       1/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       1/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       2/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               0.00    1.27       1/1           UnitTests() [3]

[21] 6.6 0.00 1.27 1 SetDifferenceTest() [21]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/7           Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       1/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       2/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               0.00    1.26       1/1           UnitTests() [3]

[22] 6.6 0.00 1.26 1 CuboidTest() [22]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       1/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       1/15          Imager::SolidObject_Reorientable::RotateY(double) [144]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]

               0.00    1.26       1/1           UnitTests() [3]

[23] 6.6 0.00 1.26 1 CylinderTest() [23]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       1/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       1/15          Imager::SolidObject_Reorientable::RotateY(double) [144]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]

               0.00    1.26       1/1           UnitTests() [3]

[24] 6.6 0.00 1.26 1 SpheroidTest() [24]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       2/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       1/15          Imager::SolidObject_Reorientable::RotateY(double) [144]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               0.00    1.26       1/1           UnitTests() [3]

[25] 6.6 0.00 1.26 1 MultipleSphereTest() [25]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       2/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       3/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       2/6           Imager::Optics::SetOpacity(double) [152]
               0.00    0.00       2/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]

                                                <spontaneous>

[26] 6.6 0.00 1.26 SphereTest() [26]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       1/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]

               0.00    1.26       1/1           UnitTests() [3]

[27] 6.6 0.00 1.26 1 PolyhedraTest() [27]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       3/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       2/3           Imager::TriangleMesh::RotateY(double) [158]
               0.00    0.00       2/3           Imager::TriangleMesh::RotateX(double) [157]
               0.00    0.00       2/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
               0.00    0.00       1/1           Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]
               0.00    0.00       1/2           Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]

               0.00    1.26       1/1           UnitTests() [3]

[28] 6.6 0.00 1.26 1 DodecahedronOverlapTest() [28]

               0.02    1.24       1/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
               0.00    0.00       3/29          std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
               0.00    0.00       1/2           Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
               0.00    0.00       1/3           Imager::TriangleMesh::RotateX(double) [157]
               0.00    0.00       1/3           Imager::TriangleMesh::RotateY(double) [158]
               0.00    0.00       1/15          Imager::Scene::~Scene() [146]
               0.00    0.00       1/20          std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]

               0.00    0.16 2486753/13944693     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]
               0.01    0.27 4177319/13944693     Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35]
               0.01    0.47 7280621/13944693     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]

[29] 4.8 0.02 0.90 13944693 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29]

               0.17    0.73 13944693/50859850     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

               0.00    0.30 2698530/5946530     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]
               0.01    0.37 3248000/5946530     Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10]

[30] 3.6 0.01 0.67 5946530 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30]

               0.01    0.66 5946530/11893060     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

               0.00    0.62      15/15          lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]

[31] 3.2 0.00 0.62 15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

               0.00    0.51      15/15          lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]
               0.01    0.05    3721/17010       addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]
               0.01    0.01      15/15          lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
               0.01    0.00  627238/627238      color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) [76]
               0.00    0.01      30/45          lodepng_add32bitInt(ucvector*, unsigned int) [73]
               0.01    0.00 1841000/3682000     getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74]
               0.00    0.00     195/738585      ucvector_push_back(ucvector*, unsigned char) [39]
               0.00    0.00     607/607         lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [112]
               0.00    0.00      48/48          addUnknownChunks(ucvector*, unsigned char*, unsigned long) [120]
               0.00    0.00      45/45          lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121]
               0.00    0.00      30/30          checkColorValidity(LodePNGColorType, unsigned int) [126]
               0.00    0.00      15/30          lodepng_info_init(LodePNGInfo*) [125]
               0.00    0.00      15/15          lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]
               0.00    0.00      15/15          lodepng_can_have_alpha(LodePNGColorMode const*) [139]
               0.00    0.00      15/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
               0.00    0.00      15/45          lodepng_info_cleanup(LodePNGInfo*) [122]
               0.00    0.00      15/15          preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) [143]
               0.00    0.00      15/15          zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [141]

               0.00    0.62      15/15          lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34]

[32] 3.2 0.00 0.62 15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]

               0.00    0.62      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
               0.00    0.00      15/15          lodepng_state_init(LodePNGState*) [137]
               0.00    0.00      15/45          lodepng_info_cleanup(LodePNGInfo*) [122]
               0.00    0.00      15/15          lodepng_state_cleanup(LodePNGState*) [138]

               0.00    0.62      15/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]

[33] 3.2 0.00 0.62 15 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33]

               0.00    0.62      15/15          lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34]
               0.00    0.00      15/15          lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&) [148]

               0.00    0.62      15/15          lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33]

[34] 3.2 0.00 0.62 15 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34]

               0.00    0.62      15/15          lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]
               0.00    0.00      15/15          void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag) [149]

               0.01    0.57 5268088/5268088     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

[35] 3.0 0.01 0.57 5268088 Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35]

               0.01    0.27 4177319/13944693     Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29]
               0.05    0.22 4177319/50859850     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]
               0.02    0.00 1090769/66701722     Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]

               0.01    0.00      15/594         lodepng_add32bitInt(ucvector*, unsigned int) [73]
               0.07    0.00      81/594         lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.42    0.03     498/594         ucvector_push_back(ucvector*, unsigned char) [39]

[36] 2.8 0.51 0.03 594 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

               0.03    0.00 5088067/5088067     string_set(char**, char const*) [64]
               0.00    0.00  652842/1447152     uivector_push_back(uivector*, unsigned int) [104]
               0.00    0.00  255938/255938      searchCodeIndex(unsigned int const*, unsigned long, unsigned long) [108]

               0.04    0.49 5171490/5171490     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

[37] 2.8 0.04 0.49 5171490 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]

               0.08    0.29 5171490/5171490     Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41]
               0.12    0.00 1966663/1966663     Imager::Torus::SurfaceNormal(Imager::Vector const&) const [49]
               0.00    0.00      13/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.00    0.51      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[38] 2.7 0.00 0.51 15 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]

               0.00    0.28      15/15          lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.00    0.22  369210/738585      ucvector_push_back(ucvector*, unsigned char) [39]
               0.00    0.00      15/45          lodepng_add32bitInt(ucvector*, unsigned int) [73]
               0.00    0.00      15/15          update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] [142]

               0.00    0.00     195/738585      lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
               0.00    0.22  369180/738585      addBitToStream(unsigned long*, ucvector*, unsigned char) [46]
               0.00    0.22  369210/738585      lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]

[39] 2.3 0.00 0.45 738585 ucvector_push_back(ucvector*, unsigned char) [39]

               0.42    0.03     498/594         encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

                            3262804             Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[40] 1.9 0.25 0.12 3262804 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]

               0.05    0.00 9132218/11368907     Imager::Sphere::Contains(Imager::Vector const&) const [55]
               0.02    0.01 3203808/9484218     Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
               0.03    0.00 3262804/6856730     Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
               0.01    0.00 3262804/12028218     Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]
                            3262804             Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]

               0.08    0.29 5171490/5171490     Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]

[41] 1.9 0.08 0.29 5171490 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41]

               0.24    0.03 5171490/5171493     Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]
               0.02    0.00 5171490/12028218     Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]

               0.03    0.00 9406286/118023768     Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
               0.33    0.00 108617482/118023768     Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13]

[42] 1.9 0.36 0.00 118023768 Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [42]


               0.00    0.28      15/15          lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]

[43] 1.5 0.00 0.28 15 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]

               0.02    0.17   12676/17010       addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]
               0.07    0.00      81/594         encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]
               0.00    0.01     243/243         HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80]
               0.00    0.00    9043/2953369     addBitToStream(unsigned long*, ucvector*, unsigned char) [46]
               0.00    0.00   39141/1447152     uivector_push_back(uivector*, unsigned int) [104]
               0.00    0.00     243/729         uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]
               0.00    0.00     243/243         HuffmanTree_cleanup(HuffmanTree*) [113]
               0.00    0.00     243/243         HuffmanTree_makeFromLengths2(HuffmanTree*) [114]
               0.00    0.00      81/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]

               0.00    0.00       3/5171493     Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
               0.24    0.03 5171490/5171493     Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41]

[44] 1.4 0.24 0.03 5171493 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]

               0.03    0.00 5171286/5171292     Algebra::cbrt(std::complex<double>, int) [62]

                            3050925             addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]
               0.00    0.01     613/17010       lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
               0.01    0.05    3721/17010       lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
               0.02    0.17   12676/17010       lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]

[45] 1.4 0.03 0.23 17010+3050925 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]

               0.01    0.22 2944326/2953369     addBitToStream(unsigned long*, ucvector*, unsigned char) [46]
                            3050925             addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]

               0.00    0.00    9043/2953369     lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.01    0.22 2944326/2953369     addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]

[46] 1.2 0.01 0.22 2953369 addBitToStream(unsigned long*, ucvector*, unsigned char) [46]

               0.00    0.22  369180/738585      ucvector_push_back(ucvector*, unsigned char) [39]

               0.04    0.00 18609329/83146683     Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]
               0.14    0.00 64537354/83146683     Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7]

[47] 0.9 0.18 0.00 83146683 Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47]


               0.02    0.14  170828/170828      Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

[48] 0.8 0.02 0.14 170828 Imager::SolidObject::Contains(Imager::Vector const&) const [48]

               0.03    0.10  170828/14003920     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]

               0.12    0.00 1966663/1966663     Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]

[49] 0.6 0.12 0.00 1966663 Imager::Torus::SurfaceNormal(Imager::Vector const&) const [49]


               0.08    0.04 3115245/3115245     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

[50] 0.6 0.08 0.04 3115245 Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]

               0.03    0.00 3115245/6856730     Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
               0.01    0.00 3115245/12028218     Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]
               0.00    0.00       6/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.11    0.00 7514037/7514037     Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

[51] 0.6 0.11 0.00 7514037 Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51]

               0.00    0.00       2/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.01    0.01 1779998/9484218     Imager::SetComplement::Contains(Imager::Vector const&) const [57]
               0.01    0.01 2068572/9484218     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.01    0.01 2431840/9484218     Imager::SetUnion::Contains(Imager::Vector const&) const [68]
               0.02    0.01 3203808/9484218     Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]

[52] 0.5 0.05 0.04 9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]

               0.03    0.00 9406286/118023768     Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [42]
               0.01    0.00   77932/77932       Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const [77]

               0.04    0.04 3522280/3522280     Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[53] 0.4 0.04 0.04 3522280 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]

               0.01    0.02 1425504/1425504     Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]
               0.01    0.00 2096776/2096776     Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [75]

                                                <spontaneous>

[54] 0.3 0.07 0.00 Imager::Scene::PolarizedReflection(double, double, double, double) const [54]


               0.00    0.00  739058/11368907     Imager::SetComplement::Contains(Imager::Vector const&) const [57]
               0.01    0.00 1497631/11368907     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.05    0.00 9132218/11368907     Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]

[55] 0.3 0.06 0.00 11368907 Imager::Sphere::Contains(Imager::Vector const&) const [55]


               0.00    0.00       2/6856730     Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
               0.00    0.00  478679/6856730     Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
               0.03    0.00 3115245/6856730     Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]
               0.03    0.00 3262804/6856730     Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]

[56] 0.3 0.06 0.00 6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]


               0.01    0.04 3170898/3170898     Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]

[57] 0.3 0.01 0.04 3170898 Imager::SetComplement::Contains(Imager::Vector const&) const [57]

               0.00    0.02 2431840/2431840     Imager::SetUnion::Contains(Imager::Vector const&) const [68]
               0.01    0.01 1779998/9484218     Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
               0.00    0.00  739058/11368907     Imager::Sphere::Contains(Imager::Vector const&) const [55]

                            2906525             Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[58] 0.3 0.05 0.00 2906525 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58]

                            2906525             Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9]

               0.00    0.00  478679/12028218     Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
               0.01    0.00 3115245/12028218     Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]
               0.01    0.00 3262804/12028218     Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]
               0.02    0.00 5171490/12028218     Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41]

[59] 0.2 0.04 0.00 12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]


               0.00    0.04  478679/478679      Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]

[60] 0.2 0.00 0.04 478679 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]

               0.03    0.00  957358/957358      Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]
               0.00    0.00  478679/6856730     Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
               0.00    0.00  478679/12028218     Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]
               0.00    0.00       1/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

                                                <spontaneous>

[61] 0.2 0.03 0.00 frame_dummy [61]


               0.00    0.00       6/5171292     Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]
               0.03    0.00 5171286/5171292     Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]

[62] 0.2 0.03 0.00 5171292 Algebra::cbrt(std::complex<double>, int) [62]


               0.03    0.00  957358/957358      Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]

[63] 0.2 0.03 0.00 957358 Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]

               0.00    0.00       2/88          std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]

               0.03    0.00 5088067/5088067     encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

[64] 0.2 0.03 0.00 5088067 string_set(char**, char const*) [64]


               0.01    0.02 1425504/1425504     Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]

[65] 0.2 0.01 0.02 1425504 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]

               0.00    0.02 1425504/1425505     Imager::Optics::SetMatteColor(Imager::Color const&) [72]
               0.00    0.00 1137648/1137648     Imager::ChessBoard::SquareCoordinate(double) const [106]

               0.03    0.00 3617416/3617416     Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[66] 0.1 0.03 0.00 3617416 Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const [66]


               0.01    0.01      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[67] 0.1 0.01 0.01 15 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]

               0.00    0.01     613/17010       addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]
               0.01    0.00 1841000/3682000     getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74]
               0.00    0.00      15/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]

               0.00    0.02 2431840/2431840     Imager::SetComplement::Contains(Imager::Vector const&) const [57]

[68] 0.1 0.00 0.02 2431840 Imager::SetUnion::Contains(Imager::Vector const&) const [68]

               0.01    0.01 2431840/9484218     Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]

               0.02    0.00       1/1           ChessBoardTest() [16]

[69] 0.1 0.02 0.00 1 Imager::SolidObject_Reorientable::RotateZ(double) [69]


               0.02    0.00 3538132/3538132     Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]

[70] 0.1 0.02 0.00 3538132 Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const [70]


               0.00    0.00       1/1425590     Imager::Optics::SetGlossColor(Imager::Color const&) [97]
               0.00    0.00      84/1425590     Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
               0.02    0.00 1425505/1425590     Imager::Optics::SetMatteColor(Imager::Color const&) [72]

[71] 0.1 0.02 0.00 1425590 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]


               0.00    0.00       1/1425505     BlockTest() [17]
               0.00    0.02 1425504/1425505     Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]

[72] 0.1 0.00 0.02 1425505 Imager::Optics::SetMatteColor(Imager::Color const&) [72]

               0.02    0.00 1425505/1425590     Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]

               0.00    0.00      15/45          lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]
               0.00    0.01      30/45          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[73] 0.1 0.00 0.01 45 lodepng_add32bitInt(ucvector*, unsigned int) [73]

               0.01    0.00      15/594         encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

               0.01    0.00 1841000/3682000     lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
               0.01    0.00 1841000/3682000     lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[74] 0.1 0.01 0.00 3682000 getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74]


               0.01    0.00 2096776/2096776     Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]

[75] 0.1 0.01 0.00 2096776 Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [75]


               0.01    0.00  627238/627238      lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[76] 0.1 0.01 0.00 627238 color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) [76]


               0.01    0.00   77932/77932       Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]

[77] 0.1 0.01 0.00 77932 Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const [77]


               0.01    0.00    3016/3016        lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]

[78] 0.1 0.01 0.00 3016 sort_coins(Coin*, unsigned long) [78]


               0.00    0.01     243/243         HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80]

[79] 0.1 0.00 0.01 243 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]

               0.01    0.00    3016/3016        sort_coins(Coin*, unsigned long) [78]
               0.00    0.00  573312/1447152     uivector_push_back(uivector*, unsigned int) [104]
               0.00    0.00  167321/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
               0.00    0.00    3245/3245        cleanup_coins(Coin*, unsigned long) [109]
               0.00    0.00    2787/2787        append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]

               0.00    0.01     243/243         lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]

[80] 0.1 0.00 0.01 243 HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80]

               0.00    0.01     243/243         lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]

               0.00    0.00       1/73          CuboidTest() [22]
               0.00    0.00       1/73          CylinderTest() [23]
               0.00    0.00       1/73          SpheroidTest() [24]
               0.00    0.00       1/73          ChessBoardTest() [16]
               0.00    0.00       2/73          BlockTest() [17]
               0.00    0.00       4/73          TorusTest(char const*, double) [12]
               0.00    0.00       5/73          Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
               0.00    0.00      14/73          Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00      15/73          Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00      29/73          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]

[81] 0.1 0.01 0.00 73 Imager::SolidObject::Translate(double, double, double) [81]


                                                <spontaneous>

[82] 0.0 0.01 0.00 Imager::SolidObject_BinaryOperator::RotateZ(double) [82]


[83] 0.0 0.00 0.00 16+6 <cycle 1 as a whole> [83]

               0.00    0.00      14+4           Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       8             Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]

                                  4             Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
                                  3             Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
               0.00    0.00       1/16          SetDifferenceTest() [21]
               0.00    0.00       1/16          SetIntersectionTest() [19]
               0.00    0.00       1/16          BitDonutTest() [20]
               0.00    0.00       1/16          SaturnTest() [18]
               0.00    0.00       1/16          BlockTest() [17]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       2/16          TorusTest(char const*, double) [12]

[84] 0.0 0.00 0.00 14+4 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]

               0.00    0.00      29/73          Imager::SolidObject::Translate(double, double, double) [81]
                                  3             Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
                                  4             Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]

[85] 0.0 0.00 0.00 7+26 <cycle 3 as a whole> [85]

               0.00    0.00      10             Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
               0.00    0.00      20             Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
               0.00    0.00       3             Imager::SetComplement::RotateY(double) <cycle 3> [159]

                                  1             Imager::SetComplement::RotateY(double) <cycle 3> [159]
                                  2             Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
               0.00    0.00       1/7           SetDifferenceTest() [21]
               0.00    0.00       1/7           SetIntersectionTest() [19]
               0.00    0.00       1/7           BitDonutTest() [20]
               0.00    0.00       1/7           SaturnTest() [18]
               0.00    0.00       1/7           BlockTest() [17]
               0.00    0.00       2/7           TorusTest(char const*, double) [12]

[86] 0.0 0.00 0.00 10 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]

               0.00    0.00      15/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       3/16          Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
                                 20             Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]

[87] 0.0 0.00 0.00 6+23 <cycle 2 as a whole> [87]

               0.00    0.00       9             Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00      18             Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
               0.00    0.00       2             Imager::SetComplement::RotateX(double) <cycle 2> [165]

                                  1             Imager::SetComplement::RotateX(double) <cycle 2> [165]
                                  2             Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
               0.00    0.00       1/6           SetIntersectionTest() [19]
               0.00    0.00       1/6           BitDonutTest() [20]
               0.00    0.00       1/6           SaturnTest() [18]
               0.00    0.00       1/6           BlockTest() [17]
               0.00    0.00       2/6           TorusTest(char const*, double) [12]

[88] 0.0 0.00 0.00 9 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]

               0.00    0.00      14/73          Imager::SolidObject::Translate(double, double, double) [81]
               0.00    0.00       2/16          Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
                                 18             Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]

                                  3             Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
               0.00    0.00       2/16          Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
               0.00    0.00       3/16          Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]

[89] 0.0 0.00 0.00 8 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]

               0.00    0.00       5/73          Imager::SolidObject::Translate(double, double, double) [81]
                                  3             Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]

               0.00    0.00       1/21          SphereTest() [26]
               0.00    0.00       1/21          CuboidTest() [22]
               0.00    0.00       1/21          CylinderTest() [23]
               0.00    0.00       1/21          SaturnTest() [18]
               0.00    0.00       1/21          BlockTest() [17]
               0.00    0.00       2/21          SetDifferenceTest() [21]
               0.00    0.00       2/21          SetIntersectionTest() [19]
               0.00    0.00       2/21          MultipleSphereTest() [25]
               0.00    0.00       3/21          ChessBoardTest() [16]
               0.00    0.00       3/21          Imager::Saturn::CreateRingSystem() [92]
               0.00    0.00       4/21          TorusTest(char const*, double) [12]

[90] 0.0 0.00 0.00 21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]

               0.00    0.00      84/1425590     Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]

               0.00    0.00       1/1           UnitTests() [3]

[91] 0.0 0.00 0.00 1 Algebra::UnitTest() [91]

               0.00    0.00       3/3           Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
               0.00    0.00       2/2           Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
               0.00    0.00       2/2           Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]

               0.00    0.00       1/1           SaturnTest() [18]

[92] 0.0 0.00 0.00 1 Imager::Saturn::CreateRingSystem() [92]

               0.00    0.00       3/21          Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]

               0.00    0.00       3/3           Algebra::UnitTest() [91]

[93] 0.0 0.00 0.00 3 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]

               0.00    0.00       3/5171493     Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]
               0.00    0.00      12/22          Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
               0.00    0.00       3/7           Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]

               0.00    0.00       2/2           Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]

[94] 0.0 0.00 0.00 2 Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]

               0.00    0.00       6/5171292     Algebra::cbrt(std::complex<double>, int) [62]

               0.00    0.00       2/2           Algebra::UnitTest() [91]

[95] 0.0 0.00 0.00 2 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]

               0.00    0.00       2/2           Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]
               0.00    0.00       6/22          Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
               0.00    0.00       2/7           Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]

               0.00    0.00       2/2           Algebra::UnitTest() [91]

[96] 0.0 0.00 0.00 2 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]

               0.00    0.00       2/6856730     Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
               0.00    0.00       4/22          Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
               0.00    0.00       2/7           Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]

               0.00    0.00       1/1           BlockTest() [17]

[97] 0.0 0.00 0.00 1 Imager::Optics::SetGlossColor(Imager::Color const&) [97]

               0.00    0.00       1/1425590     Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]

               0.00    0.00   39141/1447152     lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.00    0.00  181857/1447152     append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]
               0.00    0.00  573312/1447152     lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
               0.00    0.00  652842/1447152     encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

[104] 0.0 0.00 0.00 1447152 uivector_push_back(uivector*, unsigned int) [104]

               0.00    0.00  238818/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]

               0.00    0.00 1395302/1395302     Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]

[105] 0.0 0.00 0.00 1395302 Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const [105]


               0.00    0.00 1137648/1137648     Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]

[106] 0.0 0.00 0.00 1137648 Imager::ChessBoard::SquareCoordinate(double) const [106]


               0.00    0.00      15/406979      lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
               0.00    0.00      15/406979      lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
               0.00    0.00      81/406979      lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.00    0.00     729/406979      uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]
               0.00    0.00  167321/406979      lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
               0.00    0.00  238818/406979      uivector_push_back(uivector*, unsigned int) [104]

[107] 0.0 0.00 0.00 406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]


               0.00    0.00  255938/255938      encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]

[108] 0.0 0.00 0.00 255938 searchCodeIndex(unsigned int const*, unsigned long, unsigned long) [108]


               0.00    0.00    3245/3245        lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]

[109] 0.0 0.00 0.00 3245 cleanup_coins(Coin*, unsigned long) [109]


               0.00    0.00    2787/2787        lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]

[110] 0.0 0.00 0.00 2787 append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]

               0.00    0.00  181857/1447152     uivector_push_back(uivector*, unsigned int) [104]

               0.00    0.00     243/729         lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
               0.00    0.00     486/729         HuffmanTree_makeFromLengths2(HuffmanTree*) [114]

[111] 0.0 0.00 0.00 729 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]

               0.00    0.00     729/406979      lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]

               0.00    0.00     607/607         lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[112] 0.0 0.00 0.00 607 lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [112]


               0.00    0.00     243/243         lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]

[113] 0.0 0.00 0.00 243 HuffmanTree_cleanup(HuffmanTree*) [113]


               0.00    0.00     243/243         lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]

[114] 0.0 0.00 0.00 243 HuffmanTree_makeFromLengths2(HuffmanTree*) [114]

               0.00    0.00     486/729         uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]

               0.00    0.00     120/120         Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]

[115] 0.0 0.00 0.00 120 Imager::Dodecahedron::CheckEdge(int, int, double) const [115]


               0.00    0.00      20/92          Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]
               0.00    0.00      24/92          Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
               0.00    0.00      48/92          Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]

[116] 0.0 0.00 0.00 92 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]

               0.00    0.00      20/20          std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&) [131]

               0.00    0.00       1/88          Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
               0.00    0.00       2/88          Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]
               0.00    0.00       2/88          Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51]
               0.00    0.00       6/88          Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]
               0.00    0.00       8/88          Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
               0.00    0.00       9/88          Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13]
               0.00    0.00      13/88          Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]
               0.00    0.00      23/88          Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
               0.00    0.00      24/88          Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]

[117] 0.0 0.00 0.00 88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]


               0.00    0.00      48/48          lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121]

[118] 0.0 0.00 0.00 48 lodepng_chunk_generate_crc(unsigned char*) [118]

               0.00    0.00      48/48          Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] [119]

               0.00    0.00      48/48          lodepng_chunk_generate_crc(unsigned char*) [118]

[119] 0.0 0.00 0.00 48 Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] [119]


               0.00    0.00      48/48          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[120] 0.0 0.00 0.00 48 addUnknownChunks(ucvector*, unsigned char*, unsigned long) [120]


               0.00    0.00      45/45          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[121] 0.0 0.00 0.00 45 lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121]

               0.00    0.00      48/48          lodepng_chunk_generate_crc(unsigned char*) [118]

               0.00    0.00      15/45          lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]
               0.00    0.00      15/45          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
               0.00    0.00      15/45          lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]

[122] 0.0 0.00 0.00 45 lodepng_info_cleanup(LodePNGInfo*) [122]

               0.00    0.00      45/45          LodePNGText_cleanup(LodePNGInfo*) [123]
               0.00    0.00      45/45          LodePNGIText_cleanup(LodePNGInfo*) [124]

               0.00    0.00      45/45          lodepng_info_cleanup(LodePNGInfo*) [122]

[123] 0.0 0.00 0.00 45 LodePNGText_cleanup(LodePNGInfo*) [123]


               0.00    0.00      45/45          lodepng_info_cleanup(LodePNGInfo*) [122]

[124] 0.0 0.00 0.00 45 LodePNGIText_cleanup(LodePNGInfo*) [124]


               0.00    0.00      15/30          lodepng_state_init(LodePNGState*) [137]
               0.00    0.00      15/30          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[125] 0.0 0.00 0.00 30 lodepng_info_init(LodePNGInfo*) [125]


               0.00    0.00      30/30          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[126] 0.0 0.00 0.00 30 checkColorValidity(LodePNGColorType, unsigned int) [126]


               0.00    0.00       1/29          SphereTest() [26]
               0.00    0.00       1/29          CuboidTest() [22]
               0.00    0.00       1/29          SetDifferenceTest() [21]
               0.00    0.00       1/29          CylinderTest() [23]
               0.00    0.00       1/29          SetIntersectionTest() [19]
               0.00    0.00       1/29          SaturnTest() [18]
               0.00    0.00       2/29          SpheroidTest() [24]
               0.00    0.00       2/29          BitDonutTest() [20]
               0.00    0.00       3/29          MultipleSphereTest() [25]
               0.00    0.00       3/29          PolyhedraTest() [27]
               0.00    0.00       3/29          DodecahedronOverlapTest() [28]
               0.00    0.00       3/29          BlockTest() [17]
               0.00    0.00       3/29          ChessBoardTest() [16]
               0.00    0.00       4/29          TorusTest(char const*, double) [12]

[127] 0.0 0.00 0.00 29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]


               0.00    0.00      24/24          Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]

[128] 0.0 0.00 0.00 24 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]

               0.00    0.00     120/120         Imager::Dodecahedron::CheckEdge(int, int, double) const [115]
               0.00    0.00      48/92          Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]

               0.00    0.00       4/22          Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
               0.00    0.00       6/22          Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
               0.00    0.00      12/22          Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]

[129] 0.0 0.00 0.00 22 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]


                                 20             Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]

[130] 0.0 0.00 0.00 20 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]

               0.00    0.00      12/15          Imager::SolidObject_Reorientable::RotateY(double) [144]
               0.00    0.00       3/5           Imager::Sphere::RotateY(double) [154]
                                  3             Imager::SetComplement::RotateY(double) <cycle 3> [159]
                                  2             Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]

               0.00    0.00      20/20          Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]

[131] 0.0 0.00 0.00 20 std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&) [131]


               0.00    0.00       1/20          SphereTest() [26]
               0.00    0.00       1/20          CuboidTest() [22]
               0.00    0.00       1/20          SetDifferenceTest() [21]
               0.00    0.00       1/20          CylinderTest() [23]
               0.00    0.00       1/20          SpheroidTest() [24]
               0.00    0.00       1/20          SetIntersectionTest() [19]
               0.00    0.00       1/20          BitDonutTest() [20]
               0.00    0.00       1/20          SaturnTest() [18]
               0.00    0.00       1/20          DodecahedronOverlapTest() [28]
               0.00    0.00       2/20          MultipleSphereTest() [25]
               0.00    0.00       2/20          PolyhedraTest() [27]
               0.00    0.00       2/20          BlockTest() [17]
               0.00    0.00       2/20          TorusTest(char const*, double) [12]
               0.00    0.00       3/20          ChessBoardTest() [16]

[132] 0.0 0.00 0.00 20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]


               0.00    0.00       1/18          CuboidTest() [22]
               0.00    0.00       1/18          CylinderTest() [23]
               0.00    0.00       1/18          SpheroidTest() [24]
               0.00    0.00       1/18          ChessBoardTest() [16]
               0.00    0.00       2/18          TorusTest(char const*, double) [12]
               0.00    0.00      12/18          Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]

[133] 0.0 0.00 0.00 18 Imager::SolidObject_Reorientable::RotateX(double) [133]


                                 18             Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]

[134] 0.0 0.00 0.00 18 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]

               0.00    0.00      12/18          Imager::SolidObject_Reorientable::RotateX(double) [133]
               0.00    0.00       2/3           Imager::Sphere::RotateX(double) [161]
                                  2             Imager::SetComplement::RotateX(double) <cycle 2> [165]
                                  2             Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]

               0.00    0.00       5/17          Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]
               0.00    0.00      12/17          Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]

[135] 0.0 0.00 0.00 17 std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [135]


               0.00    0.00      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[136] 0.0 0.00 0.00 15 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]

               0.00    0.00      15/45          lodepng_info_cleanup(LodePNGInfo*) [122]
               0.00    0.00      15/15          lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [140]

               0.00    0.00      15/15          lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]

[137] 0.0 0.00 0.00 15 lodepng_state_init(LodePNGState*) [137]

               0.00    0.00      15/30          lodepng_info_init(LodePNGInfo*) [125]

               0.00    0.00      15/15          lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]

[138] 0.0 0.00 0.00 15 lodepng_state_cleanup(LodePNGState*) [138]


               0.00    0.00      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[139] 0.0 0.00 0.00 15 lodepng_can_have_alpha(LodePNGColorMode const*) [139]


               0.00    0.00      15/15          lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]

[140] 0.0 0.00 0.00 15 lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [140]


               0.00    0.00      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[141] 0.0 0.00 0.00 15 zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [141]


               0.00    0.00      15/15          lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]

[142] 0.0 0.00 0.00 15 update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] [142]


               0.00    0.00      15/15          lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]

[143] 0.0 0.00 0.00 15 preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) [143]


               0.00    0.00       1/15          CuboidTest() [22]
               0.00    0.00       1/15          CylinderTest() [23]
               0.00    0.00       1/15          SpheroidTest() [24]
               0.00    0.00      12/15          Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]

[144] 0.0 0.00 0.00 15 Imager::SolidObject_Reorientable::RotateY(double) [144]


               0.00    0.00      15/15          Imager::Scene::~Scene() [146]

[145] 0.0 0.00 0.00 15 Imager::Scene::ClearSolidObjectList() [145]

               0.00    0.00       7/13          Imager::Sphere::~Sphere() [150]
               0.00    0.00       2/2           Imager::SetIntersection::~SetIntersection() [167]
               0.00    0.00       2/2           Imager::SetDifference::~SetDifference() [166]
               0.00    0.00       2/4           Imager::SetUnion::~SetUnion() [156]
               0.00    0.00       1/4           Imager::Cuboid::~Cuboid() [155]
               0.00    0.00       1/1           Imager::Spheroid::~Spheroid() [180]
               0.00    0.00       1/1           Imager::Cylinder::~Cylinder() [179]
               0.00    0.00       1/1           Imager::ConcreteBlock::~ConcreteBlock() [176]
               0.00    0.00       1/1           Imager::Saturn::~Saturn() [178]
               0.00    0.00       1/2           Imager::Dodecahedron::~Dodecahedron() [164]
               0.00    0.00       1/1           Imager::Icosahedron::~Icosahedron() [175]
               0.00    0.00       1/1           Imager::ChessBoard::~ChessBoard() [173]

               0.00    0.00       1/15          SphereTest() [26]
               0.00    0.00       1/15          CuboidTest() [22]
               0.00    0.00       1/15          SetDifferenceTest() [21]
               0.00    0.00       1/15          CylinderTest() [23]
               0.00    0.00       1/15          SpheroidTest() [24]
               0.00    0.00       1/15          SetIntersectionTest() [19]
               0.00    0.00       1/15          MultipleSphereTest() [25]
               0.00    0.00       1/15          PolyhedraTest() [27]
               0.00    0.00       1/15          BitDonutTest() [20]
               0.00    0.00       1/15          SaturnTest() [18]
               0.00    0.00       1/15          DodecahedronOverlapTest() [28]
               0.00    0.00       1/15          BlockTest() [17]
               0.00    0.00       1/15          ChessBoardTest() [16]
               0.00    0.00       2/15          TorusTest(char const*, double) [12]

[146] 0.0 0.00 0.00 15 Imager::Scene::~Scene() [146]

               0.00    0.00      15/15          Imager::Scene::ClearSolidObjectList() [145]

               0.00    0.00      15/15          Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]

[147] 0.0 0.00 0.00 15 lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int) [147]


               0.00    0.00      15/15          lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33]

[148] 0.0 0.00 0.00 15 lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&) [148]


               0.00    0.00      15/15          lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34]

[149] 0.0 0.00 0.00 15 void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag) [149]


               0.00    0.00       1/13          Imager::SetDifference::~SetDifference() [166]
               0.00    0.00       2/13          Imager::SetComplement::~SetComplement() [160]
               0.00    0.00       3/13          Imager::SetIntersection::~SetIntersection() [167]
               0.00    0.00       7/13          Imager::Scene::ClearSolidObjectList() [145]

[150] 0.0 0.00 0.00 13 Imager::Sphere::~Sphere() [150]


               0.00    0.00       2/7           Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
               0.00    0.00       2/7           Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
               0.00    0.00       3/7           Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]

[151] 0.0 0.00 0.00 7 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]


               0.00    0.00       1/6           BlockTest() [17]
               0.00    0.00       2/6           MultipleSphereTest() [25]
               0.00    0.00       3/6           ChessBoardTest() [16]

[152] 0.0 0.00 0.00 6 Imager::Optics::SetOpacity(double) [152]


               0.00    0.00       1/5           Imager::SetDifference::~SetDifference() [166]
               0.00    0.00       4/5           Imager::SetUnion::~SetUnion() [156]

[153] 0.0 0.00 0.00 5 Imager::Torus::~Torus() [153]


               0.00    0.00       2/5           Imager::SetComplement::RotateY(double) <cycle 3> [159]
               0.00    0.00       3/5           Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]

[154] 0.0 0.00 0.00 5 Imager::Sphere::RotateY(double) [154]


               0.00    0.00       1/4           Imager::ConcreteBlock::~ConcreteBlock() [176]
               0.00    0.00       1/4           Imager::Scene::ClearSolidObjectList() [145]
               0.00    0.00       2/4           Imager::SetUnion::~SetUnion() [156]

[155] 0.0 0.00 0.00 4 Imager::Cuboid::~Cuboid() [155]


                                  1             Imager::SetUnion::~SetUnion() [156]
               0.00    0.00       1/4           Imager::Saturn::~Saturn() [178]
               0.00    0.00       1/4           Imager::SetComplement::~SetComplement() [160]
               0.00    0.00       2/4           Imager::Scene::ClearSolidObjectList() [145]

[156] 0.0 0.00 0.00 4+1 Imager::SetUnion::~SetUnion() [156]

               0.00    0.00       4/5           Imager::Torus::~Torus() [153]
               0.00    0.00       3/3           Imager::ThinRing::~ThinRing() [162]
               0.00    0.00       2/4           Imager::Cuboid::~Cuboid() [155]
                                  1             Imager::SetUnion::~SetUnion() [156]

               0.00    0.00       1/3           DodecahedronOverlapTest() [28]
               0.00    0.00       2/3           PolyhedraTest() [27]

[157] 0.0 0.00 0.00 3 Imager::TriangleMesh::RotateX(double) [157]


               0.00    0.00       1/3           DodecahedronOverlapTest() [28]
               0.00    0.00       2/3           PolyhedraTest() [27]

[158] 0.0 0.00 0.00 3 Imager::TriangleMesh::RotateY(double) [158]


                                  3             Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]

[159] 0.0 0.00 0.00 3 Imager::SetComplement::RotateY(double) <cycle 3> [159]

               0.00    0.00       2/5           Imager::Sphere::RotateY(double) [154]
                                  1             Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]

               0.00    0.00       1/3           Imager::ConcreteBlock::~ConcreteBlock() [176]
               0.00    0.00       2/3           Imager::SetDifference::~SetDifference() [166]

[160] 0.0 0.00 0.00 3 Imager::SetComplement::~SetComplement() [160]

               0.00    0.00       2/13          Imager::Sphere::~Sphere() [150]
               0.00    0.00       1/4           Imager::SetUnion::~SetUnion() [156]

               0.00    0.00       1/3           Imager::SetComplement::RotateX(double) <cycle 2> [165]
               0.00    0.00       2/3           Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]

[161] 0.0 0.00 0.00 3 Imager::Sphere::RotateX(double) [161]


               0.00    0.00       3/3           Imager::SetUnion::~SetUnion() [156]

[162] 0.0 0.00 0.00 3 Imager::ThinRing::~ThinRing() [162]


               0.00    0.00       1/2           PolyhedraTest() [27]
               0.00    0.00       1/2           DodecahedronOverlapTest() [28]

[163] 0.0 0.00 0.00 2 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]

               0.00    0.00      24/92          Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
               0.00    0.00      24/24          Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]
               0.00    0.00      12/17          std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [135]

               0.00    0.00       1/2           Imager::Scene::ClearSolidObjectList() [145]
               0.00    0.00       1/2           Imager::SetIntersection::~SetIntersection() [167]

[164] 0.0 0.00 0.00 2 Imager::Dodecahedron::~Dodecahedron() [164]


                                  2             Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]

[165] 0.0 0.00 0.00 2 Imager::SetComplement::RotateX(double) <cycle 2> [165]

               0.00    0.00       1/3           Imager::Sphere::RotateX(double) [161]
                                  1             Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]

               0.00    0.00       2/2           Imager::Scene::ClearSolidObjectList() [145]

[166] 0.0 0.00 0.00 2 Imager::SetDifference::~SetDifference() [166]

               0.00    0.00       2/3           Imager::SetComplement::~SetComplement() [160]
               0.00    0.00       1/13          Imager::Sphere::~Sphere() [150]
               0.00    0.00       1/5           Imager::Torus::~Torus() [153]

               0.00    0.00       2/2           Imager::Scene::ClearSolidObjectList() [145]

[167] 0.0 0.00 0.00 2 Imager::SetIntersection::~SetIntersection() [167]

               0.00    0.00       3/13          Imager::Sphere::~Sphere() [150]
               0.00    0.00       1/2           Imager::Dodecahedron::~Dodecahedron() [164]

               0.00    0.00       1/1           __libc_csu_init [325]

[168] 0.0 0.00 0.00 1 _GLOBAL__sub_I__Z9BlockTestv [168]


               0.00    0.00       1/1           __libc_csu_init [325]

[169] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv [169]


               0.00    0.00       1/1           __libc_csu_init [325]

[170] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN6Imager6IndentERSoi [170]


               0.00    0.00       1/1           __libc_csu_init [325]

[171] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_ [171]


               0.00    0.00       1/1           ChessBoardTest() [16]

[172] 0.0 0.00 0.00 1 Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [172]


               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[173] 0.0 0.00 0.00 1 Imager::ChessBoard::~ChessBoard() [173]


               0.00    0.00       1/1           PolyhedraTest() [27]

[174] 0.0 0.00 0.00 1 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]

               0.00    0.00      20/92          Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
               0.00    0.00       5/17          std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [135]

               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[175] 0.0 0.00 0.00 1 Imager::Icosahedron::~Icosahedron() [175]


               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[176] 0.0 0.00 0.00 1 Imager::ConcreteBlock::~ConcreteBlock() [176]

               0.00    0.00       1/4           Imager::Cuboid::~Cuboid() [155]
               0.00    0.00       1/3           Imager::SetComplement::~SetComplement() [160]

               0.00    0.00       1/1           Imager::Saturn::~Saturn() [178]

[177] 0.0 0.00 0.00 1 Imager::Planet::~Planet() [177]


               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[178] 0.0 0.00 0.00 1 Imager::Saturn::~Saturn() [178]

               0.00    0.00       1/1           Imager::Planet::~Planet() [177]
               0.00    0.00       1/4           Imager::SetUnion::~SetUnion() [156]

               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[179] 0.0 0.00 0.00 1 Imager::Cylinder::~Cylinder() [179]


               0.00    0.00       1/1           Imager::Scene::ClearSolidObjectList() [145]

[180] 0.0 0.00 0.00 1 Imager::Spheroid::~Spheroid() [180]


Index by function name

[168] _GLOBAL__sub_I__Z9BlockTestv (main.cpp) [80] HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) (lodepng.cpp) [91] Algebra::UnitTest()
[169] _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv (scene.cpp) [172] Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [33] lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
[170] _GLOBAL__sub_I__ZN6Imager6IndentERSoi (debug.cpp) [173] Imager::ChessBoard::~ChessBoard() [147] lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int)
[171] _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_ (algebra.cpp) [174] Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [34] lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
 [22] CuboidTest()          [175] Imager::Icosahedron::~Icosahedron() [148] lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&)
 [18] SaturnTest()           [81] Imager::SolidObject::Translate(double, double, double) [106] Imager::ChessBoard::SquareCoordinate(double) const
 [20] BitDonutTest()        [128] Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [65] Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
 [23] CylinderTest()        [163] Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [66] Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const
 [24] SpheroidTest()        [164] Imager::Dodecahedron::~Dodecahedron() [48] Imager::SolidObject::Contains(Imager::Vector const&) const
 [27] PolyhedraTest()       [116] Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [115] Imager::Dodecahedron::CheckEdge(int, int, double) const
 [16] ChessBoardTest()      [157] Imager::TriangleMesh::RotateX(double) [70] Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const
 [31] lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [158] Imager::TriangleMesh::RotateY(double) [105] Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const
 [67] lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [176] Imager::ConcreteBlock::~ConcreteBlock() [4] Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [43] lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [165] Imager::SetComplement::RotateX(double) [35] Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [21] SetDifferenceTest()   [159] Imager::SetComplement::RotateY(double) [57] Imager::SetComplement::Contains(Imager::Vector const&) const
[136] lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [89] Imager::SetComplement::Translate(double, double, double) [30] Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[125] lodepng_info_init(LodePNGInfo*) [160] Imager::SetComplement::~SetComplement() [15] Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [25] MultipleSphereTest()  [166] Imager::SetDifference::~SetDifference() [53] Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const
[137] lodepng_state_init(LodePNGState*) [167] Imager::SetIntersection::~SetIntersection() [11] Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [19] SetIntersectionTest()  [47] Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [75] Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
[112] lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [133] Imager::SolidObject_Reorientable::RotateX(double) [52] Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const
[121] lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [144] Imager::SolidObject_Reorientable::RotateY(double) [6] Imager::Scene::CalculateMatte(Imager::Intersection const&) const
[122] lodepng_info_cleanup(LodePNGInfo*) [69] Imager::SolidObject_Reorientable::RotateZ(double) [5] Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const
 [32] lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [134] Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) [58] Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const
[138] lodepng_state_cleanup(LodePNGState*) [130] Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) [40] Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const
 [38] lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [88] Imager::SolidObject_BinaryOperator::RotateX(double) [7] Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const
[139] lodepng_can_have_alpha(LodePNGColorMode const*) [86] Imager::SolidObject_BinaryOperator::RotateY(double) [54] Imager::Scene::PolarizedReflection(double, double, double, double) const
 [28] DodecahedronOverlapTest() [82] Imager::SolidObject_BinaryOperator::RotateZ(double) [10] Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const
[140] lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [84] Imager::SolidObject_BinaryOperator::Translate(double, double, double) [9] Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const
[118] lodepng_chunk_generate_crc(unsigned char*) [145] Imager::Scene::ClearSolidObjectList() [1] Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const
 [79] lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [146] Imager::Scene::~Scene() [49] Imager::Torus::SurfaceNormal(Imager::Vector const&) const
 [17] BlockTest()           [153] Imager::Torus::~Torus() [41] Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const
 [12] TorusTest(char const*, double) [155] Imager::Cuboid::~Cuboid() [77] Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const
 [36] encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) (lodepng.cpp) [152] Imager::Optics::SetOpacity(double) [37] Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [78] sort_coins(Coin*, unsigned long) (lodepng.cpp) [97] Imager::Optics::SetGlossColor(Imager::Color const&) [42] Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const
 [64] string_set(char**, char const*) (lodepng.cpp) [72] Imager::Optics::SetMatteColor(Imager::Color const&) [13] Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[109] cleanup_coins(Coin*, unsigned long) (lodepng.cpp) [90] Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [71] Imager::Optics::ValidateReflectionColor(Imager::Color const&) const
[141] zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) (lodepng.cpp) [177] Imager::Planet::~Planet() [14] Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
 [46] addBitToStream(unsigned long*, ucvector*, unsigned char) (lodepng.cpp) [92] Imager::Saturn::CreateRingSystem() [55] Imager::Sphere::Contains(Imager::Vector const&) const
 [76] color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) (lodepng.cpp) [178] Imager::Saturn::~Saturn() [63] Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[142] update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] (lodepng.cpp) [161] Imager::Sphere::RotateX(double) [60] Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[108] searchCodeIndex(unsigned int const*, unsigned long, unsigned long) (lodepng.cpp) [154] Imager::Sphere::RotateY(double) [29] Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[119] Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] (lodepng.cpp) [150] Imager::Sphere::~Sphere() [68] Imager::SetUnion::Contains(Imager::Vector const&) const
[120] addUnknownChunks(ucvector*, unsigned char*, unsigned long) (lodepng.cpp) [179] Imager::Cylinder::~Cylinder() [50] Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[111] uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] (lodepng.cpp) [156] Imager::SetUnion::~SetUnion() [51] Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
[126] checkColorValidity(LodePNGColorType, unsigned int) (lodepng.cpp) [180] Imager::Spheroid::~Spheroid() [127] std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&)
 [74] getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) (lodepng.cpp) [162] Imager::ThinRing::~ThinRing() [117] std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&)
 [39] ucvector_push_back(ucvector*, unsigned char) (lodepng.cpp) [151] Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [131] std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&)
[104] uivector_push_back(uivector*, unsigned int) (lodepng.cpp) [59] Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [135] std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&)
[113] HuffmanTree_cleanup(HuffmanTree*) (lodepng.cpp) [94] Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [132] std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&)
[123] LodePNGText_cleanup(LodePNGInfo*) (lodepng.cpp) [129] Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [149] void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag)
[110] append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) (lodepng.cpp) [95] Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [61] frame_dummy
 [73] lodepng_add32bitInt(ucvector*, unsigned int) (lodepng.cpp) [8] Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [83] <cycle 1>
[143] preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) (lodepng.cpp) [44] Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [87] <cycle 2>
[124] LodePNGIText_cleanup(LodePNGInfo*) (lodepng.cpp) [93] Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [85] <cycle 3>
 [45] addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) (lodepng.cpp) [56] Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [2] <cycle 4>
[107] lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) (lodepng.cpp) [96] Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>)
[114] HuffmanTree_makeFromLengths2(HuffmanTree*) (lodepng.cpp) [62] Algebra::cbrt(std::complex<double>, int)

Most of the run time (99.3%) is spent in the SaveImage function (Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const), counting the functions it calls. 94.4% of the time is spent in the CalculateLighting function (Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const), which the index above lists under the ray tracer's Imager::Scene class rather than under the lodepng code that runs alongside it.
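As a rough guide to what these percentages imply for acceleration, Amdahl's law bounds the overall speedup when only part of a program is made faster. The sketch below is plain host-side C++; the function names are ours (hypothetical), and it assumes the 94.4% figure is a fraction of the total run time.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the run time
// (0 <= p <= 1) is accelerated by a factor s (> 0) and the rest is unchanged.
double amdahlSpeedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound on the speedup as s grows without limit.
// p == 1 would mean an unbounded speedup, so it is excluded here.
double amdahlLimit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

Under that assumption, amdahlSpeedup(0.944, 10.0) comes out at roughly 6.6x, and amdahlLimit(0.944) caps the gain at just under 18x no matter how fast the lighting code becomes.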


Assignment 2

During assignment 2, we tried a simple kernel in the shape of a dot product. It achieved nothing special: as predicted at the end of assignment 1, continuously calling cudaMalloc and cudaMemcpy had severe consequences for run time.

Initial implementation

//version 1 dot product
__global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	int j = blockIdx.y * blockDim.y + threadIdx.y;
	//matrix multiplication
	if (i < ni && j < nj) {
		float sum = 0.0f;
		for (int k = 0; k < nk; k++)
			sum += d_a[i * nk + k] * d_b[k * nj + j];
		d_p[i * nj + j] = sum;
	}
}
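A CPU reference that uses the same row-major indexing makes it easier to check the kernel's output on small inputs. This is a minimal sketch in host-side C++; hostDot is a hypothetical helper of ours, not part of the project code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the kernel above: P (ni x nj) = A (ni x nk) * B (nk x nj),
// with all matrices stored row-major in flat arrays.
std::vector<float> hostDot(const std::vector<float>& a,
                           const std::vector<float>& b,
                           int ni, int nj, int nk) {
    assert(a.size() == static_cast<std::size_t>(ni) * nk);
    assert(b.size() == static_cast<std::size_t>(nk) * nj);
    std::vector<float> p(static_cast<std::size_t>(ni) * nj, 0.0f);
    for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < nk; ++k)
                sum += a[i * nk + k] * b[k * nj + j];
            p[i * nj + j] = sum;
        }
    }
    return p;
}
```

Copying the device result back to the host and comparing it element by element against hostDot, within a small tolerance, is one way to catch swapped dimension arguments early.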

Naive

Naturally this is a naive implementation as we are calling cudaMalloc for each iteration of the training for loop.

cout << "Training the model ...\n";
for (unsigned i = 0; i < 10000; ++i) {

This actually cost us an additional 20 minutes of run time when we profiled it.

The next steps

First we had to research how the neural network actually learns: for example, why it uses the relu() function, how back-propagation works, and much more. Some additional sites will be included.

After that, and many coffees, we arrived at the following:
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1) {
	int BATCH_SIZE = 256;
	float lr = .01 / BATCH_SIZE;
	kdot<<<50, 51>>>(ktranspose(d_a2, BATCH_SIZE, 64), d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
	kdot<<<80, 32>>>(d_dyhat, ktranspose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
	kreluPrime(d_a2, 128 * 64);
	for (int i = 0; i < BATCH_SIZE * 10; i++) {
		d_dz2[i] = d_dz2[i] * d_a2[i];
	}
	kdot<<<1024, 32>>>(ktranspose(d_a1, BATCH_SIZE, 128), d_dz2, 128, BATCH_SIZE, 64, d_dW2);
	kdot<<<512, 32>>>(d_dz2, ktranspose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
	kreluPrime(d_a1, BATCH_SIZE * 784);
	for (int i = 0; i < 256 * 64; i++) {
		d_dz1[i] = d_dz1[i] * d_a1[i];
	}
	kdot<<<512, 512, 32>>>(ktranspose(d_b_X, BATCH_SIZE, 784), d_dz1, 784, BATCH_SIZE, 128, d_dW1);
	// Updating the parameters
	//W3 = W3 - lr * dW3;
	for (int i = 0; i < (64 * 10); i++) {
		d_W3[i] = d_W3[i] - lr * d_dW3[i];
	}
	//W2 = W2 - lr * dW2;
	for (int i = 0; i < (128 * 64); i++) {
		d_W2[i] = d_W2[i] - lr * d_dW2[i];
	}
	//W1 = W1 - lr * dW1;
	for (int i = 0; i < (784 * 128); i++) {
		d_W1[i] = d_W1[i] - lr * d_dW1[i];
	}
}

Dynamic Parallelism

Dynamic parallelism in CUDA lets a kernel launch new nested kernels from the device and synchronize with them. For our use case it also lets us stay on the device longer and process information quickly, without constant cudaMemcpy() or cudaMalloc() calls.
Parent kernel calling a child kernel( ... ):
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
	int BATCH_SIZE = 256;
	float lr = 0.01 / BATCH_SIZE;
		//backpropagation
		d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
		kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
		cudaDeviceSynchronize();
}

__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	int j = blockIdx.y * blockDim.y + threadIdx.y;
	//matrix multiplication
	if (i < ni && j < nj) {
		float sum = 0.0f;
		for (int k = 0; k < nk; k++)
			sum += d_a[i * nk + k] * d_b[k * nj + j];
		d_p[i * nj + j] = sum;
	}
}
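One detail of the launch above is worth spelling out. kernel_dot derives j from blockIdx.y and threadIdx.y, but a launch written with scalar sizes, as in <<<blocks, threads>>>, is one-dimensional: the y components are all 0 and blockDim.y is 1, so j is always 0 and only the first column of the result is written. Covering every (i, j) pair needs a two-dimensional grid and block. The sketch below shows the sizing arithmetic in plain host-side C++; Dim2, blocksFor and gridFor are hypothetical names of ours, and in real CUDA code dim3 would take the place of Dim2.

```cpp
#include <cassert>

// Number of blocks needed to cover n elements with `threads` threads per
// block (ceiling division), so that no element is left without a thread.
int blocksFor(int n, int threads) {
    assert(n >= 0 && threads > 0);
    return (n + threads - 1) / threads;
}

// Plain stand-in for CUDA's dim3, so this sketch compiles without CUDA headers.
struct Dim2 {
    int x;
    int y;
};

// Grid size for a kernel that maps thread x to row i (< ni) and thread y to
// column j (< nj), given the block shape.
Dim2 gridFor(int ni, int nj, Dim2 block) {
    return Dim2{ blocksFor(ni, block.x), blocksFor(nj, block.y) };
}
```

For example, a 256 x 64 output with 16 x 16 blocks needs a 16 x 4 grid. The bounds check inside the kernel (i < ni && j < nj) then takes care of any threads that fall outside the matrix.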

Final Iteration

GPU code
__device__ float* k_difference(const float* m1, const float* m2, const int size) {
	/* Returns the difference between the two vectors. */
	float* difference = new float[size];
	for (int i = 0; i < size; i++) {
		difference[i] = m1[i] - m2[i];
	}
	return difference;
}
__device__ float* k_MFV(const float f, const float* m, const int size) {
	float* mult = new float[size];
	for (int i = 0; i < size; i++) {
		mult[i] = f * m[i];
	}
	return mult;
}
__device__ float* k_MM(float* m1, float* m2, const int m2_size) {
	float* product = new float[m2_size];

	for (int i = 0; i != m2_size; ++i) {
		product[i] = m1[i] * m2[i];
	};

	return product;
}
__device__ float* k_transpose(float *m, const int C, const int R) {

	/*  Returns a transpose matrix of input matrix.
	Inputs:
	m: vector, input matrix
	C: int, number of columns in the input matrix
	R: int, number of rows in the input matrix
	Output: vector, transpose matrix mT of input matrix m
	*/

	float* mT = new float[C * R];
	for (unsigned n = 0; n != C * R; n++) {
		unsigned i = n / C;
		unsigned j = n % C;
		mT[n] = m[R*j + i];
	}

	return mT;

	//for (int i = 0; i<R; ++i)
	//	for (int j = 0; j<C; ++j)
	//	{
	//		mT[j * C + i] = m[i * R + j];
	//	}

	//return mT;
}
__device__ void dkernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
	for (int row = 0; row != ni; ++row) {
		for (int col = 0; col != nk; ++col) {
			d_p[row * nk + col] = 0.f;
			for (int k = 0; k != nj; ++k) {
				d_p[row * nk + col] += d_a[row * nj + k] * d_b[k * nk + col];
			}
		}
	}
}
//version 1 dot product
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	int j = blockIdx.y * blockDim.y + threadIdx.y;
	//matrix multiplication
	if (i < ni && j < nj) {
		float sum = 0.0f;
		for (int k = 0; k < nk; k++)
			sum += d_a[i * nk + k] * d_b[k * nj + j];
		d_p[i * nj + j] = sum;
	}
}
void cudaCheck(cudaError_t Error) {
	if (Error != cudaSuccess) {
		cerr << cudaGetErrorName(Error) << "!";
		exit(EXIT_FAILURE);
	}
}



__device__ float* k_relu(float* a, int n) {
	for (int i = 0; i < n; ++i) {
		if (a[i] < 0) {
			a[i] = 0.01f;
		}
		else a[i] = a[i];
	}
	return a;
}
__device__ float* k_reluPrime(float* a, int n) {
	for (int i = 0; i < n; ++i) {
		if (a[i] > 0) {
			a[i] = 1.0f;
		}
		else a[i] = 0.0;
	}
	return a;
}
///activation functions __global__ 
__global__ void kernel_relu(float* a, int n) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if(i < n) {
		if (a[i] < 0) {
			a[i] = 0.01f;
		}
		else a[i] = a[i];
	}
}
__global__ void kernel_reluPrime(float* a, int n) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n) {
		if (a[i] > 0) {
			a[i] = 1.0f;
		}
		else a[i] = 0.0;
	}
}



__device__ void ksoftmax(float *input, int input_len) {
		//assert(input != NULL);
		//assert(input_len != 0);
		int i;
		float m;
		/* Find maximum value from input array */
		m = input[0];
		for (i = 1; i < input_len; i++) {
			if (input[i] > m) {
				m = input[i];
			}
		}

		float sum = 0;
		for (i = 0; i < input_len; i++) {
			sum += expf(input[i] - m);
		}

		for (i = 0; i < input_len; i++) {
			input[i] = expf(input[i] - m - log(sum));

		}
	}

__device__ void k_sigmoid(float* m1, int size) {

	/*  Returns the value of the sigmoid function f(x) = 1/(1 + e^-x).
	Input: m1, a vector.
	Output: 1/(1 + e^-x) for every element of the input matrix m1.
	*/
	for (unsigned i = 0; i != size; ++i) {
		m1[i] = 1 / (1 + exp(-m1[i]));
	}
}
__global__ void feed_forward(float* d_b_X, float* d_W1, float* d_W2, float* d_W3, float* d_b_Y, float* d_a1, float* d_a2, float* d_yhat, float* d_dyhat) {
	int BATCH_SIZE = 256;
	float lr = 0.01 / BATCH_SIZE;
	float* tempY = new float[256 * 64];
	//feed forward
	kernel_dot <<<256, 256>>> (d_b_X, d_W1, BATCH_SIZE, 784, 128, d_a1);
	cudaDeviceSynchronize();
	k_relu(d_a1, BATCH_SIZE * 784);
	kernel_dot <<<256, 128>>> (d_a1, d_W2, BATCH_SIZE, 128, 64, d_a2);
	cudaDeviceSynchronize();
	k_relu(d_a2, BATCH_SIZE * 128);
	kernel_dot <<<256, 64>>> (d_a2, d_W3, BATCH_SIZE, 64, 10, d_yhat);
	cudaDeviceSynchronize();
	ksoftmax(tempY, 10 * 10);
	for (int i = 0; i < 100; i++) {
		d_yhat[i] = tempY[i];
	}
	delete[] tempY;
}


__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
	cudaError_t Error;
	int BATCH_SIZE = 256;
	float lr = 0.01 / BATCH_SIZE;
		//backpropagation
		d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
		kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
		cudaDeviceSynchronize();
		float* mT = new float[256 * 64 - 1];
		for (int i = 0; i < 256; ++i)
			for (int j = 0; j < 64; ++j)
			{
				mT[j * 64 + i] = d_a2[i * 256 + j];
			}
		kernel_dot <<<(16384 + 256)/64, 64>>> (mT, d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
		cudaDeviceSynchronize();
		k_reluPrime(d_a2, 256 * 64);
		for (int i = 0; i < BATCH_SIZE * 10; i++) {
			d_dz2[i] = d_dz2[i] * d_a2[i];
		}
		mT = new float[256 * 128];
		for (int i = 0; i < 256; ++i)
			for (int j = 0; j < 128; ++j)
			{
				mT[j * 128 + i] = d_a1[i * 256 + j];
			}
		kernel_dot <<<64, 512>>> (mT, d_dz2, 128, BATCH_SIZE, 64, d_dW2);
		cudaDeviceSynchronize();
		kernel_dot <<<80, 32>>> (d_dz2, k_transpose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
		cudaDeviceSynchronize();
		k_reluPrime(d_a1, BATCH_SIZE * 784);
		for (int i = 0; i < 256 * 64; i++) {
			d_dz1[i] = d_dz1[i] * d_a1[i];
		}
		kernel_dot <<<784, 256>>> (d_t, d_dz1, 784, BATCH_SIZE, 128, d_dW1);
		cudaDeviceSynchronize();
		//// Updating the parameters
		////W3 = W3 - lr * dW3;
		d_W3 = k_difference(d_W3, k_MFV(lr, d_dW3, 64 * 10), 64 * 10);
		//W2 = W2 - lr * dW2;
		d_W2 = k_difference(d_W2, k_MFV(lr, d_dW2, 128 * 64), 128 * 64);
		////W1 = W1 - lr * dW1;
		d_W1 = k_difference(d_W1, k_MFV(lr, d_dW1, 784 * 128), 784 * 128);
		for (int i = 0; i < (784 * 128); i++) {
			d_W1[i] = d_W1[i] - lr * d_dW1[i];
		}
		//for (int i = 0; i != 10; ++i) {
		//	for (int j = 0; j != 10; ++j) {
		//		printf("%f ", d_W3[i * 10 + j]);
		//	}
		//	printf("\n");
		//}
		//printf("\n");
		//for (int i = 0; i != 10; ++i) {
		//	for (int j = 0; j != 10; ++j) {
		//		printf("%f ", d_yhat[i * 10 + j]);
		//	}
		//	printf("\n");
		//}
		//printf("\n");
		float* dif;
		dif = k_difference(d_b_Y, d_yhat, 10 * 10);
		float loss = 0.0;
		for (unsigned k = 0; k < BATCH_SIZE * 10; ++k) {
			loss += dif[k] * dif[k];
		}
		printf("%f \n", loss / BATCH_SIZE);
		
		Error = cudaGetLastError();
		if (Error != cudaSuccess) {
			// %s expects a string, so convert the error code before printing
			printf("\n %s \n", cudaGetErrorString(Error));
		}
}
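For checking the output layer on the host, a row-wise softmax reference can be useful. The sketch below is host-side C++ and rests on an assumption of ours: that each row of the output matrix holds the class scores for one sample, so softmax is applied per row rather than over the whole array at once as ksoftmax does. hostSoftmaxRows is a hypothetical helper, not part of the project code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax applied independently to each row of a
// row-major (rows x cols) matrix, in place.
void hostSoftmaxRows(std::vector<float>& m, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(m.size() == static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; ++r) {
        float* row = &m[static_cast<std::size_t>(r) * cols];
        // Subtract the row maximum before exponentiating to avoid overflow.
        float mx = row[0];
        for (int c = 1; c < cols; ++c)
            if (row[c] > mx) mx = row[c];
        float sum = 0.0f;
        for (int c = 0; c < cols; ++c) {
            row[c] = std::exp(row[c] - mx);
            sum += row[c];
        }
        // Normalize so the row sums to 1.
        for (int c = 0; c < cols; ++c)
            row[c] /= sum;
    }
}
```

Each row of the result sums to 1, which gives a quick sanity check to run against values copied back from the device.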

Final Profile

This final profile covers only 20 iterations, because errors occurred beyond that point, likely due to naive coding and bad coding practices.

Nnfinalprofile.jpg

Compiling

Follow these articles to set up Visual Studio for dynamic parallelism; they are also recommended reading:

http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf 
http://ramblingsofagamedevstudent.blogspot.com/2014/03/set-up-visual-studio-2012-for-cuda.html

Assignment 3

What we would do differently:

There are many things. One of the major ones is to take on a more manageable task, one with proper documentation and reasoning behind the chosen values.