Difference between revisions of "A-Team"
Line 9: | Line 9: | ||
=== Assignment 1 === | === Assignment 1 === | ||
Our group decided to profile a couple of different solutions, the first being a simple neural network and ray tracing solution, in order to determine the best project to generate a solution for. | Our group decided to profile a couple of different solutions, the first being a simple neural network and ray tracing solution, in order to determine the best project to generate a solution for. | ||
− | + | ===Neural Network=== | |
======Sebastian's findings====== | ======Sebastian's findings====== | ||
I found a simple [https://gist.github.com/sbugrov/7f373f0e4788f8e076b8efa2abfd227a neural network] that takes the MNIST data set and performs training on batches of the data. For a quick illustration, MNIST is a data set of handwritten digits stored as 28 x 28 pixel grayscale images, together with the corresponding numerical labels between 0 and 9. The purpose of this data set is to train networks to recognize handwritten digits when they encounter them. | I found a simple [https://gist.github.com/sbugrov/7f373f0e4788f8e076b8efa2abfd227a neural network] that takes the MNIST data set and performs training on batches of the data. For a quick illustration, MNIST is a data set of handwritten digits stored as 28 x 28 pixel grayscale images, together with the corresponding numerical labels between 0 and 9. The purpose of this data set is to train networks to recognize handwritten digits when they encounter them. | ||
Line 96: | Line 96: | ||
Our hypothesis for this solution is an acceleration of roughly 10x once dot() is parallelized. This means that our code should take somewhere in the ballpark of 102 seconds to train the network. | Our hypothesis for this solution is an acceleration of roughly 10x once dot() is parallelized. This means that our code should take somewhere in the ballpark of 102 seconds to train the network. | ||
− | + | ===Ray Tracing=== | |
======Henry's findings====== | ======Henry's findings====== | ||
Line 102: | Line 102: | ||
======Initial Profile====== | ======Initial Profile====== | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Initial Profile (Warning: long) | ||
+ | |- | ||
+ | | Initial Profile | ||
Flat profile: | Flat profile: | ||
Line 276: | Line 281: | ||
0.00 19.10 0.00 1 0.00 0.00 Imager::Spheroid::~Spheroid() | 0.00 19.10 0.00 1 0.00 0.00 Imager::Spheroid::~Spheroid() | ||
0.00 19.10 0.00 1 0.00 0.00 Algebra::UnitTest() | 0.00 19.10 0.00 1 0.00 0.00 Algebra::UnitTest() | ||
+ | |} | ||
+ | |||
+ | ---- | ||
From looking at the flat profile, 43.88% of time is in SolveLinearEquations. Most of the other time is used for calculating the shapes, while 1.02% is in the TraceRay function. | From looking at the flat profile, 43.88% of time is in SolveLinearEquations. Most of the other time is used for calculating the shapes, while 1.02% is in the TraceRay function. | ||
Line 282: | Line 290: | ||
======Call Graph====== | ======Call Graph====== | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Call Graph | ||
+ | |- | ||
+ | | Call graph (explanation follows) | ||
Call graph | Call graph | ||
Line 1,428: | Line 1,440: | ||
Most of the time (99.3%) is spent executing the SaveImage function (Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const). In the additional lodepng code that runs alongside the ray tracer, 94.4% of time is spent in the CalculateLighting function (Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const). | Most of the time (99.3%) is spent executing the SaveImage function (Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const). In the additional lodepng code that runs alongside the ray tracer, 94.4% of time is spent in the CalculateLighting function (Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const). | ||
+ | |} | ||
+ | |||
+ | ---- | ||
=== Assignment 2 === | === Assignment 2 === | ||
During assignment 2, we tried a simple kernel that computed a dot product. It achieved nothing special: as predicted at the end of assignment 1, continuously calling cudaMalloc and cudaMemcpy had severe consequences on run time. | During assignment 2, we tried a simple kernel that computed a dot product. It achieved nothing special: as predicted at the end of assignment 1, continuously calling cudaMalloc and cudaMemcpy had severe consequences on run time. | ||
====Initial implementation==== | ====Initial implementation==== | ||
− | |||
//version 1 dot product | //version 1 dot product | ||
__global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) { | __global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) { | ||
Line 1,442: | Line 1,456: | ||
for (int k = 0; k < nk; k++) | for (int k = 0; k < nk; k++) | ||
sum += d_a[i * nk + k] * d_b[k * nj + j]; | sum += d_a[i * nk + k] * d_b[k * nj + j]; | ||
− | + | d_p[i * nj + j] = sum; | |
+ | } | ||
+ | } | ||
+ | |||
+ | ====Naive==== | ||
+ | Naturally, this is a naive implementation, as we are calling cudaMalloc on every iteration of the training for loop. | ||
+ | cout << "Training the model ...\n"; | ||
+ | for (unsigned i = 0; i < 10000; ++i) { | ||
+ | |||
+ | This actually cost us an additional 20 minutes of run time before profiling could be completed. | ||
+ | |||
+ | ====The next steps==== | ||
+ | First, we had to research how the neural network actually learns: for example, why it uses the relu() function, how back-propagation works, and much more. | ||
+ | Some additional sites will be included. | ||
+ | |||
+ | =====After that and many coffees!===== | ||
+ | __global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1) { | ||
+ | int BATCH_SIZE = 256; | ||
+ | float lr = .01 / BATCH_SIZE; | ||
+ | kdot<<< 50,51>>>(ktranspose(d_a2, BATCH_SIZE, 64), d_dyhat, 64, BATCH_SIZE, 10, d_dW3); | ||
+ | kdot << <80,32>> >(d_dyhat, ktranspose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2); | ||
+ | kreluPrime(d_a2, 128 * 64); | ||
+ | for (int i = 0; i < BATCH_SIZE * 10; i++) { | ||
+ | d_dz2[i] = d_dz2[i] * d_a2[i]; | ||
+ | } | ||
+ | kdot << <1024, 32>> >(ktranspose(d_a1, BATCH_SIZE, 128), d_dz2, 128, BATCH_SIZE, 64, d_dW2); | ||
+ | kdot << <512,32>> >(d_dz2, ktranspose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1); | ||
+ | kreluPrime(d_a1, BATCH_SIZE * 784); | ||
+ | for (int i = 0; i < 256 * 64; i++) { | ||
+ | d_dz1[i] = d_dz1[i] * d_a1[i]; | ||
+ | } | ||
+ | kdot <<<512,512,32 >>>(ktranspose(d_b_X, BATCH_SIZE, 784), d_dz1, 784, BATCH_SIZE, 128, d_dW1); | ||
+ | // Updating the parameters | ||
+ | //W3 = W3 - lr * dW3; | ||
+ | for (int i = 0; i < (64*10); i++) { | ||
+ | d_W3[i] = d_W3[i] - lr * d_dW3[i]; | ||
+ | } | ||
+ | //W2 = W2 - lr * dW2; | ||
+ | for (int i = 0; i < (128*64); i++) { | ||
+ | d_W2[i] = d_W2[i] - lr * d_dW2[i]; | ||
+ | } | ||
+ | //W1 = W1 - lr * dW1; | ||
+ | for (int i = 0; i < (784*128); i++) { | ||
+ | d_W1[i] = d_W1[i] - lr * d_dW1[i]; | ||
+ | } | ||
+ | } | ||
+ | |||
+ | ===Dynamic Parallelism=== | ||
+ | Dynamic parallelism in CUDA allows a kernel to launch and synchronize with new nested kernels. For our use case, it also lets us keep the computation on the device and process information quickly, without constant cudaMemcpy() or cudaMalloc() calls from the host. | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Parent call Child kernel( ... ) | ||
+ | |- | ||
+ | | | ||
+ | <syntaxhighlight lang="cpp"> | ||
+ | __global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) { | ||
+ | int BATCH_SIZE = 256; | ||
+ | float lr = 0.01 / BATCH_SIZE; | ||
+ | //backpropagation | ||
+ | d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10); | ||
+ | kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2); | ||
+ | cudaDeviceSynchronize(); | ||
+ | } | ||
+ | |||
+ | __global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) { | ||
+ | int i = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | int j = blockIdx.y * blockDim.y + threadIdx.y; | ||
+ | //matrix multiplication | ||
+ | if (i < ni && j < nj) { | ||
+ | float sum = 0.0f; | ||
+ | for (int k = 0; k < nk; k++) | ||
+ | sum += d_a[i * nk + k] * d_b[k * nj + j]; | ||
+ | d_p[i * nj + j] = sum; | ||
+ | } | ||
+ | } | ||
+ | </syntaxhighlight> | ||
+ | |} | ||
+ | |||
+ | ===Final Iteration=== | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! GPU code | ||
+ | |- | ||
+ | | | ||
+ | <syntaxhighlight lang="cpp"> | ||
+ | __device__ float* k_difference(const float* m1, const float* m2, const int size) { | ||
+ | /* Returns the difference between the two vectors. */ | ||
+ | float* difference = new float[size]; | ||
+ | for (int i = 0; i < size; i++) { | ||
+ | difference[i] = m1[i] - m2[i]; | ||
} | } | ||
− | } | + | return difference; |
+ | } | ||
+ | __device__ float* k_MFV(const float f, const float* m, const int size) { | ||
+ | float* mult = new float[size]; | ||
+ | for (int i = 0; i < size; i++) { | ||
+ | mult[i] = f * m[i]; | ||
+ | } | ||
+ | return mult; | ||
+ | } | ||
+ | __device__ float* k_MM(float* m1, float* m2, const int m2_size) { | ||
+ | float* product = new float[m2_size]; | ||
+ | |||
+ | for (int i = 0; i != m2_size; ++i) { | ||
+ | product[i] = m1[i] * m2[i]; | ||
+ | }; | ||
+ | |||
+ | return product; | ||
+ | } | ||
+ | __device__ float* k_transpose(float *m, const int C, const int R) { | ||
+ | |||
+ | /* Returns a transpose matrix of input matrix. | ||
+ | Inputs: | ||
+ | m: vector, input matrix | ||
+ | C: int, number of columns in the input matrix | ||
+ | R: int, number of rows in the input matrix | ||
+ | Output: vector, transpose matrix mT of input matrix m | ||
+ | */ | ||
+ | |||
+ | float* mT = new float[C * R]; | ||
+ | for (unsigned n = 0; n != C * R; n++) { | ||
+ | unsigned i = n / C; | ||
+ | unsigned j = n % C; | ||
+ | mT[n] = m[R*j + i]; | ||
+ | } | ||
+ | |||
+ | return mT; | ||
+ | |||
+ | //for (int i = 0; i<R; ++i) | ||
+ | // for (int j = 0; j<C; ++j) | ||
+ | // { | ||
+ | // mT[j * C + i] = m[i * R + j]; | ||
+ | // } | ||
+ | |||
+ | //return mT; | ||
+ | } | ||
+ | __device__ void dkernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) { | ||
+ | for (int row = 0; row != ni; ++row) { | ||
+ | for (int col = 0; col != nk; ++col) { | ||
+ | d_p[row * nk + col] = 0.f; | ||
+ | for (int k = 0; k != nj; ++k) { | ||
+ | d_p[row * nk + col] += d_a[row * nj + k] * d_b[k * nk + col]; | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | //version 1 dot product | ||
+ | __global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) { | ||
+ | int i = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | int j = blockIdx.y * blockDim.y + threadIdx.y; | ||
+ | //matrix multiplication | ||
+ | if (i < ni && j < nj) { | ||
+ | float sum = 0.0f; | ||
+ | for (int k = 0; k < nk; k++) | ||
+ | sum += d_a[i * nk + k] * d_b[k * nj + j]; | ||
+ | d_p[i * nj + j] = sum; | ||
+ | } | ||
+ | } | ||
+ | void cudaCheck(cudaError_t Error) { | ||
+ | if (Error != cudaSuccess) { | ||
+ | cerr << cudaGetErrorName(Error) << "!"; | ||
+ | exit(EXIT_FAILURE); | ||
+ | } | ||
+ | } | ||
+ | |||
+ | |||
+ | |||
+ | __device__ float* k_relu(float* a, int n) { | ||
+ | for (int i = 0; i < n; ++i) { | ||
+ | if (a[i] < 0) { | ||
+ | a[i] = 0.01f; | ||
+ | } | ||
+ | else a[i] = a[i]; | ||
+ | } | ||
+ | return a; | ||
+ | } | ||
+ | __device__ float* k_reluPrime(float* a, int n) { | ||
+ | for (int i = 0; i < n; ++i) { | ||
+ | if (a[i] > 0) { | ||
+ | a[i] = 1.0f; | ||
+ | } | ||
+ | else a[i] = 0.0; | ||
+ | } | ||
+ | return a; | ||
+ | } | ||
+ | ///activation functions __global__ | ||
+ | __global__ void kernel_relu(float* a, int n) { | ||
+ | int i = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | if(i < n) { | ||
+ | if (a[i] < 0) { | ||
+ | a[i] = 0.01f; | ||
+ | } | ||
+ | else a[i] = a[i]; | ||
+ | } | ||
+ | } | ||
+ | __global__ void kernel_reluPrime(float* a, int n) { | ||
+ | int i = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | if (i < n) { | ||
+ | if (a[i] > 0) { | ||
+ | a[i] = 1.0f; | ||
+ | } | ||
+ | else a[i] = 0.0; | ||
+ | } | ||
+ | } | ||
+ | |||
+ | |||
+ | |||
+ | __device__ void ksoftmax(float *input, int input_len) { | ||
+ | //assert(input != NULL); | ||
+ | //assert(input_len != 0); | ||
+ | int i; | ||
+ | float m; | ||
+ | /* Find maximum value from input array */ | ||
+ | m = input[0]; | ||
+ | for (i = 1; i < input_len; i++) { | ||
+ | if (input[i] > m) { | ||
+ | m = input[i]; | ||
+ | } | ||
+ | } | ||
+ | |||
+ | float sum = 0; | ||
+ | for (i = 0; i < input_len; i++) { | ||
+ | sum += expf(input[i] - m); | ||
+ | } | ||
+ | |||
+ | for (i = 0; i < input_len; i++) { | ||
+ | input[i] = expf(input[i] - m - log(sum)); | ||
+ | |||
+ | } | ||
+ | } | ||
+ | |||
+ | __device__ void k_sigmoid(float* m1, int size) { | ||
+ | |||
+ | /* Returns the value of the sigmoid function f(x) = 1/(1 + e^-x). | ||
+ | Input: m1, a vector. | ||
+ | Output: 1/(1 + e^-x) for every element of the input matrix m1. | ||
+ | */ | ||
+ | for (unsigned i = 0; i != size; ++i) { | ||
+ | m1[i] = 1 / (1 + exp(-m1[i])); | ||
+ | } | ||
+ | } | ||
+ | __global__ void feed_forward(float* d_b_X, float* d_W1, float* d_W2, float* d_W3, float* d_b_Y, float* d_a1, float* d_a2, float* d_yhat, float* d_dyhat) { | ||
+ | int BATCH_SIZE = 256; | ||
+ | float lr = 0.01 / BATCH_SIZE; | ||
+ | float* tempY = new float[256 * 64]; | ||
+ | //feed forward | ||
+ | kernel_dot <<<256, 256>>> (d_b_X, d_W1, BATCH_SIZE, 784, 128, d_a1); | ||
+ | cudaDeviceSynchronize(); | ||
+ | k_relu(d_a1, BATCH_SIZE * 784); | ||
+ | kernel_dot <<<256, 128>>> (d_a1, d_W2, BATCH_SIZE, 128, 64, d_a2); | ||
+ | cudaDeviceSynchronize(); | ||
+ | k_relu(d_a2, BATCH_SIZE * 128); | ||
+ | kernel_dot <<<256, 64>>> (d_a2, d_W3, BATCH_SIZE, 64, 10, d_yhat); | ||
+ | cudaDeviceSynchronize(); | ||
+ | ksoftmax(tempY, 10 * 10); | ||
+ | for (int i = 0; i < 100; i++) { | ||
+ | d_yhat[i] = tempY[i]; | ||
+ | } | ||
+ | delete[] tempY; | ||
+ | } | ||
+ | |||
+ | |||
+ | __global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) { | ||
+ | cudaError_t Error; | ||
+ | int BATCH_SIZE = 256; | ||
+ | float lr = 0.01 / BATCH_SIZE; | ||
+ | //backpropagation | ||
+ | d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10); | ||
+ | kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2); | ||
+ | cudaDeviceSynchronize(); | ||
+ | // Transpose d_a2 (BATCH_SIZE x 64, row-major) into mT (64 x BATCH_SIZE). | ||
+ | float* mT = new float[256 * 64]; | ||
+ | for (int i = 0; i < 256; ++i) | ||
+ | for (int j = 0; j < 64; ++j) | ||
+ | { | ||
+ | mT[j * 256 + i] = d_a2[i * 64 + j]; | ||
+ | } | ||
+ | kernel_dot <<<(16384 + 256)/64, 64>>> (mT, d_dyhat, 64, BATCH_SIZE, 10, d_dW3); | ||
+ | cudaDeviceSynchronize(); | ||
+ | k_reluPrime(d_a2, 256 * 64); | ||
+ | for (int i = 0; i < BATCH_SIZE * 10; i++) { | ||
+ | d_dz2[i] = d_dz2[i] * d_a2[i]; | ||
+ | } | ||
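+ | delete[] mT; // release the previous transpose; the kernel that read it has completed | ||
+ | // Transpose d_a1 (BATCH_SIZE x 128, row-major) into mT (128 x BATCH_SIZE). | ||
+ | mT = new float[256 * 128]; | ||
+ | for (int i = 0; i < 256; ++i) | ||
+ | for (int j = 0; j < 128; ++j) | ||
+ | { | ||
+ | mT[j * 256 + i] = d_a1[i * 128 + j]; | ||
+ | } | ||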
+ | kernel_dot <<<64, 512>>> (mT, d_dz2, 128, BATCH_SIZE, 64, d_dW2); | ||
+ | cudaDeviceSynchronize(); | ||
+ | kernel_dot <<<80, 32>>> (d_dz2, k_transpose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1); | ||
+ | cudaDeviceSynchronize(); | ||
+ | k_reluPrime(d_a1, BATCH_SIZE * 784); | ||
+ | for (int i = 0; i < 256 * 64; i++) { | ||
+ | d_dz1[i] = d_dz1[i] * d_a1[i]; | ||
+ | } | ||
+ | kernel_dot <<<784, 256>>> (d_t, d_dz1, 784, BATCH_SIZE, 128, d_dW1); | ||
+ | cudaDeviceSynchronize(); | ||
+ | //// Updating the parameters | ||
+ | ////W3 = W3 - lr * dW3; | ||
+ | d_W3 = k_difference(d_W3, k_MFV(lr, d_dW3, 64 * 10), 64 * 10); | ||
+ | //W2 = W2 - lr * dW2; | ||
+ | d_W2 = k_difference(d_W2, k_MFV(lr, d_dW2, 128 * 64), 128 * 64); | ||
+ | ////W1 = W1 - lr * dW1; | ||
+ | d_W1 = k_difference(d_W1, k_MFV(lr, d_dW1, 784 * 128), 784 * 128); | ||
+ | for (int i = 0; i < (784 * 128); i++) { | ||
+ | d_W1[i] = d_W1[i] - lr * d_dW1[i]; | ||
+ | } | ||
+ | //for (int i = 0; i != 10; ++i) { | ||
+ | // for (int j = 0; j != 10; ++j) { | ||
+ | // printf("%f ", d_W3[i * 10 + j]); | ||
+ | // } | ||
+ | // printf("\n"); | ||
+ | //} | ||
+ | //printf("\n"); | ||
+ | //for (int i = 0; i != 10; ++i) { | ||
+ | // for (int j = 0; j != 10; ++j) { | ||
+ | // printf("%f ", d_yhat[i * 10 + j]); | ||
+ | // } | ||
+ | // printf("\n"); | ||
+ | //} | ||
+ | //printf("\n"); | ||
+ | float* dif; | ||
+ | dif = k_difference(d_b_Y, d_yhat, BATCH_SIZE * 10); // must cover every element read by the loss loop | ||
+ | float loss = 0.0; | ||
+ | for (unsigned k = 0; k < BATCH_SIZE * 10; ++k) { | ||
+ | loss += dif[k] * dif[k]; | ||
+ | } | ||
+ | printf("%f \n", loss / BATCH_SIZE); | ||
+ | |||
+ | Error = cudaGetLastError(); | ||
+ | if (Error != cudaSuccess) { | ||
+ | printf("\n %s \n", cudaGetErrorString(Error)); | ||
+ | } | ||
+ | }; | ||
+ | </syntaxhighlight> | ||
+ | |} | ||
+ | ===Final Profile=== | ||
+ | This final profile covers only 20 iterations, because errors occurred beyond that point, likely due to our naive implementation and poor coding practices. | ||
+ | [[File:nnfinalprofile.jpg]] | ||
+ | |||
+ | ===Compiling=== | ||
+ | Follow these articles to set up Visual Studio for dynamic parallelism; they are also our recommended readings: | ||
+ | |||
+ | http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf | ||
+ | |||
+ | http://ramblingsofagamedevstudent.blogspot.com/2014/03/set-up-visual-studio-2012-for-cuda.html | ||
=== Assignment 3 === | === Assignment 3 === | ||
+ | ====What we would do differently:==== | ||
+ | There are many things; one of the major ones is to take on a more manageable task, one with proper documentation and clear reasoning behind the chosen values. |
Latest revision as of 23:50, 7 April 2019
Back Propagation Acceleration
Team Members
- Sebastian Djurovic, Team Lead and Developer
- Henry Leung, Developer and Quality Control
- ...
Progress
Assignment 1
Our group decided to profile a couple of different solutions, the first being a simple neural network and ray tracing solution, in order to determine the best project to generate a solution for.
Neural Network
Sebastian's findings
I found a simple neural network that takes the MNIST data set and performs training on batches of the data. For a quick illustration, MNIST is a data set of handwritten digits stored as 28 x 28 pixel grayscale images, together with the corresponding numerical labels between 0 and 9. The purpose of this data set is to train networks to recognize handwritten digits when they encounter them.
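To make the data layout concrete, here is a minimal C++ sketch of how one MNIST sample could be prepared for a network with 784 inputs and 10 outputs. The helper names `flatten_image` and `one_hot`, the constants, and the scaling of pixel values by 255 are assumptions for illustration; they are not taken from the profiled program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed constants: 28 x 28 grayscale images and 10 digit classes.
constexpr int kImageSide = 28;
constexpr int kPixels = kImageSide * kImageSide;  // 784 inputs per image
constexpr int kClasses = 10;

// Hypothetical helper: turn one image (row-major bytes) into floats in [0, 1].
// The division by 255 is an assumption, not something the source states.
std::vector<float> flatten_image(const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() == static_cast<std::size_t>(kPixels));
    std::vector<float> out(kPixels);
    for (int i = 0; i < kPixels; ++i) {
        out[i] = pixels[i] / 255.0f;
    }
    return out;
}

// Hypothetical helper: one-hot encode a digit label (0..9) into 10 outputs.
std::array<float, kClasses> one_hot(int label) {
    assert(label >= 0 && label < kClasses);
    std::array<float, kClasses> y{};  // value-initialized to all zeros
    y[label] = 1.0f;
    return y;
}
```

A batch of 256 such samples gives an input matrix of 256 x 784 floats and a label matrix of 256 x 10 floats, which matches the buffer sizes used later on this page.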
Initial Profile
 Flat profile:
 
 Each sample counts as 0.01 seconds.
   %   cumulative    self                self    total
  time   seconds    seconds      calls  ns/call  ns/call  name
  97.94    982.46    982.46                              dot(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&, int, int, int)
   1.45    997.05     14.58                              transpose(float*, int, int)
   0.15    998.56      1.51                              operator-(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
   0.15   1000.06      1.50                              relu(std::vector<float, std::allocator<float> > const&)
   0.15   1001.55      1.49                              operator*(float, std::vector<float, std::allocator<float> > const&)
   0.07   1002.27      0.72  519195026     1.39     1.39  void std::vector<float, std::allocator<float> >::emplace_back<float>(float&&)
   0.06   1002.91      0.63                              operator*(std::vector<float, std::allocator<float> > const&, std::vector<float, std::allocator<float> > const&)
   0.05   1003.37      0.46                              reluPrime(std::vector<float, std::allocator<float> > const&)
   0.02   1003.62      0.25                              softmax(std::vector<float, std::allocator<float> > const&, int)
   0.01   1003.75      0.13                              operator/(std::vector<float, std::allocator<float> > const&, float)
   0.01   1003.87      0.12     442679   271.35   271.35  void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float>(float&&)
   0.01   1003.96      0.09   13107321     6.87     6.87  void std::vector<float, std::allocator<float> >::_M_emplace_back_aux<float const&>(float const&)
   0.01   1004.02      0.06                              split(std::string const&, char)
   0.01   1004.08      0.06     462000   130.00   130.00  void std::vector<std::string, std::allocator<std::string> >::_M_emplace_back_aux<std::string const&>(std::string const&)
   0.00   1004.11      0.03                              std::vector<std::string, std::allocator<std::string> >::~vector()
   0.00   1004.12      0.01                              random_vector(int)
   0.00   1004.12      0.00          3     0.00     0.00  std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&)
   0.00   1004.12      0.00          1     0.00     0.00  _GLOBAL__sub_I__Z5printRKSt6vectorIfSaIfEEii
After the initial profile it is obvious that the dot product function consumes 97.94% of our run time. The transpose function consumes a further 1.45%, which seems measly; however, transpose is also called during back propagation, together with two rectifier (activation) functions, relu and reluPrime. reluPrime is the derivative of relu, a binary step function.
Relu = f(x) = {x for x > 0, 0 otherwise}
ReluPrime = f'(x) = {1 for x > 0, 0 otherwise}
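The two definitions above can be sketched in host-side C++ as follows. The function names `relu_ref`, `relu_prime_ref`, and `hadamard` are hypothetical and chosen so they do not clash with the profiled program's own `relu` and `reluPrime`; the element-wise product is included because back propagation multiplies a gradient by reluPrime of the activations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ReLU: keep positive values, clamp everything else to zero.
std::vector<float> relu_ref(const std::vector<float>& z) {
    std::vector<float> out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = z[i] > 0.0f ? z[i] : 0.0f;
    }
    return out;
}

// Derivative of ReLU: 1 where the input is positive, 0 elsewhere.
std::vector<float> relu_prime_ref(const std::vector<float>& z) {
    std::vector<float> out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = z[i] > 0.0f ? 1.0f : 0.0f;
    }
    return out;
}

// Element-wise product, as used when a gradient is masked by relu_prime_ref.
std::vector<float> hadamard(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}
```

For the input {-1, 0, 2}, `relu_ref` yields {0, 0, 2} and `relu_prime_ref` yields {0, 0, 1}, so a gradient multiplied by the latter only flows through units that were active.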
Code Snippets
 // Back propagation
 vector<float> dyhat = (yhat - b_y);
 // dW3 = a2.T * dyhat
 vector<float> dW3 = dot(transpose( &a2[0], BATCH_SIZE, 64 ), dyhat, 64, BATCH_SIZE, 10);
 // dz2 = dyhat * W3.T * relu'(a2)
 vector<float> dz2 = dot(dyhat, transpose( &W3[0], 64, 10 ), BATCH_SIZE, 10, 64) * reluPrime(a2);
 // dW2 = a1.T * dz2
 vector<float> dW2 = dot(transpose( &a1[0], BATCH_SIZE, 128 ), dz2, 128, BATCH_SIZE, 64);
 // dz1 = dz2 * W2.T * relu'(a1)
 vector<float> dz1 = dot(dz2, transpose( &W2[0], 128, 64 ), BATCH_SIZE, 64, 128) * reluPrime(a1);
 // dW1 = X.T * dz1
 vector<float> dW1 = dot(transpose( &b_X[0], BATCH_SIZE, 784 ), dz1, 784, BATCH_SIZE, 128);
 vector <float> dot (const vector <float>& m1, const vector <float>& m2, const int m1_rows, const int m1_columns, const int m2_columns) {
     vector <float> output (m1_rows*m2_columns);
     for( int row = 0; row != m1_rows; ++row ) {
         for( int col = 0; col != m2_columns; ++col ) {
             output[ row * m2_columns + col ] = 0.f;
             for( int k = 0; k != m1_columns; ++k ) {
                 output[ row * m2_columns + col ] += m1[ row * m1_columns + k ] * m2[ k * m2_columns + col ];
             }
         }
     }
     return output;
 }
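The dot function indexes both matrices in row-major order, so any transpose fed into it has to follow the same convention. The sketch below shows that index mapping in isolation; `transpose_rm` is a hypothetical name, and this is an illustration of the convention rather than the transpose used by the profiled program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a rows x cols row-major matrix into a cols x rows row-major matrix.
// Element (i, j) of the input lives at m[i * cols + j]; in the output it becomes
// element (j, i), stored at t[j * rows + i].
std::vector<float> transpose_rm(const std::vector<float>& m, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(m.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    std::vector<float> t(m.size());
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            t[j * rows + i] = m[i * cols + j];
        }
    }
    return t;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the result is the 3 x 2 matrix {1, 4, 2, 5, 3, 6}. Note that the stride on each side is the row length of that matrix, which is easy to get wrong when the loop is written by hand.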
Amdahl's law
Applying Amdahl's law to dot() gives a theoretical maximum speedup of 48.54x; however, because of expected overheads (see the possible complications below), our actual prediction is no more than 10x.
Theoretical:
s = 1/(1 - 97.94%) = 1/(1 - 0.9794) = 48.54
Prediction:
P = 102s
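The arithmetic behind these figures can be checked with a small C++ sketch. The function names are hypothetical; the only inputs taken from the profile are the 97.94% share of dot() and the 1004.12 s total run time.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the run time
// is accelerated by a factor n and the rest is unchanged.
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n > 0.0);
    return 1.0 / ((1.0 - p) + p / n);
}

// Upper bound on the speedup as n grows without limit.
double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

// Factor by which the accelerated part must speed up
// to reach a given overall target speedup.
double required_factor(double p, double target) {
    assert(target > 1.0 && target < amdahl_limit(p));
    return p / (1.0 / target - (1.0 - p));
}
```

With p = 0.9794, `amdahl_limit` gives about 48.54, matching the theoretical figure. `required_factor(0.9794, 10.0)` is about 12.3, meaning dot() itself would have to run roughly 12x faster, transfers included, for the whole program to reach 10x. A 10x overall speedup on 1004.12 s corresponds to about 100 s, in the same ballpark as the 102 s prediction.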
Possible complications
The main concern when parallelizing these code snippets is that memory copying is going to take up a lot of time, so despite the predicted speedup, there is no certain answer until the CUDA kernel is complete.
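To get a feel for the amount of data involved, the following C++ sketch adds up the buffers that a naive scheme would copy to the device on every training iteration. It assumes the worst case in which the batch and all three weight matrices are re-copied each time; the layer sizes (784, 128, 64, 10) and the batch size of 256 come from the code on this page, while the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Network dimensions taken from the back-propagation code on this page.
constexpr std::size_t kBatch = 256, kIn = 784, kH1 = 128, kH2 = 64, kOut = 10;

// Number of floats copied host-to-device per iteration in the assumed worst case.
constexpr std::size_t floats_per_iteration() {
    return kBatch * kIn    // b_X: input batch
         + kBatch * kOut   // b_Y: labels
         + kIn * kH1       // W1
         + kH1 * kH2       // W2
         + kH2 * kOut;     // W3
}

constexpr std::size_t bytes_per_iteration() {
    return floats_per_iteration() * sizeof(float);
}

// Total bytes over a whole run if nothing stays resident on the device.
std::size_t bytes_for_run(std::size_t iterations) {
    assert(iterations > 0);
    return iterations * bytes_per_iteration();
}
```

Under these assumptions each iteration moves 312,448 floats, about 1.25 MB with 4-byte floats, so a 10,000-iteration run moves roughly 12.5 GB before any results are copied back. Keeping the weights resident on the device and copying only the batch would remove a large share of that traffic.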
Hypothesis
Our hypothesis for this solution is an acceleration of roughly 10x once dot() is parallelized. This means that our code should take somewhere in the ballpark of 102 seconds to train the network.
Ray Tracing
Henry's findings
I decided to choose a ray tracing program that draws graphics such as a block, cuboid and cylinder. The shapes are rendered with shadows. The program is from http://cosinekitty.com/raytrace.
Initial Profile
Initial Profile (Warning: long) |
---|
Initial Profile
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 43.88 8.38 8.38 406030768 0.00 0.00 Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) 13.98 11.05 2.67 14003920 0.00 0.00 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 8.12 12.60 1.55 34580399 0.00 0.00 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 7.72 14.08 1.48 66701722 0.00 0.00 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 3.25 14.70 0.62 50859850 0.00 0.00 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 3.12 15.29 0.60 8534998 0.00 0.00 Imager::Scene::CalculateMatte(Imager::Intersection const&) const 2.64 15.80 0.51 594 0.00 0.00 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) 1.88 16.16 0.36 118023768 0.00 0.00 Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const 1.73 16.49 0.33 15 0.02 1.26 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const 1.31 16.74 0.25 3262804 0.00 0.00 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const 1.26 16.98 0.24 5171493 0.00 0.00 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) 1.07 17.18 0.21 18609329 0.00 
0.00 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const 1.02 17.38 0.20 18609329 0.00 0.00 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const 0.92 17.55 0.18 83146683 0.00 0.00 Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) 0.89 17.72 0.17 8573986 0.00 0.00 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const 0.86 17.89 0.17 22928551 0.00 0.00 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const 0.63 18.01 0.12 1966663 0.00 0.00 Imager::Torus::SurfaceNormal(Imager::Vector const&) const 0.58 18.12 0.11 7514037 0.00 0.00 Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 0.42 18.20 0.08 5171490 0.00 0.00 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const 0.42 18.28 0.08 3115245 0.00 0.00 Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const 0.34 18.34 0.07 Imager::Scene::PolarizedReflection(double, double, double, double) const 0.31 18.40 0.06 11368907 0.00 0.00 Imager::Sphere::Contains(Imager::Vector const&) const 0.31 18.46 0.06 6856730 0.00 0.00 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) 0.26 18.51 0.05 9484218 0.00 0.00 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const 0.26 18.56 0.05 2906525 0.00 0.00 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const 0.21 18.60 0.04 12028218 0.00 0.00 
Algebra::FilterRealNumbers(int, std::complex<double> const*, double*)
0.21 18.64 0.04 5171490 0.00 0.00 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.21 18.68 0.04 3522280 0.00 0.00 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const
0.16 18.71 0.03 5171292 0.00 0.00 Algebra::cbrt(std::complex<double>, int)
0.16 18.74 0.03 5088067 0.00 0.00 string_set(char**, char const*)
0.16 18.77 0.03 957358 0.00 0.00 Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.16 18.80 0.03 17010 0.00 0.00 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long)
0.16 18.83 0.03 frame_dummy
0.13 18.86 0.03 3617416 0.00 0.00 Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const
0.10 18.88 0.02 13944693 0.00 0.00 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.10 18.90 0.02 11893060 0.00 0.00 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.10 18.92 0.02 3538132 0.00 0.00 Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const
0.10 18.94 0.02 1425590 0.00 0.00 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const
0.10 18.96 0.02 1 0.02 0.02 Imager::SolidObject_Reorientable::RotateZ(double)
0.08 18.97 0.02 170828 0.00 0.00 Imager::SolidObject::Contains(Imager::Vector const&) const
0.05 18.98 0.01 5946530 0.00 0.00 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.05 18.99 0.01 5268088 0.00 0.00 Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.05 19.00 0.01 3682000 0.00 0.00 getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*)
0.05 19.01 0.01 3170898 0.00 0.00 Imager::SetComplement::Contains(Imager::Vector const&) const
0.05 19.02 0.01 2953369 0.00 0.00 addBitToStream(unsigned long*, ucvector*, unsigned char)
0.05 19.03 0.01 2096776 0.00 0.00 Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
0.05 19.04 0.01 1425504 0.00 0.00 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const
0.05 19.05 0.01 627238 0.00 0.00 color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char)
0.05 19.06 0.01 77932 0.00 0.00 Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const
0.05 19.07 0.01 3016 0.00 0.00 sort_coins(Coin*, unsigned long)
0.05 19.08 0.01 73 0.00 0.00 Imager::SolidObject::Translate(double, double, double)
0.05 19.09 0.01 15 0.00 0.00 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int)
0.03 19.10 0.01 Imager::SolidObject_BinaryOperator::RotateZ(double)
0.00 19.10 0.00 2431840 0.00 0.00 Imager::SetUnion::Contains(Imager::Vector const&) const
0.00 19.10 0.00 1447152 0.00 0.00 uivector_push_back(uivector*, unsigned int)
0.00 19.10 0.00 1425505 0.00 0.00 Imager::Optics::SetMatteColor(Imager::Color const&)
0.00 19.10 0.00 1395302 0.00 0.00 Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const
0.00 19.10 0.00 1137648 0.00 0.00 Imager::ChessBoard::SquareCoordinate(double) const
0.00 19.10 0.00 738585 0.00 0.00 ucvector_push_back(ucvector*, unsigned char)
0.00 19.10 0.00 478679 0.00 0.00 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const
0.00 19.10 0.00 406979 0.00 0.00 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*)
0.00 19.10 0.00 255938 0.00 0.00 searchCodeIndex(unsigned int const*, unsigned long, unsigned long)
0.00 19.10 0.00 3245 0.00 0.00 cleanup_coins(Coin*, unsigned long)
0.00 19.10 0.00 2787 0.00 0.00 append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long)
0.00 19.10 0.00 729 0.00 0.00 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64]
0.00 19.10 0.00 607 0.00 0.00 lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char)
0.00 19.10 0.00 243 0.00 0.00 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int)
0.00 19.10 0.00 243 0.00 0.00 HuffmanTree_cleanup(HuffmanTree*)
0.00 19.10 0.00 243 0.00 0.00 HuffmanTree_makeFromLengths2(HuffmanTree*)
0.00 19.10 0.00 243 0.00 0.00 HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int)
0.00 19.10 0.00 120 0.00 0.00 Imager::Dodecahedron::CheckEdge(int, int, double) const
0.00 19.10 0.00 92 0.00 0.00 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&)
0.00 19.10 0.00 88 0.00 0.00 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&)
0.00 19.10 0.00 48 0.00 0.00 lodepng_chunk_generate_crc(unsigned char*)
0.00 19.10 0.00 48 0.00 0.00 Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62]
0.00 19.10 0.00 48 0.00 0.00 addUnknownChunks(ucvector*, unsigned char*, unsigned long)
0.00 19.10 0.00 45 0.00 0.00 lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*)
0.00 19.10 0.00 45 0.00 0.00 lodepng_info_cleanup(LodePNGInfo*)
0.00 19.10 0.00 45 0.00 0.00 LodePNGText_cleanup(LodePNGInfo*)
0.00 19.10 0.00 45 0.00 0.00 lodepng_add32bitInt(ucvector*, unsigned int)
0.00 19.10 0.00 45 0.00 0.00 LodePNGIText_cleanup(LodePNGInfo*)
0.00 19.10 0.00 30 0.00 0.00 lodepng_info_init(LodePNGInfo*)
0.00 19.10 0.00 30 0.00 0.00 checkColorValidity(LodePNGColorType, unsigned int)
0.00 19.10 0.00 29 0.00 0.00 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&)
0.00 19.10 0.00 24 0.00 0.00 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double)
0.00 19.10 0.00 22 0.00 0.00 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>)
0.00 19.10 0.00 21 0.00 0.00 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&)
0.00 19.10 0.00 20 0.00 0.00 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double)
0.00 19.10 0.00 20 0.00 0.00 std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&)
0.00 19.10 0.00 20 0.00 0.00 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&)
0.00 19.10 0.00 18 0.00 0.00 Imager::SolidObject_Reorientable::RotateX(double)
0.00 19.10 0.00 18 0.00 0.00 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double)
0.00 19.10 0.00 17 0.00 0.00 std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&)
0.00 19.10 0.00 15 0.00 0.04 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*)
0.00 19.10 0.00 15 0.00 0.02 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
0.00 19.10 0.00 15 0.00 0.00 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*)
0.00 19.10 0.00 15 0.00 0.00 lodepng_state_init(LodePNGState*)
0.00 19.10 0.00 15 0.00 0.04 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
0.00 19.10 0.00 15 0.00 0.00 lodepng_state_cleanup(LodePNGState*)
0.00 19.10 0.00 15 0.00 0.03 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
0.00 19.10 0.00 15 0.00 0.00 lodepng_can_have_alpha(LodePNGColorMode const*)
0.00 19.10 0.00 15 0.00 0.00 lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*)
0.00 19.10 0.00 15 0.00 0.00 zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*)
0.00 19.10 0.00 15 0.00 0.00 update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61]
0.00 19.10 0.00 15 0.00 0.00 preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*)
0.00 19.10 0.00 15 0.00 0.00 Imager::SolidObject_Reorientable::RotateY(double)
0.00 19.10 0.00 15 0.00 0.00 Imager::Scene::ClearSolidObjectList()
0.00 19.10 0.00 15 0.00 0.00 Imager::Scene::~Scene()
0.00 19.10 0.00 15 0.00 0.04 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
0.00 19.10 0.00 15 0.00 0.00 lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int)
0.00 19.10 0.00 15 0.00 0.04 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int)
0.00 19.10 0.00 15 0.00 0.00 lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&)
0.00 19.10 0.00 15 0.00 0.00 void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag)
0.00 19.10 0.00 14 0.00 0.00 Imager::SolidObject_BinaryOperator::Translate(double, double, double)
0.00 19.10 0.00 13 0.00 0.00 Imager::Sphere::~Sphere()
0.00 19.10 0.00 10 0.00 0.00 Imager::SolidObject_BinaryOperator::RotateY(double)
0.00 19.10 0.00 9 0.00 0.00 Imager::SolidObject_BinaryOperator::RotateX(double)
0.00 19.10 0.00 8 0.00 0.00 Imager::SetComplement::Translate(double, double, double)
0.00 19.10 0.00 7 0.00 0.00 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*)
0.00 19.10 0.00 6 0.00 0.00 Imager::Optics::SetOpacity(double)
0.00 19.10 0.00 5 0.00 0.00 Imager::Torus::~Torus()
0.00 19.10 0.00 5 0.00 0.00 Imager::Sphere::RotateY(double)
0.00 19.10 0.00 4 0.00 0.00 Imager::Cuboid::~Cuboid()
0.00 19.10 0.00 4 0.00 0.00 Imager::SetUnion::~SetUnion()
0.00 19.10 0.00 3 0.00 0.00 Imager::TriangleMesh::RotateX(double)
0.00 19.10 0.00 3 0.00 0.00 Imager::TriangleMesh::RotateY(double)
0.00 19.10 0.00 3 0.00 0.00 Imager::SetComplement::RotateY(double)
0.00 19.10 0.00 3 0.00 0.00 Imager::SetComplement::~SetComplement()
0.00 19.10 0.00 3 0.00 0.00 Imager::Sphere::RotateX(double)
0.00 19.10 0.00 3 0.00 0.00 Imager::ThinRing::~ThinRing()
0.00 19.10 0.00 3 0.00 0.00 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>)
0.00 19.10 0.00 2 0.00 1.27 TorusTest(char const*, double)
0.00 19.10 0.00 2 0.00 0.00 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&)
0.00 19.10 0.00 2 0.00 0.00 Imager::Dodecahedron::~Dodecahedron()
0.00 19.10 0.00 2 0.00 0.00 Imager::SetComplement::RotateX(double)
0.00 19.10 0.00 2 0.00 0.00 Imager::SetDifference::~SetDifference()
0.00 19.10 0.00 2 0.00 0.00 Imager::SetIntersection::~SetIntersection()
0.00 19.10 0.00 2 0.00 0.00 Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*)
0.00 19.10 0.00 2 0.00 0.00 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>)
0.00 19.10 0.00 2 0.00 0.00 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>)
0.00 19.10 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z9BlockTestv
0.00 19.10 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv
0.00 19.10 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN6Imager6IndentERSoi
0.00 19.10 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_
0.00 19.10 0.00 1 0.00 1.26 CuboidTest()
0.00 19.10 0.00 1 0.00 1.27 SaturnTest()
0.00 19.10 0.00 1 0.00 1.27 BitDonutTest()
0.00 19.10 0.00 1 0.00 1.26 CylinderTest()
0.00 19.10 0.00 1 0.00 1.26 SpheroidTest()
0.00 19.10 0.00 1 0.00 1.26 PolyhedraTest()
0.00 19.10 0.00 1 0.00 1.28 ChessBoardTest()
0.00 19.10 0.00 1 0.00 1.27 SetDifferenceTest()
0.00 19.10 0.00 1 0.00 1.26 MultipleSphereTest()
0.00 19.10 0.00 1 0.00 1.27 SetIntersectionTest()
0.00 19.10 0.00 1 0.00 1.26 DodecahedronOverlapTest()
0.00 19.10 0.00 1 0.00 1.27 BlockTest()
0.00 19.10 0.00 1 0.00 0.00 Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&)
0.00 19.10 0.00 1 0.00 0.00 Imager::ChessBoard::~ChessBoard()
0.00 19.10 0.00 1 0.00 0.00 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&)
0.00 19.10 0.00 1 0.00 0.00 Imager::Icosahedron::~Icosahedron()
0.00 19.10 0.00 1 0.00 0.00 Imager::ConcreteBlock::~ConcreteBlock()
0.00 19.10 0.00 1 0.00 0.00 Imager::Optics::SetGlossColor(Imager::Color const&)
0.00 19.10 0.00 1 0.00 0.00 Imager::Planet::~Planet()
0.00 19.10 0.00 1 0.00 0.00 Imager::Saturn::CreateRingSystem()
0.00 19.10 0.00 1 0.00 0.00 Imager::Saturn::~Saturn()
0.00 19.10 0.00 1 0.00 0.00 Imager::Cylinder::~Cylinder()
0.00 19.10 0.00 1 0.00 0.00 Imager::Spheroid::~Spheroid()
0.00 19.10 0.00 1 0.00 0.00 Algebra::UnitTest() |
The flat profile shows that 43.88% of the run time is spent in SolveLinearEquations. Most of the remaining time goes to computing ray intersections with the individual shapes, while the TraceRay function itself accounts for only 1.02%.
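To put the 43.88% figure in context, Amdahl's law gives a rough ceiling on what accelerating SolveLinearEquations alone could achieve. The sketch below is only an estimate under stated assumptions: the function names `amdahl_speedup` and `amdahl_limit` are hypothetical helpers, not part of the ray tracer, and the model ignores costs such as host-to-device memory transfers.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the run time is
// accelerated by a factor s and the remaining (1 - p) is left unchanged.
// (Hypothetical helper for illustration; not part of the profiled code.)
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound on the overall speedup as s grows without limit.
// Only defined for p < 1, since p == 1 would divide by zero.
double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With p = 0.4388 taken from the flat profile, `amdahl_limit(0.4388)` is about 1.78, so even a perfect speedup of SolveLinearEquations alone would less than halve the 19.10-second run. A larger gain would probably require parallelizing at a higher level, for example per ray or per pixel.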
Call Graph
Call Graph |
---|
Call graph (explanation follows)
index % time self children called name 0.02 1.24 1/15 SphereTest() [26] 0.02 1.24 1/15 CuboidTest() [22] 0.02 1.24 1/15 SetDifferenceTest() [21] 0.02 1.24 1/15 CylinderTest() [23] 0.02 1.24 1/15 SpheroidTest() [24] 0.02 1.24 1/15 SetIntersectionTest() [19] 0.02 1.24 1/15 MultipleSphereTest() [25] 0.02 1.24 1/15 PolyhedraTest() [27] 0.02 1.24 1/15 BitDonutTest() [20] 0.02 1.24 1/15 SaturnTest() [18] 0.02 1.24 1/15 DodecahedronOverlapTest() [28] 0.02 1.24 1/15 BlockTest() [17] 0.02 1.24 1/15 ChessBoardTest() [16] 0.04 2.48 2/15 TorusTest(char const*, double) [12] [1] 99.3 0.33 18.64 15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.67 17.36 12440000/12440000 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.00 0.62 15/15 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33] 0.00 0.00 15/15 lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int) [147] [2] 94.4 0.67 17.36 12440000+20912644 <cycle 4 as a whole> [2] 0.17 9.87 8573986 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] 0.20 7.37 18609329 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.25 0.12 3262804 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] 0.05 0.00 2906525 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58] <spontaneous> [3] 92.9 0.00 17.73 UnitTests() [3] 0.00 2.53 2/2 TorusTest(char const*, double) [12] 0.00 1.28 1/1 ChessBoardTest() [16] 0.00 1.27 1/1 BlockTest() [17] 
0.00 1.27 1/1 SaturnTest() [18] 0.00 1.27 1/1 SetIntersectionTest() [19] 0.00 1.27 1/1 BitDonutTest() [20] 0.00 1.27 1/1 SetDifferenceTest() [21] 0.00 1.26 1/1 CylinderTest() [23] 0.00 1.26 1/1 CuboidTest() [22] 0.00 1.26 1/1 SpheroidTest() [24] 0.00 1.26 1/1 MultipleSphereTest() [25] 0.00 1.26 1/1 DodecahedronOverlapTest() [28] 0.00 1.26 1/1 PolyhedraTest() [27] 0.00 0.00 1/1 Algebra::UnitTest() [91] 0.03 0.10 170828/14003920 Imager::SolidObject::Contains(Imager::Vector const&) const [48] 0.04 0.12 205218/14003920 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 1.10 3.46 5760000/14003920 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] 1.50 4.72 7867874/14003920 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] [4] 58.0 2.67 8.40 14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] 8.38 0.00 406030768/406030768 Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [8] 0.02 0.00 3538132/3538132 Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const [70] 0.00 0.00 8/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 8573986 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] [5] 52.6 0.17 9.87 8573986 
Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] 0.60 9.17 8534998/8534998 Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6] 0.04 0.04 3522280/3522280 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53] 0.03 0.00 3617416/3617416 Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const [66] 0.00 0.00 1395302/1395302 Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const [105] 3262804 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] 2906525 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58] 0.60 9.17 8534998/8534998 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] [6] 51.1 0.60 9.17 8534998 Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6] 0.17 9.00 22928551/22928551 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] 0.17 9.00 22928551/22928551 Imager::Scene::CalculateMatte(Imager::Intersection const&) const [6] [7] 48.0 0.17 9.00 22928551 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] 1.50 4.72 7867874/14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] 0.19 0.83 15801057/50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 0.84 0.00 38169893/66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, 
std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14] 0.00 0.30 2698530/5946530 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30] 0.00 0.30 2698530/11893060 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.00 0.16 2486753/13944693 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29] 0.14 0.00 64537354/83146683 Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47] 8.38 0.00 406030768/406030768 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] [8] 43.9 8.38 0.00 406030768 Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [8] 2906525 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58] 3262804 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] 0.67 17.36 12440000/12440000 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] [9] 39.6 0.20 7.37 18609329 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.21 7.13 18609329/18609329 Imager::Scene::FindClosestIntersection(Imager::Vector const&, 
Imager::Vector const&, Imager::Intersection&) const [10] 0.04 0.00 18609329/83146683 Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47] 8573986 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] 0.21 7.13 18609329/18609329 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] [10] 38.4 0.21 7.13 18609329 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] 1.10 3.46 5760000/14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] 0.15 0.64 12094646/50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 0.57 0.00 25863441/66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14] 0.01 0.47 7280621/13944693 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29] 0.01 0.37 3248000/5946530 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30] 0.01 0.36 3248000/11893060 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.05 0.22 4177319/50859850 
Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35] 0.06 0.25 4842135/50859850 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.15 0.64 12094646/50859850 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] 0.17 0.73 13944693/50859850 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29] 0.19 0.83 15801057/50859850 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] [11] 17.2 0.62 2.67 50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 1.55 0.33 34580399/34580399 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13] 0.04 0.49 5171490/5171490 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37] 0.08 0.04 3115245/3115245 Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50] 0.11 0.00 7514037/7514037 Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51] 0.00 0.04 478679/478679 
Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60] 0.00 2.53 2/2 UnitTests() [3] [12] 13.3 0.00 2.53 2 TorusTest(char const*, double) [12] 0.04 2.48 2/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 2/6 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 2/7 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 4/73 Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 4/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 4/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 2/18 Imager::SolidObject_Reorientable::RotateX(double) [133] 0.00 0.00 2/15 Imager::Scene::~Scene() [146] 0.00 0.00 2/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 1.55 0.33 34580399/34580399 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] [13] 9.9 1.55 0.33 34580399 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13] 0.33 0.00 108617482/118023768 Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) 
const [42] 0.00 0.00 9/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.02 0.00 1090769/66701722 Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35] 0.03 0.00 1577619/66701722 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.57 0.00 25863441/66701722 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] 0.84 0.00 38169893/66701722 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] [14] 7.7 1.48 0.00 66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14] 0.00 0.00 24/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.00 0.30 2698530/11893060 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] 0.01 0.36 3248000/11893060 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] 0.01 0.66 5946530/11893060 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30] [15] 7.0 0.02 1.32 11893060 
Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.01 0.57 5268088/5268088 Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35] 0.06 0.25 4842135/50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 0.04 0.12 205218/14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] 0.02 0.14 170828/170828 Imager::SolidObject::Contains(Imager::Vector const&) const [48] 0.01 0.04 3170898/3170898 Imager::SetComplement::Contains(Imager::Vector const&) const [57] 0.03 0.00 1577619/66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14] 0.01 0.01 2068572/9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52] 0.01 0.00 1497631/11368907 Imager::Sphere::Contains(Imager::Vector const&) const [55] 0.00 0.00 23/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.00 1.28 1/1 UnitTests() [3] [16] 6.7 0.00 1.28 1 ChessBoardTest() [16] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.02 0.00 1/1 Imager::SolidObject_Reorientable::RotateZ(double) [69] 0.00 0.00 1/73 
Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 3/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 3/6 Imager::Optics::SetOpacity(double) [152] 0.00 0.00 3/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 3/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/1 Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [172] 0.00 0.00 1/18 Imager::SolidObject_Reorientable::RotateX(double) [133] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 1.27 1/1 UnitTests() [3] [17] 6.6 0.00 1.27 1 BlockTest() [17] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/6 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 1/7 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 1/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 2/73 Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 1/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/1425505 Imager::Optics::SetMatteColor(Imager::Color const&) [72] 0.00 0.00 1/1 Imager::Optics::SetGlossColor(Imager::Color const&) [97] 0.00 0.00 3/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, 
std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 2/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/6 Imager::Optics::SetOpacity(double) [152] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 1.27 1/1 UnitTests() [3] [18] 6.6 0.00 1.27 1 SaturnTest() [18] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/6 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 1/7 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 1/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 1/1 Imager::Saturn::CreateRingSystem() [92] 0.00 0.00 1/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 1.27 1/1 UnitTests() [3] [19] 6.6 0.00 1.27 1 SetIntersectionTest() [19] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/6 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 1/7 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 1/16 
Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 2/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 1.27 1/1 UnitTests() [3] [20] 6.6 0.00 1.27 1 BitDonutTest() [20] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/6 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 1/7 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 1/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 2/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 1.27 1/1 UnitTests() [3] [21] 6.6 0.00 1.27 1 SetDifferenceTest() [21] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/7 
Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86] 0.00 0.00 1/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84] 0.00 0.00 2/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 1.26 1/1 UnitTests() [3] [22] 6.6 0.00 1.26 1 CuboidTest() [22] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/73 Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 1/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/18 Imager::SolidObject_Reorientable::RotateX(double) [133] 0.00 0.00 1/15 Imager::SolidObject_Reorientable::RotateY(double) [144] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 1.26 1/1 UnitTests() [3] [23] 6.6 0.00 1.26 1 CylinderTest() [23] 
0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/73 Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 1/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/18 Imager::SolidObject_Reorientable::RotateX(double) [133] 0.00 0.00 1/15 Imager::SolidObject_Reorientable::RotateY(double) [144] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 1.26 1/1 UnitTests() [3] [24] 6.6 0.00 1.26 1 SpheroidTest() [24] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/73 Imager::SolidObject::Translate(double, double, double) [81] 0.00 0.00 2/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/18 Imager::SolidObject_Reorientable::RotateX(double) [133] 0.00 0.00 1/15 Imager::SolidObject_Reorientable::RotateY(double) [144] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 1.26 
1/1 UnitTests() [3] [25] 6.6 0.00 1.26 1 MultipleSphereTest() [25] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 2/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 3/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 2/6 Imager::Optics::SetOpacity(double) [152] 0.00 0.00 2/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] <spontaneous> [26] 6.6 0.00 1.26 SphereTest() [26] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 1/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 1.26 1/1 UnitTests() [3] [27] 6.6 0.00 1.26 1 PolyhedraTest() [27] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 3/29 
std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 2/3 Imager::TriangleMesh::RotateY(double) [158] 0.00 0.00 2/3 Imager::TriangleMesh::RotateX(double) [157] 0.00 0.00 2/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.00 1/1 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174] 0.00 0.00 1/2 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 1.26 1/1 UnitTests() [3] [28] 6.6 0.00 1.26 1 DodecahedronOverlapTest() [28] 0.02 1.24 1/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] 0.00 0.00 3/29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127] 0.00 0.00 1/2 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163] 0.00 0.00 1/3 Imager::TriangleMesh::RotateX(double) [157] 0.00 0.00 1/3 Imager::TriangleMesh::RotateY(double) [158] 0.00 0.00 1/15 Imager::Scene::~Scene() [146] 0.00 0.00 1/20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132] 0.00 0.16 2486753/13944693 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] 0.01 0.27 
4177319/13944693 Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35] 0.01 0.47 7280621/13944693 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] [29] 4.8 0.02 0.90 13944693 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29] 0.17 0.73 13944693/50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 0.00 0.30 2698530/5946530 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] 0.01 0.37 3248000/5946530 Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [10] [30] 3.6 0.01 0.67 5946530 Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [30] 0.01 0.66 5946530/11893060 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.00 0.62 15/15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32] [31] 3.2 0.00 0.62 15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31] 0.00 0.51 15/15 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38] 0.01 0.05 3721/17010 addBitsToStreamReversed(unsigned 
long*, ucvector*, unsigned int, unsigned long) [45] 0.01 0.01 15/15 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67] 0.01 0.00 627238/627238 color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) [76] 0.00 0.01 30/45 lodepng_add32bitInt(ucvector*, unsigned int) [73] 0.01 0.00 1841000/3682000 getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74] 0.00 0.00 195/738585 ucvector_push_back(ucvector*, unsigned char) [39] 0.00 0.00 607/607 lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [112] 0.00 0.00 48/48 addUnknownChunks(ucvector*, unsigned char*, unsigned long) [120] 0.00 0.00 45/45 lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121] 0.00 0.00 30/30 checkColorValidity(LodePNGColorType, unsigned int) [126] 0.00 0.00 15/30 lodepng_info_init(LodePNGInfo*) [125] 0.00 0.00 15/15 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136] 0.00 0.00 15/15 lodepng_can_have_alpha(LodePNGColorMode const*) [139] 0.00 0.00 15/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107] 0.00 0.00 15/45 lodepng_info_cleanup(LodePNGInfo*) [122] 0.00 0.00 15/15 preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) [143] 0.00 0.00 15/15 zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [141] 0.00 0.62 15/15 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34] [32] 3.2 0.00 0.62 15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned 
int, LodePNGColorType, unsigned int) [32] 0.00 0.62 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31] 0.00 0.00 15/15 lodepng_state_init(LodePNGState*) [137] 0.00 0.00 15/45 lodepng_info_cleanup(LodePNGInfo*) [122] 0.00 0.00 15/15 lodepng_state_cleanup(LodePNGState*) [138] 0.00 0.62 15/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1] [33] 3.2 0.00 0.62 15 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33] 0.00 0.62 15/15 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34] 0.00 0.00 15/15 lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&) [148] 0.00 0.62 15/15 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33] [34] 3.2 0.00 0.62 15 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34] 0.00 0.62 15/15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32] 0.00 0.00 15/15 void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag) [149] 0.01 0.57 5268088/5268088 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] [35] 3.0 0.01 0.57 5268088 
Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [35] 0.01 0.27 4177319/13944693 Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [29] 0.05 0.22 4177319/50859850 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] 0.02 0.00 1090769/66701722 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14] 0.01 0.00 15/594 lodepng_add32bitInt(ucvector*, unsigned int) [73] 0.07 0.00 81/594 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43] 0.42 0.03 498/594 ucvector_push_back(ucvector*, unsigned char) [39] [36] 2.8 0.51 0.03 594 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36] 0.03 0.00 5088067/5088067 string_set(char**, char const*) [64] 0.00 0.00 652842/1447152 uivector_push_back(uivector*, unsigned int) [104] 0.00 0.00 255938/255938 searchCodeIndex(unsigned int const*, unsigned long, unsigned long) [108] 0.04 0.49 5171490/5171490 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] [37] 2.8 0.04 0.49 5171490 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37] 0.08 0.29 5171490/5171490 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41] 0.12 0.00 
1966663/1966663 Imager::Torus::SurfaceNormal(Imager::Vector const&) const [49] 0.00 0.00 13/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.00 0.51 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31] [38] 2.7 0.00 0.51 15 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38] 0.00 0.28 15/15 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43] 0.00 0.22 369210/738585 ucvector_push_back(ucvector*, unsigned char) [39] 0.00 0.00 15/45 lodepng_add32bitInt(ucvector*, unsigned int) [73] 0.00 0.00 15/15 update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] [142] 0.00 0.00 195/738585 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31] 0.00 0.22 369180/738585 addBitToStream(unsigned long*, ucvector*, unsigned char) [46] 0.00 0.22 369210/738585 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38] [39] 2.3 0.00 0.45 738585 ucvector_push_back(ucvector*, unsigned char) [39] 0.42 0.03 498/594 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36] 3262804 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] [40] 1.9 0.25 0.12 3262804 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] 0.05 0.00 9132218/11368907 Imager::Sphere::Contains(Imager::Vector const&) const [55] 0.02 0.01 
3203808/9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52] 0.03 0.00 3262804/6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56] 0.01 0.00 3262804/12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59] 3262804 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.08 0.29 5171490/5171490 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37] [41] 1.9 0.08 0.29 5171490 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41] 0.24 0.03 5171490/5171493 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44] 0.02 0.00 5171490/12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59] 0.03 0.00 9406286/118023768 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52] 0.33 0.00 108617482/118023768 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13] [42] 1.9 0.36 0.00 118023768 Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [42] 0.00 0.28 15/15 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38] [43] 1.5 0.00 0.28 15 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43] 0.02 0.17 12676/17010 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45] 0.07 0.00 81/594 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned 
long, unsigned long, unsigned int) [36] 0.00 0.01 243/243 HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80] 0.00 0.00 9043/2953369 addBitToStream(unsigned long*, ucvector*, unsigned char) [46] 0.00 0.00 39141/1447152 uivector_push_back(uivector*, unsigned int) [104] 0.00 0.00 243/729 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111] 0.00 0.00 243/243 HuffmanTree_cleanup(HuffmanTree*) [113] 0.00 0.00 243/243 HuffmanTree_makeFromLengths2(HuffmanTree*) [114] 0.00 0.00 81/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107] 0.00 0.00 3/5171493 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93] 0.24 0.03 5171490/5171493 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41] [44] 1.4 0.24 0.03 5171493 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44] 0.03 0.00 5171286/5171292 Algebra::cbrt(std::complex<double>, int) [62] 3050925 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45] 0.00 0.01 613/17010 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67] 0.01 0.05 3721/17010 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31] 0.02 0.17 12676/17010 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43] [45] 1.4 0.03 0.23 17010+3050925 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45] 0.01 0.22 2944326/2953369 addBitToStream(unsigned long*, ucvector*, unsigned char) [46] 3050925 addBitsToStreamReversed(unsigned long*, ucvector*, 
unsigned int, unsigned long) [45] 0.00 0.00 9043/2953369 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43] 0.01 0.22 2944326/2953369 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45] [46] 1.2 0.01 0.22 2953369 addBitToStream(unsigned long*, ucvector*, unsigned char) [46] 0.00 0.22 369180/738585 ucvector_push_back(ucvector*, unsigned char) [39] 0.04 0.00 18609329/83146683 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.14 0.00 64537354/83146683 Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [7] [47] 0.9 0.18 0.00 83146683 Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [47] 0.02 0.14 170828/170828 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] [48] 0.8 0.02 0.14 170828 Imager::SolidObject::Contains(Imager::Vector const&) const [48] 0.03 0.10 170828/14003920 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4] 0.12 0.00 1966663/1966663 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37] [49] 0.6 0.12 0.00 1966663 Imager::Torus::SurfaceNormal(Imager::Vector const&) const [49] 0.08 0.04 3115245/3115245 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] [50] 0.6 0.08 0.04 3115245 
Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50] 0.03 0.00 3115245/6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56] 0.01 0.00 3115245/12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59] 0.00 0.00 6/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.11 0.00 7514037/7514037 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11] [51] 0.6 0.11 0.00 7514037 Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51] 0.00 0.00 2/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117] 0.01 0.01 1779998/9484218 Imager::SetComplement::Contains(Imager::Vector const&) const [57] 0.01 0.01 2068572/9484218 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.01 0.01 2431840/9484218 Imager::SetUnion::Contains(Imager::Vector const&) const [68] 0.02 0.01 3203808/9484218 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) 
const <cycle 4> [40] [52] 0.5 0.05 0.04 9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52] 0.03 0.00 9406286/118023768 Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [42] 0.01 0.00 77932/77932 Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const [77] 0.04 0.04 3522280/3522280 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] [53] 0.4 0.04 0.04 3522280 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53] 0.01 0.02 1425504/1425504 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65] 0.01 0.00 2096776/2096776 Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [75] <spontaneous> [54] 0.3 0.07 0.00 Imager::Scene::PolarizedReflection(double, double, double, double) const [54] 0.00 0.00 739058/11368907 Imager::SetComplement::Contains(Imager::Vector const&) const [57] 0.01 0.00 1497631/11368907 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] 0.05 0.00 9132218/11368907 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] [55] 0.3 0.06 0.00 11368907 Imager::Sphere::Contains(Imager::Vector const&) const [55] 0.00 0.00 2/6856730 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96] 0.00 0.00 478679/6856730 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60] 0.03 0.00 3115245/6856730 
Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50] 0.03 0.00 3262804/6856730 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40] [56] 0.3 0.06 0.00 6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56] 0.01 0.04 3170898/3170898 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15] [57] 0.3 0.01 0.04 3170898 Imager::SetComplement::Contains(Imager::Vector const&) const [57] 0.00 0.02 2431840/2431840 Imager::SetUnion::Contains(Imager::Vector const&) const [68] 0.01 0.01 1779998/9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52] 0.00 0.00 739058/11368907 Imager::Sphere::Contains(Imager::Vector const&) const [55] 2906525 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5] [58] 0.3 0.05 0.00 2906525 Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [58] 2906525 Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [9] 0.00 0.00 478679/12028218 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60] 0.01 0.00 3115245/12028218 Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50] 
0.01 0.00 3262804/12028218 Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const <cycle 4> [40]
0.02 0.00 5171490/12028218 Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [41]
[59] 0.2 0.04 0.00 12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]
0.00 0.04 478679/478679 Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [11]
[60] 0.2 0.00 0.04 478679 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
0.03 0.00 957358/957358 Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]
0.00 0.00 478679/6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
0.00 0.00 478679/12028218 Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [59]
0.00 0.00 1/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]
<spontaneous>
[61] 0.2 0.03 0.00 frame_dummy [61]
0.00 0.00 6/5171292 Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]
0.03 0.00 5171286/5171292 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]
[62] 0.2 0.03 0.00 5171292 Algebra::cbrt(std::complex<double>, int) [62]
0.03 0.00 957358/957358 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
[63] 0.2 0.03 0.00 957358 Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]
0.00 0.00 2/88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]
0.03 0.00 5088067/5088067 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]
[64] 0.2 0.03 0.00 5088067 string_set(char**, char const*) [64]
0.01 0.02 1425504/1425504 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]
[65] 0.2 0.01 0.02 1425504 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]
0.00 0.02 1425504/1425505 Imager::Optics::SetMatteColor(Imager::Color const&) [72]
0.00 0.00 1137648/1137648 Imager::ChessBoard::SquareCoordinate(double) const [106]
0.03 0.00 3617416/3617416 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]
[66] 0.1 0.03 0.00 3617416 Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const [66]
0.01 0.01 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[67] 0.1 0.01 0.01 15 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
0.00 0.01 613/17010 addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) [45]
0.01 0.00 1841000/3682000 getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74]
0.00 0.00 15/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
0.00 0.02 2431840/2431840 Imager::SetComplement::Contains(Imager::Vector const&) const [57]
[68] 0.1 0.00 0.02 2431840 Imager::SetUnion::Contains(Imager::Vector const&) const [68]
0.01 0.01 2431840/9484218 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
0.02 0.00 1/1 ChessBoardTest() [16]
[69] 0.1 0.02 0.00 1 Imager::SolidObject_Reorientable::RotateZ(double) [69]
0.02 0.00 3538132/3538132 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
[70] 0.1 0.02 0.00 3538132 Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const [70]
0.00 0.00 1/1425590 Imager::Optics::SetGlossColor(Imager::Color const&) [97]
0.00 0.00 84/1425590 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
0.02 0.00 1425505/1425590 Imager::Optics::SetMatteColor(Imager::Color const&) [72]
[71] 0.1 0.02 0.00 1425590 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]
0.00 0.00 1/1425505 BlockTest() [17]
0.00 0.02 1425504/1425505 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]
[72] 0.1 0.00 0.02 1425505 Imager::Optics::SetMatteColor(Imager::Color const&) [72]
0.02 0.00 1425505/1425590 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]
0.00 0.00 15/45 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]
0.00 0.01 30/45 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[73] 0.1 0.00 0.01 45 lodepng_add32bitInt(ucvector*, unsigned int) [73]
0.01 0.00 15/594 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]
0.01 0.00 1841000/3682000 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
0.01 0.00 1841000/3682000 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[74] 0.1 0.01 0.00 3682000 getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) [74]
0.01 0.00 2096776/2096776 Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [53]
[75] 0.1 0.01 0.00 2096776 Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [75]
0.01 0.00 627238/627238 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[76] 0.1 0.01 0.00 627238 color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) [76]
0.01 0.00 77932/77932 Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [52]
[77] 0.1 0.01 0.00 77932 Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const [77]
0.01 0.00 3016/3016 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
[78] 0.1 0.01 0.00 3016 sort_coins(Coin*, unsigned long) [78]
0.00 0.01 243/243 HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80]
[79] 0.1 0.00 0.01 243 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
0.01 0.00 3016/3016 sort_coins(Coin*, unsigned long) [78]
0.00 0.00 573312/1447152 uivector_push_back(uivector*, unsigned int) [104]
0.00 0.00 167321/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
0.00 0.00 3245/3245 cleanup_coins(Coin*, unsigned long) [109]
0.00 0.00 2787/2787 append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]
0.00 0.01 243/243 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
[80] 0.1 0.00 0.01 243 HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) [80]
0.00 0.01 243/243 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
0.00 0.00 1/73 CuboidTest() [22]
0.00 0.00 1/73 CylinderTest() [23]
0.00 0.00 1/73 SpheroidTest() [24]
0.00 0.00 1/73 ChessBoardTest() [16]
0.00 0.00 2/73 BlockTest() [17]
0.00 0.00 4/73 TorusTest(char const*, double) [12]
0.00 0.00 5/73 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
0.00 0.00 14/73 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 15/73 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 29/73 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
[81] 0.1 0.01 0.00 73 Imager::SolidObject::Translate(double, double, double) [81]
<spontaneous>
[82] 0.0 0.01 0.00 Imager::SolidObject_BinaryOperator::RotateZ(double) [82]
[83] 0.0 0.00 0.00 16+6 <cycle 1 as a whole> [83]
0.00 0.00 14+4 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
0.00 0.00 8 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
4 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
3 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
0.00 0.00 1/16 SetDifferenceTest() [21]
0.00 0.00 1/16 SetIntersectionTest() [19]
0.00 0.00 1/16 BitDonutTest() [20]
0.00 0.00 1/16 SaturnTest() [18]
0.00 0.00 1/16 BlockTest() [17]
0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 2/16 TorusTest(char const*, double) [12]
[84] 0.0 0.00 0.00 14+4 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
0.00 0.00 29/73 Imager::SolidObject::Translate(double, double, double) [81]
3 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
4 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
[85] 0.0 0.00 0.00 7+26 <cycle 3 as a whole> [85]
0.00 0.00 10 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 20 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
0.00 0.00 3 Imager::SetComplement::RotateY(double) <cycle 3> [159]
1 Imager::SetComplement::RotateY(double) <cycle 3> [159]
2 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
0.00 0.00 1/7 SetDifferenceTest() [21]
0.00 0.00 1/7 SetIntersectionTest() [19]
0.00 0.00 1/7 BitDonutTest() [20]
0.00 0.00 1/7 SaturnTest() [18]
0.00 0.00 1/7 BlockTest() [17]
0.00 0.00 2/7 TorusTest(char const*, double) [12]
[86] 0.0 0.00 0.00 10 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 15/73 Imager::SolidObject::Translate(double, double, double) [81]
0.00 0.00 3/16 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
20 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
[87] 0.0 0.00 0.00 6+23 <cycle 2 as a whole> [87]
0.00 0.00 9 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 18 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
0.00 0.00 2 Imager::SetComplement::RotateX(double) <cycle 2> [165]
1 Imager::SetComplement::RotateX(double) <cycle 2> [165]
2 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
0.00 0.00 1/6 SetIntersectionTest() [19]
0.00 0.00 1/6 BitDonutTest() [20]
0.00 0.00 1/6 SaturnTest() [18]
0.00 0.00 1/6 BlockTest() [17]
0.00 0.00 2/6 TorusTest(char const*, double) [12]
[88] 0.0 0.00 0.00 9 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 14/73 Imager::SolidObject::Translate(double, double, double) [81]
0.00 0.00 2/16 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
18 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
3 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
0.00 0.00 2/16 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 3/16 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
[89] 0.0 0.00 0.00 8 Imager::SetComplement::Translate(double, double, double) <cycle 1> [89]
0.00 0.00 5/73 Imager::SolidObject::Translate(double, double, double) [81]
3 Imager::SolidObject_BinaryOperator::Translate(double, double, double) <cycle 1> [84]
0.00 0.00 1/21 SphereTest() [26]
0.00 0.00 1/21 CuboidTest() [22]
0.00 0.00 1/21 CylinderTest() [23]
0.00 0.00 1/21 SaturnTest() [18]
0.00 0.00 1/21 BlockTest() [17]
0.00 0.00 2/21 SetDifferenceTest() [21]
0.00 0.00 2/21 SetIntersectionTest() [19]
0.00 0.00 2/21 MultipleSphereTest() [25]
0.00 0.00 3/21 ChessBoardTest() [16]
0.00 0.00 3/21 Imager::Saturn::CreateRingSystem() [92]
0.00 0.00 4/21 TorusTest(char const*, double) [12]
[90] 0.0 0.00 0.00 21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
0.00 0.00 84/1425590 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]
0.00 0.00 1/1 UnitTests() [3]
[91] 0.0 0.00 0.00 1 Algebra::UnitTest() [91]
0.00 0.00 3/3 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
0.00 0.00 2/2 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
0.00 0.00 2/2 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
0.00 0.00 1/1 SaturnTest() [18]
[92] 0.0 0.00 0.00 1 Imager::Saturn::CreateRingSystem() [92]
0.00 0.00 3/21 Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [90]
0.00 0.00 3/3 Algebra::UnitTest() [91]
[93] 0.0 0.00 0.00 3 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
0.00 0.00 3/5171493 Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [44]
0.00 0.00 12/22 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
0.00 0.00 3/7 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]
0.00 0.00 2/2 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
[94] 0.0 0.00 0.00 2 Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]
0.00 0.00 6/5171292 Algebra::cbrt(std::complex<double>, int) [62]
0.00 0.00 2/2 Algebra::UnitTest() [91]
[95] 0.0 0.00 0.00 2 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
0.00 0.00 2/2 Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [94]
0.00 0.00 6/22 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
0.00 0.00 2/7 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]
0.00 0.00 2/2 Algebra::UnitTest() [91]
[96] 0.0 0.00 0.00 2 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
0.00 0.00 2/6856730 Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [56]
0.00 0.00 4/22 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
0.00 0.00 2/7 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]
0.00 0.00 1/1 BlockTest() [17]
[97] 0.0 0.00 0.00 1 Imager::Optics::SetGlossColor(Imager::Color const&) [97]
0.00 0.00 1/1425590 Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [71]
0.00 0.00 39141/1447152 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
0.00 0.00 181857/1447152 append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]
0.00 0.00 573312/1447152 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
0.00 0.00 652842/1447152 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]
[104] 0.0 0.00 0.00 1447152 uivector_push_back(uivector*, unsigned int) [104]
0.00 0.00 238818/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
0.00 0.00 1395302/1395302 Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const <cycle 4> [5]
[105] 0.0 0.00 0.00 1395302 Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const [105]
0.00 0.00 1137648/1137648 Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [65]
[106] 0.0 0.00 0.00 1137648 Imager::ChessBoard::SquareCoordinate(double) const [106]
0.00 0.00 15/406979 lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [67]
0.00 0.00 15/406979 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
0.00 0.00 81/406979 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
0.00 0.00 729/406979 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]
0.00 0.00 167321/406979 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
0.00 0.00 238818/406979 uivector_push_back(uivector*, unsigned int) [104]
[107] 0.0 0.00 0.00 406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
0.00 0.00 255938/255938 encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) [36]
[108] 0.0 0.00 0.00 255938 searchCodeIndex(unsigned int const*, unsigned long, unsigned long) [108]
0.00 0.00 3245/3245 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
[109] 0.0 0.00 0.00 3245 cleanup_coins(Coin*, unsigned long) [109]
0.00 0.00 2787/2787 lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [79]
[110] 0.0 0.00 0.00 2787 append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) [110]
0.00 0.00 181857/1447152 uivector_push_back(uivector*, unsigned int) [104]
0.00 0.00 243/729 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
0.00 0.00 486/729 HuffmanTree_makeFromLengths2(HuffmanTree*) [114]
[111] 0.0 0.00 0.00 729 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]
0.00 0.00 729/406979 lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) [107]
0.00 0.00 607/607 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[112] 0.0 0.00 0.00 607 lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [112]
0.00 0.00 243/243 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
[113] 0.0 0.00 0.00 243 HuffmanTree_cleanup(HuffmanTree*) [113]
0.00 0.00 243/243 lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [43]
[114] 0.0 0.00 0.00 243 HuffmanTree_makeFromLengths2(HuffmanTree*) [114]
0.00 0.00 486/729 uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] [111]
0.00 0.00 120/120 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]
[115] 0.0 0.00 0.00 120 Imager::Dodecahedron::CheckEdge(int, int, double) const [115]
0.00 0.00 20/92 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]
0.00 0.00 24/92 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
0.00 0.00 48/92 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]
[116] 0.0 0.00 0.00 92 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
0.00 0.00 20/20 std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&) [131]
0.00 0.00 1/88 Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [60]
0.00 0.00 2/88 Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [63]
0.00 0.00 2/88 Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [51]
0.00 0.00 6/88 Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [50]
0.00 0.00 8/88 Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [4]
0.00 0.00 9/88 Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [13]
0.00 0.00 13/88 Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [37]
0.00 0.00 23/88 Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [15]
0.00 0.00 24/88 Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [14]
[117] 0.0 0.00 0.00 88 std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [117]
0.00 0.00 48/48 lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121]
[118] 0.0 0.00 0.00 48 lodepng_chunk_generate_crc(unsigned char*) [118]
0.00 0.00 48/48 Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] [119]
0.00 0.00 48/48 lodepng_chunk_generate_crc(unsigned char*) [118]
[119] 0.0 0.00 0.00 48 Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] [119]
0.00 0.00 48/48 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[120] 0.0 0.00 0.00 48 addUnknownChunks(ucvector*, unsigned char*, unsigned long) [120]
0.00 0.00 45/45 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[121] 0.0 0.00 0.00 45 lodepng_chunk_create(unsigned char**, unsigned long*, unsigned int, char const*, unsigned char const*) [121]
0.00 0.00 48/48 lodepng_chunk_generate_crc(unsigned char*) [118]
0.00 0.00 15/45 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]
0.00 0.00 15/45 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
0.00 0.00 15/45 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]
[122] 0.0 0.00 0.00 45 lodepng_info_cleanup(LodePNGInfo*) [122]
0.00 0.00 45/45 LodePNGText_cleanup(LodePNGInfo*) [123]
0.00 0.00 45/45 LodePNGIText_cleanup(LodePNGInfo*) [124]
0.00 0.00 45/45 lodepng_info_cleanup(LodePNGInfo*) [122]
[123] 0.0 0.00 0.00 45 LodePNGText_cleanup(LodePNGInfo*) [123]
0.00 0.00 45/45 lodepng_info_cleanup(LodePNGInfo*) [122]
[124] 0.0 0.00 0.00 45 LodePNGIText_cleanup(LodePNGInfo*) [124]
0.00 0.00 15/30 lodepng_state_init(LodePNGState*) [137]
0.00 0.00 15/30 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[125] 0.0 0.00 0.00 30 lodepng_info_init(LodePNGInfo*) [125]
0.00 0.00 30/30 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[126] 0.0 0.00 0.00 30 checkColorValidity(LodePNGColorType, unsigned int) [126]
0.00 0.00 1/29 SphereTest() [26]
0.00 0.00 1/29 CuboidTest() [22]
0.00 0.00 1/29 SetDifferenceTest() [21]
0.00 0.00 1/29 CylinderTest() [23]
0.00 0.00 1/29 SetIntersectionTest() [19]
0.00 0.00 1/29 SaturnTest() [18]
0.00 0.00 2/29 SpheroidTest() [24]
0.00 0.00 2/29 BitDonutTest() [20]
0.00 0.00 3/29 MultipleSphereTest() [25]
0.00 0.00 3/29 PolyhedraTest() [27]
0.00 0.00 3/29 DodecahedronOverlapTest() [28]
0.00 0.00 3/29 BlockTest() [17]
0.00 0.00 3/29 ChessBoardTest() [16]
0.00 0.00 4/29 TorusTest(char const*, double) [12]
[127] 0.0 0.00 0.00 29 std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [127]
0.00 0.00 24/24 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
[128] 0.0 0.00 0.00 24 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]
0.00 0.00 120/120 Imager::Dodecahedron::CheckEdge(int, int, double) const [115]
0.00 0.00 48/92 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
0.00 0.00 4/22 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
0.00 0.00 6/22 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
0.00 0.00 12/22 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
[129] 0.0 0.00 0.00 22 Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [129]
20 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
[130] 0.0 0.00 0.00 20 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
0.00 0.00 12/15 Imager::SolidObject_Reorientable::RotateY(double) [144]
0.00 0.00 3/5 Imager::Sphere::RotateY(double) [154]
3 Imager::SetComplement::RotateY(double) <cycle 3> [159]
2 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 20/20 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
[131] 0.0 0.00 0.00 20 std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&) [131]
0.00 0.00 1/20 SphereTest() [26]
0.00 0.00 1/20 CuboidTest() [22]
0.00 0.00 1/20 SetDifferenceTest() [21]
0.00 0.00 1/20 CylinderTest() [23]
0.00 0.00 1/20 SpheroidTest() [24]
0.00 0.00 1/20 SetIntersectionTest() [19]
0.00 0.00 1/20 BitDonutTest() [20]
0.00 0.00 1/20 SaturnTest() [18]
0.00 0.00 1/20 DodecahedronOverlapTest() [28]
0.00 0.00 2/20 MultipleSphereTest() [25]
0.00 0.00 2/20 PolyhedraTest() [27]
0.00 0.00 2/20 BlockTest() [17]
0.00 0.00 2/20 TorusTest(char const*, double) [12]
0.00 0.00 3/20 ChessBoardTest() [16]
[132] 0.0 0.00 0.00 20 std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [132]
0.00 0.00 1/18 CuboidTest() [22]
0.00 0.00 1/18 CylinderTest() [23]
0.00 0.00 1/18 SpheroidTest() [24]
0.00 0.00 1/18 ChessBoardTest() [16]
0.00 0.00 2/18 TorusTest(char const*, double) [12]
0.00 0.00 12/18 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
[133] 0.0 0.00 0.00 18 Imager::SolidObject_Reorientable::RotateX(double) [133]
18 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
[134] 0.0 0.00 0.00 18 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
0.00 0.00 12/18 Imager::SolidObject_Reorientable::RotateX(double) [133]
0.00 0.00 2/3 Imager::Sphere::RotateX(double) [161]
2 Imager::SetComplement::RotateX(double) <cycle 2> [165]
2 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88]
0.00 0.00 5/17 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174]
0.00 0.00 12/17 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
[135] 0.0 0.00 0.00 17 std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [135]
0.00 0.00 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[136] 0.0 0.00 0.00 15 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]
0.00 0.00 15/45 lodepng_info_cleanup(LodePNGInfo*) [122]
0.00 0.00 15/15 lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [140]
0.00 0.00 15/15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]
[137] 0.0 0.00 0.00 15 lodepng_state_init(LodePNGState*) [137]
0.00 0.00 15/30 lodepng_info_init(LodePNGInfo*) [125]
0.00 0.00 15/15 lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [32]
[138] 0.0 0.00 0.00 15 lodepng_state_cleanup(LodePNGState*) [138]
0.00 0.00 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[139] 0.0 0.00 0.00 15 lodepng_can_have_alpha(LodePNGColorMode const*) [139]
0.00 0.00 15/15 lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [136]
[140] 0.0 0.00 0.00 15 lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [140]
0.00 0.00 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[141] 0.0 0.00 0.00 15 zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [141]
0.00 0.00 15/15 lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [38]
[142] 0.0 0.00 0.00 15 update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] [142]
0.00 0.00 15/15 lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [31]
[143] 0.0 0.00 0.00 15 preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) [143]
0.00 0.00 1/15 CuboidTest() [22]
0.00 0.00 1/15 CylinderTest() [23]
0.00 0.00 1/15 SpheroidTest() [24]
0.00 0.00 12/15 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
[144] 0.0 0.00 0.00 15 Imager::SolidObject_Reorientable::RotateY(double) [144]
0.00 0.00 15/15 Imager::Scene::~Scene() [146]
[145] 0.0 0.00 0.00 15 Imager::Scene::ClearSolidObjectList() [145]
0.00 0.00 7/13 Imager::Sphere::~Sphere() [150]
0.00 0.00 2/2 Imager::SetIntersection::~SetIntersection() [167]
0.00 0.00 2/2 Imager::SetDifference::~SetDifference() [166]
0.00 0.00 2/4 Imager::SetUnion::~SetUnion() [156]
0.00 0.00 1/4 Imager::Cuboid::~Cuboid() [155]
0.00 0.00 1/1 Imager::Spheroid::~Spheroid() [180]
0.00 0.00 1/1 Imager::Cylinder::~Cylinder() [179]
0.00 0.00 1/1 Imager::ConcreteBlock::~ConcreteBlock() [176]
0.00 0.00 1/1 Imager::Saturn::~Saturn() [178]
0.00 0.00 1/2 Imager::Dodecahedron::~Dodecahedron() [164]
0.00 0.00 1/1 Imager::Icosahedron::~Icosahedron() [175]
0.00 0.00 1/1 Imager::ChessBoard::~ChessBoard() [173]
0.00 0.00 1/15 SphereTest() [26]
0.00 0.00 1/15 CuboidTest() [22]
0.00 0.00 1/15 SetDifferenceTest() [21]
0.00 0.00 1/15 CylinderTest() [23]
0.00 0.00 1/15 SpheroidTest() [24]
0.00 0.00 1/15 SetIntersectionTest() [19]
0.00 0.00 1/15 MultipleSphereTest() [25]
0.00 0.00 1/15 PolyhedraTest() [27]
0.00 0.00 1/15 BitDonutTest() [20]
0.00 0.00 1/15 SaturnTest() [18]
0.00 0.00 1/15 DodecahedronOverlapTest() [28]
0.00 0.00 1/15 BlockTest() [17]
0.00 0.00 1/15 ChessBoardTest() [16]
0.00 0.00 2/15 TorusTest(char const*, double) [12]
[146] 0.0 0.00 0.00 15 Imager::Scene::~Scene() [146]
0.00 0.00 15/15 Imager::Scene::ClearSolidObjectList() [145]
0.00 0.00 15/15 Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [1]
[147] 0.0 0.00 0.00 15 lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int) [147]
0.00 0.00 15/15 lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [33]
[148] 0.0 0.00 0.00 15 lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&) [148]
0.00 0.00 15/15 lodepng::encode(std::vector<unsigned char, std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [34]
[149] 0.0 0.00 0.00 15 void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag) [149]
0.00 0.00 1/13 Imager::SetDifference::~SetDifference() [166]
0.00 0.00 2/13 Imager::SetComplement::~SetComplement() [160]
0.00 0.00 3/13 Imager::SetIntersection::~SetIntersection() [167]
0.00 0.00 7/13 Imager::Scene::ClearSolidObjectList() [145]
[150] 0.0 0.00 0.00 13 Imager::Sphere::~Sphere() [150]
0.00 0.00 2/7 Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [96]
0.00 0.00 2/7 Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [95]
0.00 0.00 3/7 Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [93]
[151] 0.0 0.00 0.00 7 Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [151]
0.00 0.00 1/6 BlockTest() [17]
0.00 0.00 2/6 MultipleSphereTest() [25]
0.00 0.00 3/6 ChessBoardTest() [16]
[152] 0.0 0.00 0.00 6 Imager::Optics::SetOpacity(double) [152]
0.00 0.00 1/5 Imager::SetDifference::~SetDifference() [166]
0.00 0.00 4/5 Imager::SetUnion::~SetUnion() [156]
[153] 0.0 0.00 0.00 5 Imager::Torus::~Torus() [153]
0.00 0.00 2/5 Imager::SetComplement::RotateY(double) <cycle 3> [159]
0.00 0.00 3/5 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
[154] 0.0 0.00 0.00 5 Imager::Sphere::RotateY(double) [154]
0.00 0.00 1/4 Imager::ConcreteBlock::~ConcreteBlock() [176]
0.00 0.00 1/4 Imager::Scene::ClearSolidObjectList() [145]
0.00 0.00 2/4 Imager::SetUnion::~SetUnion() [156]
[155] 0.0 0.00 0.00 4 Imager::Cuboid::~Cuboid() [155]
1 Imager::SetUnion::~SetUnion() [156]
0.00 0.00 1/4 Imager::Saturn::~Saturn() [178]
0.00 0.00 1/4 Imager::SetComplement::~SetComplement() [160]
0.00 0.00 2/4 Imager::Scene::ClearSolidObjectList() [145]
[156] 0.0 0.00 0.00 4+1 Imager::SetUnion::~SetUnion() [156]
0.00 0.00 4/5 Imager::Torus::~Torus() [153]
0.00 0.00 3/3 Imager::ThinRing::~ThinRing() [162]
0.00 0.00 2/4 Imager::Cuboid::~Cuboid() [155]
1 Imager::SetUnion::~SetUnion() [156]
0.00 0.00 1/3 DodecahedronOverlapTest() [28]
0.00 0.00 2/3 PolyhedraTest() [27]
[157] 0.0 0.00 0.00 3 Imager::TriangleMesh::RotateX(double) [157]
0.00 0.00 1/3 DodecahedronOverlapTest() [28]
0.00 0.00 2/3 PolyhedraTest() [27]
[158] 0.0 0.00 0.00 3 Imager::TriangleMesh::RotateY(double) [158]
3 Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) <cycle 3> [130]
[159] 0.0 0.00 0.00 3 Imager::SetComplement::RotateY(double) <cycle 3> [159]
0.00 0.00 2/5 Imager::Sphere::RotateY(double) [154]
1 Imager::SolidObject_BinaryOperator::RotateY(double) <cycle 3> [86]
0.00 0.00 1/3 Imager::ConcreteBlock::~ConcreteBlock() [176]
0.00 0.00 2/3 Imager::SetDifference::~SetDifference() [166]
[160] 0.0 0.00 0.00 3 Imager::SetComplement::~SetComplement() [160]
0.00 0.00 2/13 Imager::Sphere::~Sphere() [150]
0.00 0.00 1/4 Imager::SetUnion::~SetUnion() [156]
0.00 0.00 1/3 Imager::SetComplement::RotateX(double) <cycle 2> [165]
0.00 0.00 2/3 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134]
[161] 0.0 0.00 0.00 3 Imager::Sphere::RotateX(double) [161]
0.00 0.00 3/3 Imager::SetUnion::~SetUnion() [156]
[162] 0.0 0.00 0.00 3 Imager::ThinRing::~ThinRing() [162]
0.00 0.00 1/2 PolyhedraTest() [27]
0.00 0.00 1/2 DodecahedronOverlapTest() [28]
[163] 0.0 0.00 0.00 2 Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [163]
0.00 0.00 24/92 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116]
0.00 0.00 24/24 Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [128]
0.00 0.00 12/17 std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [135]
0.00 0.00 1/2 Imager::Scene::ClearSolidObjectList() [145]
0.00 0.00 1/2 Imager::SetIntersection::~SetIntersection() [167]
[164] 0.0 0.00 0.00 2 Imager::Dodecahedron::~Dodecahedron()
[164] 2 Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) <cycle 2> [134] [165] 0.0 0.00 0.00 2 Imager::SetComplement::RotateX(double) <cycle 2> [165] 0.00 0.00 1/3 Imager::Sphere::RotateX(double) [161] 1 Imager::SolidObject_BinaryOperator::RotateX(double) <cycle 2> [88] 0.00 0.00 2/2 Imager::Scene::ClearSolidObjectList() [145] [166] 0.0 0.00 0.00 2 Imager::SetDifference::~SetDifference() [166] 0.00 0.00 2/3 Imager::SetComplement::~SetComplement() [160] 0.00 0.00 1/13 Imager::Sphere::~Sphere() [150] 0.00 0.00 1/5 Imager::Torus::~Torus() [153] 0.00 0.00 2/2 Imager::Scene::ClearSolidObjectList() [145] [167] 0.0 0.00 0.00 2 Imager::SetIntersection::~SetIntersection() [167] 0.00 0.00 3/13 Imager::Sphere::~Sphere() [150] 0.00 0.00 1/2 Imager::Dodecahedron::~Dodecahedron() [164] 0.00 0.00 1/1 __libc_csu_init [325] [168] 0.0 0.00 0.00 1 _GLOBAL__sub_I__Z9BlockTestv [168] 0.00 0.00 1/1 __libc_csu_init [325] [169] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv [169] 0.00 0.00 1/1 __libc_csu_init [325] [170] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN6Imager6IndentERSoi [170] 0.00 0.00 1/1 __libc_csu_init [325] [171] 0.0 0.00 0.00 1 _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_ [171] 0.00 0.00 1/1 ChessBoardTest() [16] [172] 0.0 0.00 0.00 1 Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [172] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [173] 0.0 0.00 0.00 1 Imager::ChessBoard::~ChessBoard() [173] 0.00 0.00 1/1 PolyhedraTest() [27] [174] 0.0 0.00 0.00 1 Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [174] 0.00 0.00 20/92 Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [116] 0.00 0.00 5/17 std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, 
std::allocator<Imager::Vector> > >, Imager::Vector const&) [135] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [175] 0.0 0.00 0.00 1 Imager::Icosahedron::~Icosahedron() [175] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [176] 0.0 0.00 0.00 1 Imager::ConcreteBlock::~ConcreteBlock() [176] 0.00 0.00 1/4 Imager::Cuboid::~Cuboid() [155] 0.00 0.00 1/3 Imager::SetComplement::~SetComplement() [160] 0.00 0.00 1/1 Imager::Saturn::~Saturn() [178] [177] 0.0 0.00 0.00 1 Imager::Planet::~Planet() [177] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [178] 0.0 0.00 0.00 1 Imager::Saturn::~Saturn() [178] 0.00 0.00 1/1 Imager::Planet::~Planet() [177] 0.00 0.00 1/4 Imager::SetUnion::~SetUnion() [156] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [179] 0.0 0.00 0.00 1 Imager::Cylinder::~Cylinder() [179] 0.00 0.00 1/1 Imager::Scene::ClearSolidObjectList() [145] [180] 0.0 0.00 0.00 1 Imager::Spheroid::~Spheroid() [180] � Index by function name [168] _GLOBAL__sub_I__Z9BlockTestv (main.cpp) [80] HuffmanTree_makeFromFrequencies(HuffmanTree*, unsigned int const*, unsigned long, unsigned int) (lodepng.cpp) [91] Algebra::UnitTest() [169] _GLOBAL__sub_I__ZN6Imager5Scene20ClearSolidObjectListEv (scene.cpp) [172] Imager::ChessBoard::ChessBoard(double, double, double, double, Imager::Color const&, Imager::Color const&, Imager::Color const&) [33] lodepng::encode(std::string const&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [170] _GLOBAL__sub_I__ZN6Imager6IndentERSoi (debug.cpp) [173] Imager::ChessBoard::~ChessBoard() [147] lodepng::encode(std::string const&, std::vector<unsigned char, std::allocator<unsigned char> > const&, unsigned int, unsigned int, LodePNGColorType, unsigned int) [171] _GLOBAL__sub_I__ZN7Algebra20SolveLinearEquationsEddddddddddddRdS0_S0_ (algebra.cpp) [174] Imager::Icosahedron::Icosahedron(Imager::Vector, double, Imager::Optics const&) [34] lodepng::encode(std::vector<unsigned char, 
std::allocator<unsigned char> >&, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [22] CuboidTest() [175] Imager::Icosahedron::~Icosahedron() [148] lodepng::save_file(std::vector<unsigned char, std::allocator<unsigned char> > const&, std::string const&) [18] SaturnTest() [81] Imager::SolidObject::Translate(double, double, double) [106] Imager::ChessBoard::SquareCoordinate(double) const [20] BitDonutTest() [128] Imager::Dodecahedron::AddFace(int, int, int, int, int, Imager::Optics const&, double) [65] Imager::ChessBoard::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [23] CylinderTest() [163] Imager::Dodecahedron::Dodecahedron(Imager::Vector, double, Imager::Optics const&) [66] Imager::SolidObject::SurfaceOptics(Imager::Vector const&, void const*) const [24] SpheroidTest() [164] Imager::Dodecahedron::~Dodecahedron() [48] Imager::SolidObject::Contains(Imager::Vector const&) const [27] PolyhedraTest() [116] Imager::TriangleMesh::AddTriangle(int, int, int, Imager::Optics const&) [115] Imager::Dodecahedron::CheckEdge(int, int, double) const [16] ChessBoardTest() [157] Imager::TriangleMesh::RotateX(double) [70] Imager::TriangleMesh::NormalVector(Imager::TriangleMesh::Triangle const&) const [31] lodepng_encode(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGState*) [158] Imager::TriangleMesh::RotateY(double) [105] Imager::TriangleMesh::SurfaceOptics(Imager::Vector const&, void const*) const [67] lodepng_convert(unsigned char*, unsigned char const*, LodePNGColorMode*, LodePNGColorMode*, unsigned int, unsigned int) [176] Imager::ConcreteBlock::~ConcreteBlock() [4] Imager::TriangleMesh::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [43] lodepng_deflate(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [165] 
Imager::SetComplement::RotateX(double) [35] Imager::SetComplement::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [21] SetDifferenceTest() [159] Imager::SetComplement::RotateY(double) [57] Imager::SetComplement::Contains(Imager::Vector const&) const [136] lodepng_info_copy(LodePNGInfo*, LodePNGInfo const*) [89] Imager::SetComplement::Translate(double, double, double) [30] Imager::SetIntersection::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [125] lodepng_info_init(LodePNGInfo*) [160] Imager::SetComplement::~SetComplement() [15] Imager::SetIntersection::AppendOverlappingIntersections(Imager::Vector const&, Imager::Vector const&, Imager::SolidObject const&, Imager::SolidObject const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [25] MultipleSphereTest() [166] Imager::SetDifference::~SetDifference() [53] Imager::SolidObject_Reorientable::SurfaceOptics(Imager::Vector const&, void const*) const [137] lodepng_state_init(LodePNGState*) [167] Imager::SetIntersection::~SetIntersection() [11] Imager::SolidObject_Reorientable::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [19] SetIntersectionTest() [47] Imager::PickClosestIntersection(std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > const&, Imager::Intersection&) [75] Imager::SolidObject_Reorientable::ObjectSpace_SurfaceOptics(Imager::Vector const&, void const*) const [112] lodepng_palette_add(LodePNGColorMode*, unsigned char, unsigned char, unsigned char, unsigned char) [133] Imager::SolidObject_Reorientable::RotateX(double) [52] Imager::SolidObject_Reorientable::Contains(Imager::Vector const&) const [121] lodepng_chunk_create(unsigned char**, unsigned long*, 
unsigned int, char const*, unsigned char const*) [144] Imager::SolidObject_Reorientable::RotateY(double) [6] Imager::Scene::CalculateMatte(Imager::Intersection const&) const [122] lodepng_info_cleanup(LodePNGInfo*) [69] Imager::SolidObject_Reorientable::RotateZ(double) [5] Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const [32] lodepng_encode_memory(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGColorType, unsigned int) [134] Imager::SolidObject_BinaryOperator::NestedRotateX(Imager::SolidObject&, double, double, double) [58] Imager::Scene::CalculateReflection(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const [138] lodepng_state_cleanup(LodePNGState*) [130] Imager::SolidObject_BinaryOperator::NestedRotateY(Imager::SolidObject&, double, double, double) [40] Imager::Scene::CalculateRefraction(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int, double&) const [38] lodepng_zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) [88] Imager::SolidObject_BinaryOperator::RotateX(double) [7] Imager::Scene::HasClearLineOfSight(Imager::Vector const&, Imager::Vector const&) const [139] lodepng_can_have_alpha(LodePNGColorMode const*) [86] Imager::SolidObject_BinaryOperator::RotateY(double) [54] Imager::Scene::PolarizedReflection(double, double, double, double) const [28] DodecahedronOverlapTest() [82] Imager::SolidObject_BinaryOperator::RotateZ(double) [10] Imager::Scene::FindClosestIntersection(Imager::Vector const&, Imager::Vector const&, Imager::Intersection&) const [140] lodepng_color_mode_copy(LodePNGColorMode*, LodePNGColorMode const*) [84] Imager::SolidObject_BinaryOperator::Translate(double, double, double) [9] Imager::Scene::TraceRay(Imager::Vector const&, Imager::Vector const&, double, Imager::Color, int) const [118] 
lodepng_chunk_generate_crc(unsigned char*) [145] Imager::Scene::ClearSolidObjectList() [1] Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const [79] lodepng_huffman_code_lengths(unsigned int*, unsigned int const*, unsigned long, unsigned int) [146] Imager::Scene::~Scene() [49] Imager::Torus::SurfaceNormal(Imager::Vector const&) const [17] BlockTest() [153] Imager::Torus::~Torus() [41] Imager::Torus::SolveIntersections(Imager::Vector const&, Imager::Vector const&, double*) const [12] TorusTest(char const*, double) [155] Imager::Cuboid::~Cuboid() [77] Imager::Torus::ObjectSpace_Contains(Imager::Vector const&) const [36] encodeLZ77(uivector*, Hash*, unsigned char const*, unsigned long, unsigned long, unsigned int) (lodepng.cpp) [152] Imager::Optics::SetOpacity(double) [37] Imager::Torus::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [78] sort_coins(Coin*, unsigned long) (lodepng.cpp) [97] Imager::Optics::SetGlossColor(Imager::Color const&) [42] Imager::Cuboid::ObjectSpace_Contains(Imager::Vector const&) const [64] string_set(char**, char const*) (lodepng.cpp) [72] Imager::Optics::SetMatteColor(Imager::Color const&) [13] Imager::Cuboid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [109] cleanup_coins(Coin*, unsigned long) (lodepng.cpp) [90] Imager::Optics::SetMatteGlossBalance(double, Imager::Color const&, Imager::Color const&) [71] Imager::Optics::ValidateReflectionColor(Imager::Color const&) const [141] zlib_compress(unsigned char**, unsigned long*, unsigned char const*, unsigned long, LodePNGCompressSettings const*) (lodepng.cpp) [177] Imager::Planet::~Planet() [14] Imager::Sphere::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, 
std::allocator<Imager::Intersection> >&) const [46] addBitToStream(unsigned long*, ucvector*, unsigned char) (lodepng.cpp) [92] Imager::Saturn::CreateRingSystem() [55] Imager::Sphere::Contains(Imager::Vector const&) const [76] color_tree_has(ColorTree*, unsigned char, unsigned char, unsigned char, unsigned char) (lodepng.cpp) [178] Imager::Saturn::~Saturn() [63] Imager::Cylinder::AppendDiskIntersection(Imager::Vector const&, Imager::Vector const&, double, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [142] update_adler32(unsigned int, unsigned char const*, unsigned int) [clone .constprop.61] (lodepng.cpp) [161] Imager::Sphere::RotateX(double) [60] Imager::Cylinder::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [108] searchCodeIndex(unsigned int const*, unsigned long, unsigned long) (lodepng.cpp) [154] Imager::Sphere::RotateY(double) [29] Imager::SetUnion::AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [119] Crc32_update_crc(unsigned char const*, unsigned int, unsigned long) [clone .constprop.62] (lodepng.cpp) [150] Imager::Sphere::~Sphere() [68] Imager::SetUnion::Contains(Imager::Vector const&) const [120] addUnknownChunks(ucvector*, unsigned char*, unsigned long) (lodepng.cpp) [179] Imager::Cylinder::~Cylinder() [50] Imager::Spheroid::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [111] uivector_resizev(uivector*, unsigned long, unsigned int) [clone .constprop.64] (lodepng.cpp) [156] Imager::SetUnion::~SetUnion() [51] Imager::ThinRing::ObjectSpace_AppendAllIntersections(Imager::Vector const&, Imager::Vector const&, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >&) const [126] 
checkColorValidity(LodePNGColorType, unsigned int) (lodepng.cpp) [180] Imager::Spheroid::~Spheroid() [127] std::vector<Imager::LightSource, std::allocator<Imager::LightSource> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::LightSource*, std::vector<Imager::LightSource, std::allocator<Imager::LightSource> > >, Imager::LightSource const&) [74] getPixelColorRGBA8(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned char const*, unsigned long, LodePNGColorMode const*) (lodepng.cpp) [162] Imager::ThinRing::~ThinRing() [117] std::vector<Imager::Intersection, std::allocator<Imager::Intersection> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Intersection*, std::vector<Imager::Intersection, std::allocator<Imager::Intersection> > >, Imager::Intersection const&) [39] ucvector_push_back(ucvector*, unsigned char) (lodepng.cpp) [151] Algebra::CheckRoots(int, std::complex<double> const*, std::complex<double> const*) [131] std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::TriangleMesh::Triangle*, std::vector<Imager::TriangleMesh::Triangle, std::allocator<Imager::TriangleMesh::Triangle> > >, Imager::TriangleMesh::Triangle const&) [104] uivector_push_back(uivector*, unsigned int) (lodepng.cpp) [59] Algebra::FilterRealNumbers(int, std::complex<double> const*, double*) [135] std::vector<Imager::Vector, std::allocator<Imager::Vector> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::Vector*, std::vector<Imager::Vector, std::allocator<Imager::Vector> > >, Imager::Vector const&) [113] HuffmanTree_cleanup(HuffmanTree*) (lodepng.cpp) [94] Algebra::SolveCubicEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [132] std::vector<Imager::SolidObject*, std::allocator<Imager::SolidObject*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Imager::SolidObject**, std::vector<Imager::SolidObject*, 
std::allocator<Imager::SolidObject*> > >, Imager::SolidObject* const&) [123] LodePNGText_cleanup(LodePNGInfo*) (lodepng.cpp) [129] Algebra::ValidatePolynomial(int, std::complex<double> const*, std::complex<double>) [149] void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<unsigned char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char*, unsigned char*, std::forward_iterator_tag) [110] append_symbol_coins(Coin*, unsigned int const*, unsigned int, unsigned long) (lodepng.cpp) [95] Algebra::TestKnownCubicRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [61] frame_dummy [73] lodepng_add32bitInt(ucvector*, unsigned int) (lodepng.cpp) [8] Algebra::SolveLinearEquations(double, double, double, double, double, double, double, double, double, double, double, double, double&, double&, double&) [83] <cycle 1> [143] preProcessScanlines(unsigned char**, unsigned long*, unsigned char const*, unsigned int, unsigned int, LodePNGInfo const*, LodePNGEncoderSettings const*) (lodepng.cpp) [44] Algebra::SolveQuarticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [87] <cycle 2> [124] LodePNGIText_cleanup(LodePNGInfo*) (lodepng.cpp) [93] Algebra::TestKnownQuarticRoots(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>) [85] <cycle 3> [45] addBitsToStreamReversed(unsigned long*, ucvector*, unsigned int, unsigned long) (lodepng.cpp) [56] Algebra::SolveQuadraticEquation(std::complex<double>, std::complex<double>, std::complex<double>, std::complex<double>*) [2] <cycle 4> [107] lodepng_color_mode_equal(LodePNGColorMode const*, LodePNGColorMode const*) (lodepng.cpp) [96] Algebra::TestKnownQuadraticRoots(std::complex<double>, std::complex<double>, std::complex<double>) [114] 
HuffmanTree_makeFromLengths2(HuffmanTree*) (lodepng.cpp) [62] Algebra::cbrt(std::complex<double>, int)
Most of the time (99.3%) is spent executing the SaveImage function (Imager::Scene::SaveImage(char const*, unsigned long, unsigned long, double, unsigned long) const). In the additional lodepng code that runs alongside the ray tracer, 94.4% of the time is spent in the CalculateLighting function (Imager::Scene::CalculateLighting(Imager::Intersection const&, Imager::Vector const&, double, Imager::Color, int) const). |
Assignment 2
During assignment 2, we tried a simple kernel that took the shape of a dot product (a matrix multiplication). It achieved nothing special: as predicted at the end of assignment 1, repeatedly calling cudaMalloc and cudaMemcpy had severe consequences for run time.
Initial implementation
//version 1 dot product
__global__ void kdot(const float* d_a, const float* d_b, float* d_p, int ni, int nj, int nk) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    //matrix multiplication
    if (i < ni && j < nj) {
        float sum = 0.0f;
        for (int k = 0; k < nk; k++)
            sum += d_a[i * nk + k] * d_b[k * nj + j];
        d_p[i * nj + j] = sum;
    }
}
Naive
Naturally, this is a naive implementation, as we call cudaMalloc on every iteration of the training for loop.
cout << "Training the model ...\n";
for (unsigned i = 0; i < 10000; ++i) {
This actually cost us an additional 20 minutes before profiling could be done.
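The cost described above comes from allocating and copying inside the loop. A minimal sketch of the alternative, allocating and uploading once and keeping the data on the device between iterations, is shown below. The `scale` kernel and the `run_steps` helper are hypothetical stand-ins for a training step and are not part of our project code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for one training step: scales every element in place.
__global__ void scale(float* d_x, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_x[i] *= factor;
}

// Runs `iterations` steps while allocating and copying only once.
// Returns true if every CUDA call succeeded.
bool run_steps(std::vector<float>& h_x, int iterations, float factor) {
    const int n = static_cast<int>(h_x.size());
    const size_t bytes = n * sizeof(float);
    float* d_x = nullptr;

    // Allocate and upload once, before the loop.
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_x);
        return false;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        // The data stays on the device between iterations.
        scale<<<blocks, threads>>>(d_x, n, factor);
    }

    // Download once, after the loop; this cudaMemcpy waits for the kernels.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h_x.data(), d_x, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_x);
    return ok;
}
```

Calling `run_steps(v, 3, 2.0f)` on a non-empty vector of ones would leave every element at 8, with a single cudaMalloc and two cudaMemcpy calls in total instead of one set per iteration.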
The next steps
First, we had to research how the neural network actually learns: for example, why it uses the relu() function, how back-propagation works, and much more. Some additional sites will be included.
After that, and many coffees, we arrived at the following:
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1) {
    int BATCH_SIZE = 256;
    float lr = .01 / BATCH_SIZE;
    kdot<<<50, 51>>>(ktranspose(d_a2, BATCH_SIZE, 64), d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
    kdot<<<80, 32>>>(d_dyhat, ktranspose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
    kreluPrime(d_a2, 128 * 64);
    for (int i = 0; i < BATCH_SIZE * 10; i++) {
        d_dz2[i] = d_dz2[i] * d_a2[i];
    }
    kdot<<<1024, 32>>>(ktranspose(d_a1, BATCH_SIZE, 128), d_dz2, 128, BATCH_SIZE, 64, d_dW2);
    kdot<<<512, 32>>>(d_dz2, ktranspose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
    kreluPrime(d_a1, BATCH_SIZE * 784);
    for (int i = 0; i < 256 * 64; i++) {
        d_dz1[i] = d_dz1[i] * d_a1[i];
    }
    kdot<<<512, 512, 32>>>(ktranspose(d_b_X, BATCH_SIZE, 784), d_dz1, 784, BATCH_SIZE, 128, d_dW1);
    // Updating the parameters
    //W3 = W3 - lr * dW3;
    for (int i = 0; i < (64 * 10); i++) {
        d_W3[i] = d_W3[i] - lr * d_dW3[i];
    }
    //W2 = W2 - lr * dW2;
    for (int i = 0; i < (128 * 64); i++) {
        d_W2[i] = d_W2[i] - lr * d_dW2[i];
    }
    //W1 = W1 - lr * dW1;
    for (int i = 0; i < (784 * 128); i++) {
        d_W1[i] = d_W1[i] - lr * d_dW1[i];
    }
}
Dynamic Parallelism
Dynamic parallelism in CUDA allows a kernel to launch new nested (child) kernels and synchronize with them. For our use case, it also lets us keep more of the work on the device and process information quickly, without constant cudaMemcpy() or cudaMalloc() calls from the host.
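As a minimal sketch of the parent/child pattern, assuming a device of compute capability 3.5 or higher and compilation with relocatable device code (for example `nvcc -rdc=true`, plus `-lcudadevrt` on older toolkits), the example below has one parent thread launch a child grid. The names `child_double`, `parent` and `run_parent` are hypothetical. It deliberately avoids device-side cudaDeviceSynchronize(), which newer CUDA releases deprecate, and relies on the rule that a parent grid is not complete until its child grids have finished.

```cuda
#include <cuda_runtime.h>

// Child kernel: each thread doubles one element.
__global__ void child_double(float* d_x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_x[i] *= 2.0f;
}

// Parent kernel: a single thread launches the child grid from the device.
__global__ void parent(float* d_x, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        child_double<<<blocks, threads>>>(d_x, n);
        // No explicit device-side sync: the parent grid only counts as
        // finished once the child grid has finished.
    }
}

// Host wrapper: copies h_x to the device, runs the parent, copies back.
bool run_parent(float* h_x, int n) {
    const size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        parent<<<1, 1>>>(d_x, n);
        // Host-side synchronization waits for the parent and its child.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // cudaFree(nullptr) is a no-op
    return ok;
}
```

Note that only one thread performs the launch; if every parent thread launched a child grid, the work would be repeated once per thread.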
Parent call Child kernel( ... ) |
---|
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
int BATCH_SIZE = 256;
float lr = 0.01 / BATCH_SIZE;
//backpropagation
d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
cudaDeviceSynchronize();
}
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
//matrix multiplication
if (i < ni && j < nj) {
float sum = 0.0f;
for (int k = 0; k < nk; k++)
sum += d_a[i * nk + k] * d_b[k * nj + j];
d_p[i * nj + j] = sum;
}
} |
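kernel_dot derives the row index from the x dimension and the column index from the y dimension, but a launch such as `<<<(2560 + 128)/64, 64>>>` is one-dimensional, so blockIdx.y and threadIdx.y are always 0 and only column j = 0 is ever computed. A minimal sketch of a two-dimensional launch follows, assuming row-major matrices with the same indexing scheme; `matmul2d` and `matmul_host` are hypothetical names, and the 16 x 16 block size is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

// d_a is ni x nk, d_b is nk x nj, d_p is ni x nj (all row-major).
__global__ void matmul2d(const float* d_a, const float* d_b, float* d_p,
                         int ni, int nj, int nk) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // output row
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // output column
    if (i < ni && j < nj) {
        float sum = 0.0f;
        for (int k = 0; k < nk; k++)
            sum += d_a[i * nk + k] * d_b[k * nj + j];
        d_p[i * nj + j] = sum;
    }
}

// Host wrapper that covers the whole ni x nj output with a 2D grid.
bool matmul_host(const float* h_a, const float* h_b, float* h_p,
                 int ni, int nj, int nk) {
    const size_t a_bytes = sizeof(float) * ni * nk;
    const size_t b_bytes = sizeof(float) * nk * nj;
    const size_t p_bytes = sizeof(float) * ni * nj;
    float *d_a = nullptr, *d_b = nullptr, *d_p = nullptr;
    bool ok = cudaMalloc(&d_a, a_bytes) == cudaSuccess &&
              cudaMalloc(&d_b, b_bytes) == cudaSuccess &&
              cudaMalloc(&d_p, p_bytes) == cudaSuccess &&
              cudaMemcpy(d_a, h_a, a_bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, h_b, b_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        // Round up so that partial tiles at the edges are still covered.
        dim3 grid((ni + block.x - 1) / block.x, (nj + block.y - 1) / block.y);
        matmul2d<<<grid, block>>>(d_a, d_b, d_p, ni, nj, nk);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_p, d_p, p_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);  // cudaFree(nullptr) is a no-op
    cudaFree(d_b);
    cudaFree(d_p);
    return ok;
}
```

Deriving the grid from the matrix dimensions also removes the need for hand-picked launch values, which was one of the harder things to reason about in our kernels.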
Final Iteration
GPU code |
---|
__device__ float* k_difference(const float* m1, const float* m2, const int size) {
/* Returns the difference between the two vectors. */
float* difference = new float[size];
for (int i = 0; i < size; i++) {
difference[i] = m1[i] - m2[i];
}
return difference;
}
__device__ float* k_MFV(const float f, const float* m, const int size) {
float* mult = new float[size];
for (int i = 0; i < size; i++) {
mult[i] = f * m[i];
}
return mult;
}
__device__ float* k_MM(float* m1, float* m2, const int m2_size) {
float* product = new float[m2_size];
for (int i = 0; i != m2_size; ++i) {
product[i] = m1[i] * m2[i];
};
return product;
}
__device__ float* k_transpose(float *m, const int C, const int R) {
/* Returns a transpose matrix of input matrix.
Inputs:
m: vector, input matrix
C: int, number of columns in the input matrix
R: int, number of rows in the input matrix
Output: vector, transpose matrix mT of input matrix m
*/
float* mT = new float[C * R];
for (unsigned n = 0; n != C * R; n++) {
unsigned i = n / C;
unsigned j = n % C;
mT[n] = m[R*j + i];
}
return mT;
//for (int i = 0; i<R; ++i)
// for (int j = 0; j<C; ++j)
// {
// mT[j * C + i] = m[i * R + j];
// }
//return mT;
}
__device__ void dkernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
for (int row = 0; row != ni; ++row) {
for (int col = 0; col != nk; ++col) {
d_p[row * nk + col] = 0.f;
for (int k = 0; k != nj; ++k) {
d_p[row * nk + col] += d_a[row * nj + k] * d_b[k * nk + col];
}
}
}
}
//version 1 dot product
__global__ void kernel_dot(float* d_a, float* d_b, int ni, int nj, int nk, float* d_p) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
//matrix multiplication
if (i < ni && j < nj) {
float sum = 0.0f;
for (int k = 0; k < nk; k++)
sum += d_a[i * nk + k] * d_b[k * nj + j];
d_p[i * nj + j] = sum;
}
}
void cudaCheck(cudaError_t Error) {
if (Error != cudaSuccess) {
cerr << cudaGetErrorName(Error) << "!";
exit(EXIT_FAILURE);
}
}
__device__ float* k_relu(float* a, int n) {
for (int i = 0; i < n; ++i) {
if (a[i] < 0) {
a[i] = 0.01f;
}
else a[i] = a[i];
}
return a;
}
__device__ float* k_reluPrime(float* a, int n) {
for (int i = 0; i < n; ++i) {
if (a[i] > 0) {
a[i] = 1.0f;
}
else a[i] = 0.0;
}
return a;
}
///activation functions __global__
__global__ void kernel_relu(float* a, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if(i < n) {
if (a[i] < 0) {
a[i] = 0.01f;
}
else a[i] = a[i];
}
}
__global__ void kernel_reluPrime(float* a, int n) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) {
if (a[i] > 0) {
a[i] = 1.0f;
}
else a[i] = 0.0;
}
}
__device__ void ksoftmax(float *input, int input_len) {
//assert(input != NULL);
//assert(input_len != 0);
int i;
float m;
/* Find maximum value from input array */
m = input[0];
for (i = 1; i < input_len; i++) {
if (input[i] > m) {
m = input[i];
}
}
float sum = 0;
for (i = 0; i < input_len; i++) {
sum += expf(input[i] - m);
}
for (i = 0; i < input_len; i++) {
input[i] = expf(input[i] - m - log(sum));
}
}
__device__ void k_sigmoid(float* m1, int size) {
/* Returns the value of the sigmoid function f(x) = 1/(1 + e^-x).
Input: m1, a vector.
Output: 1/(1 + e^-x) for every element of the input matrix m1.
*/
for (unsigned i = 0; i != size; ++i) {
m1[i] = 1 / (1 + exp(-m1[i]));
}
}
__global__ void feed_forward(float* d_b_X, float* d_W1, float* d_W2, float* d_W3, float* d_b_Y, float* d_a1, float* d_a2, float* d_yhat, float* d_dyhat) {
int BATCH_SIZE = 256;
float lr = 0.01 / BATCH_SIZE;
float* tempY = new float[256 * 64];
//feed forward
kernel_dot <<<256, 256>>> (d_b_X, d_W1, BATCH_SIZE, 784, 128, d_a1);
cudaDeviceSynchronize();
k_relu(d_a1, BATCH_SIZE * 784);
kernel_dot <<<256, 128>>> (d_a1, d_W2, BATCH_SIZE, 128, 64, d_a2);
cudaDeviceSynchronize();
k_relu(d_a2, BATCH_SIZE * 128);
kernel_dot <<<256, 64>>> (d_a2, d_W3, BATCH_SIZE, 64, 10, d_yhat);
cudaDeviceSynchronize();
ksoftmax(tempY, 10 * 10);
for (int i = 0; i < 100; i++) {
d_yhat[i] = tempY[i];
}
delete[] tempY;
}
__global__ void train(float* d_W1, float* d_W2, float* d_W3, float* d_b_X, float* d_b_Y, float* d_a2, float* d_a1, float* d_yhat, float* d_dyhat, float* d_dW3, float* d_dW2, float* d_dW1, float* d_dz2, float* d_dz1, float* d_t) {
cudaError_t Error;
int BATCH_SIZE = 256;
float lr = 0.01 / BATCH_SIZE;
//backpropagation
d_dyhat = k_difference(d_yhat, d_b_Y, 10 * 10);
kernel_dot <<<(2560 + 128)/64, 64>>> (d_dyhat, k_transpose(d_W3, 64, 10), BATCH_SIZE, 10, 64, d_dz2);
cudaDeviceSynchronize();
// Transpose d_a2 (BATCH_SIZE x 64, row-major) into mT (64 x BATCH_SIZE).
float* mT = new float[256 * 64];
for (int i = 0; i < 256; ++i)
for (int j = 0; j < 64; ++j)
{
mT[j * 256 + i] = d_a2[i * 64 + j];
}
kernel_dot <<<(16384 + 256)/64, 64>>> (mT, d_dyhat, 64, BATCH_SIZE, 10, d_dW3);
cudaDeviceSynchronize();
k_reluPrime(d_a2, 256 * 64);
for (int i = 0; i < BATCH_SIZE * 10; i++) {
d_dz2[i] = d_dz2[i] * d_a2[i];
}
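// Free the previous scratch buffer, then transpose d_a1
// (BATCH_SIZE x 128, row-major) into mT (128 x BATCH_SIZE).
delete[] mT;
mT = new float[256 * 128];
for (int i = 0; i < 256; ++i)
for (int j = 0; j < 128; ++j)
{
mT[j * 256 + i] = d_a1[i * 128 + j];
}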
kernel_dot <<<64, 512>>> (mT, d_dz2, 128, BATCH_SIZE, 64, d_dW2);
cudaDeviceSynchronize();
kernel_dot <<<80, 32>>> (d_dz2, k_transpose(d_W2, 128, 64), BATCH_SIZE, 64, 128, d_dz1);
cudaDeviceSynchronize();
k_reluPrime(d_a1, BATCH_SIZE * 784);
for (int i = 0; i < 256 * 64; i++) {
d_dz1[i] = d_dz1[i] * d_a1[i];
}
kernel_dot <<<784, 256>>> (d_t, d_dz1, 784, BATCH_SIZE, 128, d_dW1);
cudaDeviceSynchronize();
//// Updating the parameters in place, so that the new weights persist in
//// the buffers the host passed in (reassigning the local pointers would not).
////W3 = W3 - lr * dW3;
for (int i = 0; i < (64 * 10); i++) {
d_W3[i] = d_W3[i] - lr * d_dW3[i];
}
//W2 = W2 - lr * dW2;
for (int i = 0; i < (128 * 64); i++) {
d_W2[i] = d_W2[i] - lr * d_dW2[i];
}
////W1 = W1 - lr * dW1;
for (int i = 0; i < (784 * 128); i++) {
d_W1[i] = d_W1[i] - lr * d_dW1[i];
}
//for (int i = 0; i != 10; ++i) {
// for (int j = 0; j != 10; ++j) {
// printf("%f ", d_W3[i * 10 + j]);
// }
// printf("\n");
//}
//printf("\n");
//for (int i = 0; i != 10; ++i) {
// for (int j = 0; j != 10; ++j) {
// printf("%f ", d_yhat[i * 10 + j]);
// }
// printf("\n");
//}
//printf("\n");
float* dif;
// The difference buffer must hold as many elements as the loop below reads.
dif = k_difference(d_b_Y, d_yhat, BATCH_SIZE * 10);
float loss = 0.0;
for (unsigned k = 0; k < BATCH_SIZE * 10; ++k) {
loss += dif[k] * dif[k];
}
delete[] dif;
printf("%f \n", loss / BATCH_SIZE);
Error = cudaGetLastError();
if (Error != cudaSuccess) {
// %s needs a string, so convert the error code first.
printf("\n %s \n", cudaGetErrorString(Error));
}
}; |
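ksoftmax above runs in a single thread and normalizes one contiguous array. For a batch of predictions, softmax is typically applied to each row separately. A minimal sketch of that variant follows, assuming a row-major rows x cols matrix and one thread per row; `softmax_rows` and `softmax_host` are hypothetical names that are not part of our code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per row: each thread normalizes `cols` contiguous values in place.
__global__ void softmax_rows(float* d_x, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float* row = d_x + r * cols;
    // Subtract the row maximum before exponentiating to avoid overflow.
    float m = row[0];
    for (int c = 1; c < cols; ++c) m = fmaxf(m, row[c]);
    float sum = 0.0f;
    for (int c = 0; c < cols; ++c) sum += expf(row[c] - m);
    for (int c = 0; c < cols; ++c) row[c] = expf(row[c] - m) / sum;
}

// Host wrapper: copies the matrix over, normalizes each row, copies it back.
bool softmax_host(float* h_x, int rows, int cols) {
    const size_t bytes = sizeof(float) * rows * cols;
    float* d_x = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 128;
        const int blocks = (rows + threads - 1) / threads;
        softmax_rows<<<blocks, threads>>>(d_x, rows, cols);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // cudaFree(nullptr) is a no-op
    return ok;
}
```

With 256 rows of 10 values this uses only 256 threads, so it is not fast, but it works in place and needs no device-side allocation.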
Final Profile
This final profile covers only 20 iterations, because errors occurred beyond that point, likely due to our naive code and poor coding practices.
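One possible cause, not confirmed here, is exhaustion of the device heap: helpers such as k_difference, k_MFV and k_transpose allocate with new inside the kernel, and most of those buffers are never freed. The device heap defaults to 8 MB and leaked allocations persist across kernel launches. The sketch below illustrates the effect with in-kernel malloc(), which returns NULL once the heap is exhausted; `alloc_step` and `count_successful_steps` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Each launch allocates from the device heap; `leak` controls whether the
// buffer is freed. Sets *d_failed when the allocation fails.
__global__ void alloc_step(size_t bytes, bool leak, int* d_failed) {
    float* scratch = static_cast<float*>(malloc(bytes));
    if (scratch == nullptr) {  // in-kernel malloc returns NULL on failure
        *d_failed = 1;
        return;
    }
    scratch[0] = 1.0f;
    if (!leak) free(scratch);  // leaked buffers stay allocated across launches
}

// Returns how many launches completed before an allocation failed
// (`iterations` if none failed), or -1 on a CUDA error.
int count_successful_steps(int iterations, size_t bytes, bool leak) {
    int* d_failed = nullptr;
    if (cudaMalloc(&d_failed, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_failed, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(d_failed);
        return -1;
    }
    int done = 0;
    for (; done < iterations; ++done) {
        alloc_step<<<1, 1>>>(bytes, leak, d_failed);
        int h_failed = 0;
        if (cudaMemcpy(&h_failed, d_failed, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            cudaFree(d_failed);
            return -1;
        }
        if (h_failed) break;  // the heap ran out during this launch
    }
    cudaFree(d_failed);
    return done;
}
```

With 1 MB per launch, the leaking variant should stop well before 32 launches on a default heap, while the variant that frees its buffer keeps going. If device-side allocation is really needed, the heap can be enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes) before the first kernel launch, but preallocating buffers with cudaMalloc on the host avoids the problem altogether.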
Compiling
Follow the articles below to set up Visual Studio for dynamic parallelism; they are also recommended reading:
http://developer.download.nvidia.com/assets/cuda/files/CUDADownloads/TechBrief_Dynamic_Parallelism_in_CUDA.pdf
http://ramblingsofagamedevstudent.blogspot.com/2014/03/set-up-visual-studio-2012-for-cuda.html
Assignment 3
What we would do differently:
There are many things; one of the major ones is to take on a more manageable task, one with proper documentation and clear reasoning behind the chosen values.