Difference between revisions of "Algo holics"
(→Assignment 1) |
Ssdhillon20 (talk | contribs) (→Kernel Version 2) |
||
(81 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | {{GPU610/DPS915 Index | | + | {{GPU610/DPS915 Index | 20191}} |
− | = | + | = Algo Holics = |
== Team Members == | == Team Members == | ||
− | # [mailto:ssdhillon20@myseneca.ca?subject=GPU610 Sukhbeer Dhillon], | + | # [mailto:ssdhillon20@myseneca.ca?subject=GPU610 Sukhbeer Dhillon], Simple Neural Network |
− | # [mailto:gsingh520@myseneca.ca?subject=gpu610 Gurpreet Singh], | + | # [mailto:gsingh520@myseneca.ca?subject=gpu610 Gurpreet Singh], Sudoku Puzzle Solver |
− | # [mailto:egiang1@myseneca.ca?subject=gpu610 Edgar Giang], | + | # [mailto:egiang1@myseneca.ca?subject=gpu610 Edgar Giang], Merge sort |
#[mailto:ssdhillon20@myseneca.ca;gsingh520@myseneca.ca;egiang1@myseneca.ca?subject=GPU610 Email All] | #[mailto:ssdhillon20@myseneca.ca;gsingh520@myseneca.ca;egiang1@myseneca.ca?subject=GPU610 Email All] | ||
Line 11: | Line 11: | ||
− | + | ====Sudoku Puzzle Solver - Gurpreet Singh==== | |
− | Is it a program that solves Sudoku puzzles(9X9) using Bruteforce algorithm. | + | Is it a program that solves Sudoku puzzles(9X9) using Bruteforce algorithm. The user can either pass a Sudoku files as an input or enter the values manually. Moreover, the file or the manual entry must strictly have 9 rows and 9 columns in them. Last but not the least, all the cells must be separated by a space and the cells that needs to be solved must have 0 in them as their value. |
The original source code can be found at [https://github.com/shafeeq/Sudoku Link] | The original source code can be found at [https://github.com/shafeeq/Sudoku Link] | ||
− | + | =====Logic===== | |
− | In this program the Bruteforce algorithm first put 1 in the first cell and | + | In this program the Bruteforce algorithm first put 1 in the first cell. Then it moves to the second cell and put 1 in there and check if it satisfies all the rules and conditions. If it don't, then the algorithm will increment it's value to 2 and then check again. The value can change from 0-9 to find the correct value for a cell. If none of the value from the range of 0-9 satisfies the cell, then the program will iterate back and change the value of the first cell to 2 and then try the whole process again. In this way it will solve the puzzle. |
+ | |||
+ | =====Compiling the program===== | ||
+ | Enter the following commands: | ||
+ | |||
+ | g++ -std=c++0x -pg solver.cpp checks.cpp checksolution.cpp -o a | ||
+ | a fileName | ||
+ | |||
+ | -pg directs the compiler to include the executable code required for profiling. | ||
+ | |||
+ | -o directs the compiler to name the executable a. | ||
+ | |||
+ | |||
+ | If we run the sample-puzzle-1 (level- easy) file, which has the following text inside it: | ||
+ | 0 6 0 0 0 0 9 7 2 | ||
+ | 0 5 0 0 0 2 0 0 3 | ||
+ | 0 7 0 3 9 0 5 0 0 | ||
+ | 2 0 0 0 0 5 4 0 8 | ||
+ | 0 0 0 0 0 0 0 0 0 | ||
+ | 3 0 1 8 0 0 0 0 6 | ||
+ | 0 0 4 0 2 3 0 8 0 | ||
+ | 7 0 0 9 0 0 0 2 0 | ||
+ | 9 2 5 0 0 0 0 4 0 | ||
+ | |||
+ | The output will be: | ||
+ | |||
+ | 1 6 3 4 5 8 9 7 2 | ||
+ | 4 5 9 7 1 2 8 6 3 | ||
+ | 8 7 2 3 9 6 5 1 4 | ||
+ | 2 9 7 1 6 5 4 3 8 | ||
+ | 5 8 6 2 3 4 1 9 7 | ||
+ | 3 4 1 8 7 9 2 5 6 | ||
+ | 6 1 4 5 2 3 7 8 9 | ||
+ | 7 3 8 9 4 1 6 2 5 | ||
+ | 9 2 5 6 8 7 3 4 1 | ||
+ | |||
+ | |||
+ | =====Analysis===== | ||
+ | To analyze the call graph, enter the following command: | ||
+ | gprof -q -b a> a.clg | ||
+ | -q directs the profiler (gprof) to output a call graph. | ||
+ | |||
+ | -b directs the profiler to omit detailed explanations of the column headings from the output. | ||
+ | |||
+ | |||
+ | '''The call graph for the above execution looks like:''' | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Call Graph | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Call graph | ||
+ | granularity: each sample hit covers 2 byte(s) no time propagated | ||
+ | index % time self children called name | ||
+ | 0.00 0.00 4539/4539 placeNum(int, int) [10] | ||
+ | [8] 0.0 0.00 0.00 4539 checkRow(int, int) [8] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1620/1620 placeNum(int, int) [10] | ||
+ | [9] 0.0 0.00 0.00 1620 checkColumn(int, int) [9] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1120/1120 solveSudoku() [16] | ||
+ | [10] 0.0 0.00 0.00 1120 placeNum(int, int) [10] | ||
+ | 0.00 0.00 4539/4539 checkRow(int, int) [8] | ||
+ | 0.00 0.00 1620/1620 checkColumn(int, int) [9] | ||
+ | 0.00 0.00 698/698 checkSquare(int, int, int) [11] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 698/698 placeNum(int, int) [10] | ||
+ | [11] 0.0 0.00 0.00 698 checkSquare(int, int, int) [11] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 476/476 solveSudoku() [16] | ||
+ | [12] 0.0 0.00 0.00 476 goBack(int&, int&) [12] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 2/2 main [6] | ||
+ | [13] 0.0 0.00 0.00 2 print(int (*) [9]) [13] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 __libc_csu_init [30] | ||
+ | [14] 0.0 0.00 0.00 1 _GLOBAL__sub_I_sudoku [14] | ||
+ | 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [18] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 __libc_csu_init [30] | ||
+ | [15] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [15] | ||
+ | 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [19] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 main [6] | ||
+ | [16] 0.0 0.00 0.00 1 solveSudoku() [16] | ||
+ | 0.00 0.00 1120/1120 placeNum(int, int) [10] | ||
+ | 0.00 0.00 476/476 goBack(int&, int&) [12] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 main [6] | ||
+ | [17] 0.0 0.00 0.00 1 storePositions() [17] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [14] | ||
+ | [18] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [18] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_temp [15] | ||
+ | [19] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [19] | ||
+ | ----------------------------------------------- | ||
+ | Index by function name | ||
+ | [14] _GLOBAL__sub_I_sudoku [16] solveSudoku() [13] print(int (*) [9]) | ||
+ | [15] _GLOBAL__sub_I_temp [17] storePositions() [12] goBack(int&, int&) | ||
+ | [9] checkColumn(int, int) [18] __static_initialization_and_destruction_0(int, int) [8] checkRow(int, int) | ||
+ | [11] checkSquare(int, int, int) [19] __static_initialization_and_destruction_0(int, int) [10] placeNum(int, int) | ||
+ | |||
+ | |} | ||
+ | |||
+ | From the above Call graph we can see that the program took no time in finding the solution and the maximum number of calls were made to the checkRow, checkColumn and checkSquare function. However, to get a better understanding of the program let's try a harder Sudoku puzzle. | ||
+ | |||
+ | |||
+ | If we run the sample-puzzle-2-hard (Level- hard) file, which has the following text inside it: | ||
+ | |||
+ | 0 0 0 0 0 0 0 0 0 | ||
+ | 0 0 0 0 0 3 0 8 5 | ||
+ | 0 0 1 0 2 0 0 0 0 | ||
+ | 0 0 0 5 0 7 0 0 0 | ||
+ | 0 0 4 0 0 0 1 0 0 | ||
+ | 0 9 0 0 0 0 0 0 0 | ||
+ | 5 0 0 0 0 0 0 7 3 | ||
+ | 0 0 2 0 1 0 0 0 0 | ||
+ | 0 0 0 0 4 0 0 0 9 | ||
+ | |||
+ | The output will be: | ||
+ | |||
+ | 9 8 7 6 5 4 3 2 1 | ||
+ | 2 4 6 1 7 3 9 8 5 | ||
+ | 3 5 1 9 2 8 7 4 6 | ||
+ | 1 2 8 5 3 7 6 9 4 | ||
+ | 6 3 4 8 9 2 1 5 7 | ||
+ | 7 9 5 4 6 1 8 3 2 | ||
+ | 5 1 9 2 8 6 4 7 3 | ||
+ | 4 7 2 3 1 9 5 6 8 | ||
+ | 8 6 3 7 4 5 2 1 9 | ||
+ | |||
+ | The Call graph for the following looks like: | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Call Graph | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Call graph | ||
+ | granularity: each sample hit covers 2 byte(s) for 0.04% of 26.79 seconds | ||
+ | index % time self children called name | ||
+ | <spontaneous> | ||
+ | [1] 100.0 0.00 26.78 main [1] | ||
+ | 0.68 26.09 1/1 solveSudoku() [2] | ||
+ | 0.01 0.00 1/1 storePositions() [9] | ||
+ | 0.00 0.00 2/2 print(int (*) [9]) [17] | ||
+ | ----------------------------------------------- | ||
+ | 0.68 26.09 1/1 main [1] | ||
+ | [2] 99.9 0.68 26.09 1 solveSudoku() [2] | ||
+ | 3.64 21.56 157353814/157353814 placeNum(int, int) [3] | ||
+ | 0.89 0.00 69175252/69175252 goBack(int&, int&) [7] | ||
+ | ----------------------------------------------- | ||
+ | 3.64 21.56 157353814/157353814 solveSudoku() [2] | ||
+ | [3] 94.1 3.64 21.56 157353814 placeNum(int, int) [3] | ||
+ | 13.31 0.00 622577597/622577597 checkRow(int, int) [4] | ||
+ | 5.04 0.00 223365661/223365661 checkColumn(int, int) [5] | ||
+ | 3.21 0.00 100608583/100608583 checkSquare(int, int, int) [6] | ||
+ | ----------------------------------------------- | ||
+ | 13.31 0.00 622577597/622577597 placeNum(int, int) [3] | ||
+ | [4] 49.7 13.31 0.00 622577597 checkRow(int, int) [4] | ||
+ | ----------------------------------------------- | ||
+ | 5.04 0.00 223365661/223365661 placeNum(int, int) [3] | ||
+ | [5] 18.8 5.04 0.00 223365661 checkColumn(int, int) [5] | ||
+ | ----------------------------------------------- | ||
+ | 3.21 0.00 100608583/100608583 placeNum(int, int) [3] | ||
+ | [6] 12.0 3.21 0.00 100608583 checkSquare(int, int, int) [6] | ||
+ | ----------------------------------------------- | ||
+ | 0.89 0.00 69175252/69175252 solveSudoku() [2] | ||
+ | [7] 3.3 0.89 0.00 69175252 goBack(int&, int&) [7] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.00 1/1 __libc_csu_init [10] | ||
+ | [8] 0.0 0.01 0.00 1 _GLOBAL__sub_I_sudoku [8] | ||
+ | 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [19] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.00 1/1 main [1] | ||
+ | [9] 0.0 0.01 0.00 1 storePositions() [9] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [10] 0.0 0.00 0.01 __libc_csu_init [10] | ||
+ | 0.01 0.00 1/1 _GLOBAL__sub_I_sudoku [8] | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_temp [18] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 2/2 main [1] | ||
+ | [17] 0.0 0.00 0.00 2 print(int (*) [9]) [17] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 __libc_csu_init [10] | ||
+ | [18] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [18] | ||
+ | 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [20] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [8] | ||
+ | [19] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [19] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_temp [18] | ||
+ | [20] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [20] | ||
+ | ----------------------------------------------- | ||
+ | Index by function name | ||
+ | [8] _GLOBAL__sub_I_sudoku [2] solveSudoku() [17] print(int (*) [9]) | ||
+ | [18] _GLOBAL__sub_I_temp [9] storePositions() [7] goBack(int&, int&) | ||
+ | [5] checkColumn(int, int) [19] __static_initialization_and_destruction_0(int, int) [4] checkRow(int, int) | ||
+ | [6] checkSquare(int, int, int) [20] __static_initialization_and_destruction_0(int, int) [3] placeNum(int, int) | ||
+ | |||
+ | |||
+ | |} | ||
+ | |||
+ | From the above Call graph we can see that for a harder Sudoku puzzle, the time increased significantly. Moreover, it can also be seen that almost 50% of the time is consumed by the checkRow function, 18.8% by checkColumn and finally 12% by the checkSquare function. Thousand of calls were made to these 3 functions, if we parallelizing these functions then the efficiency of the program can be increased significantly. | ||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Simple Artificial Neural Network - Sukhbeer Singh==== | ||
+ | |||
+ | =====Introduction===== | ||
+ | I am very interested in neural networks and I started learning about them recently. I think this is a good opportunity to build on my knowledge of a NN while also parallelising it. For that purpose, I have selected a very basic neural network which feeds forward with ReLu and softmax and back-propagates on a sample batch from MNIST handwritten digits dataset. In each iteration, the weights are adjusted to train the network for better predictions. | ||
+ | The code performs matrix-multiplication(dot-product) each time that the activation vector and delta vector are calculated for the next node and the previous node respectively. | ||
+ | |||
+ | =====Source Code===== | ||
+ | Here is the [https://cognitivedemons.wordpress.com/2018/06/08/neural-network-in-c-part-2-mnist-handwritten-digits-dataset/ source code] used. | ||
+ | |||
+ | I changed the source code to put the network's training in a separate function called train. Also, I used cblas library to do matrix-matrix multiplication inside dot function, instead of using three nested for loops. | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! nn.cpp | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | // To compile: g++ -o nn nn.cpp -std=c++11 -lgslcblas | ||
+ | // To run: ./nn | ||
+ | // Created by Sergei Bugrov on 4/20/18. | ||
+ | // Copyright © 2017 Sergei Bugrov. All rights reserved. | ||
+ | // Download dataset from: https://drive.google.com/file/d/1tVyvg6c1Eo5ojtiz0R17YEzcUe5cN285/view | ||
+ | // Updated By - Sukhbeer Singh Dhillon, March 23rd 2019 | ||
+ | |||
+ | #include <iostream> | ||
+ | #include <iomanip> | ||
+ | #include <cstdlib> | ||
+ | #include <vector> | ||
+ | #include <math.h> | ||
+ | #include <fstream> | ||
+ | #include <sstream> | ||
+ | #include <string> | ||
+ | #include <random> | ||
+ | #include <algorithm> | ||
+ | extern "C"{ | ||
+ | #include<gsl/gsl_cblas.h> | ||
+ | } | ||
+ | |||
+ | using namespace std; | ||
+ | |||
+ | vector<float> print ( const vector <float>& m, int n_rows, int n_columns ) { | ||
+ | /* "Couts" the input vector as n_rows x n_columns matrix. | ||
+ | Inputs: | ||
+ | m: vector, matrix of size n_rows x n_columns | ||
+ | n_rows: int, number of rows in the left matrix m1 | ||
+ | n_columns: int, number of columns in the left matrix m1 | ||
+ | */ | ||
+ | vector<float> outputDigits; | ||
+ | for( int i = 0; i != n_rows; ++i ) { | ||
+ | float digitPredicted = 0.0; | ||
+ | int index = 0; | ||
+ | for( int j = 0; j != n_columns; ++j ) { | ||
+ | float currentValue = m[i * n_columns + j]; | ||
+ | cout << currentValue << " "; | ||
+ | if(currentValue > digitPredicted){ | ||
+ | digitPredicted = currentValue; | ||
+ | index = j; | ||
+ | } | ||
+ | } | ||
+ | outputDigits.push_back(index); | ||
+ | cout << " --> Digit = " << index <<"\n"; | ||
+ | } | ||
+ | cout << endl; | ||
+ | return outputDigits; | ||
+ | } | ||
+ | |||
+ | int argmax ( const vector <float>& m ) { | ||
+ | return distance(m.begin(), max_element(m.begin(), m.end())); | ||
+ | } | ||
+ | |||
+ | vector <float> relu(const vector <float>& z){ | ||
+ | int size = z.size(); | ||
+ | vector <float> output; | ||
+ | for( int i = 0; i < size; ++i ) { | ||
+ | if (z[i] < 0){ | ||
+ | output.push_back(0.0); | ||
+ | } | ||
+ | else output.push_back(z[i]); | ||
+ | } | ||
+ | return output; | ||
+ | } | ||
+ | |||
+ | vector <float> reluPrime (const vector <float>& z) { | ||
+ | int size = z.size(); | ||
+ | vector <float> output; | ||
+ | for( int i = 0; i < size; ++i ) { | ||
+ | if (z[i] <= 0){ | ||
+ | output.push_back(0.0); | ||
+ | } | ||
+ | else output.push_back(1.0); | ||
+ | } | ||
+ | return output; | ||
+ | } | ||
+ | |||
+ | static vector<float> random_vector(const int size){ | ||
+ | random_device rd; | ||
+ | mt19937 gen(rd()); | ||
+ | uniform_real_distribution<> distribution(0.0, 0.05); | ||
+ | static default_random_engine generator; | ||
+ | vector<float> data(size); | ||
+ | generate(data.begin(), data.end(), [&]() { return distribution(generator); }); | ||
+ | return data; | ||
+ | } | ||
+ | |||
+ | vector <float> softmax (const vector <float>& z, const int dim) { | ||
+ | const int zsize = static_cast<int>(z.size()); | ||
+ | vector <float> out; | ||
+ | for (unsigned i = 0; i != zsize; i += dim) { | ||
+ | vector <float> foo; | ||
+ | for (unsigned j = 0; j != dim; ++j) { | ||
+ | foo.push_back(z[i + j]); | ||
+ | } | ||
+ | float max_foo = *max_element(foo.begin(), foo.end()); | ||
+ | for (unsigned j = 0; j != dim; ++j) { | ||
+ | foo[j] = exp(foo[j] - max_foo); | ||
+ | } | ||
+ | float sum_of_elems = 0.0; | ||
+ | for (unsigned j = 0; j != dim; ++j) { | ||
+ | sum_of_elems = sum_of_elems + foo[j]; | ||
+ | } | ||
+ | |||
+ | for (unsigned j = 0; j != dim; ++j) { | ||
+ | out.push_back(foo[j]/sum_of_elems); | ||
+ | } | ||
+ | } | ||
+ | return out; | ||
+ | } | ||
+ | |||
+ | vector <float> sigmoid_d (const vector <float>& m1) { | ||
+ | |||
+ | /* Returns the value of the sigmoid function derivative f'(x) = f(x)(1 - f(x)), | ||
+ | where f(x) is sigmoid function. | ||
+ | Input: m1, a vector. | ||
+ | Output: x(1 - x) for every element of the input matrix m1. | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m1.size(); | ||
+ | vector <float> output (VECTOR_SIZE); | ||
+ | |||
+ | |||
+ | for( unsigned i = 0; i != VECTOR_SIZE; ++i ) { | ||
+ | output[ i ] = m1[ i ] * (1 - m1[ i ]); | ||
+ | } | ||
+ | |||
+ | return output; | ||
+ | } | ||
+ | |||
+ | vector <float> sigmoid (const vector <float>& m1) { | ||
+ | |||
+ | /* Returns the value of the sigmoid function f(x) = 1/(1 + e^-x). | ||
+ | Input: m1, a vector. | ||
+ | Output: 1/(1 + e^-x) for every element of the input matrix m1. | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m1.size(); | ||
+ | vector <float> output (VECTOR_SIZE); | ||
+ | |||
+ | |||
+ | for( unsigned i = 0; i != VECTOR_SIZE; ++i ) { | ||
+ | output[ i ] = 1 / (1 + exp(-m1[ i ])); | ||
+ | } | ||
+ | |||
+ | return output; | ||
+ | } | ||
+ | |||
+ | vector <float> operator+(const vector <float>& m1, const vector <float>& m2){ | ||
+ | |||
+ | /* Returns the elementwise sum of two vectors. | ||
+ | Inputs: | ||
+ | m1: a vector | ||
+ | m2: a vector | ||
+ | Output: a vector, sum of the vectors m1 and m2. | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m1.size(); | ||
+ | vector <float> sum (VECTOR_SIZE); | ||
+ | |||
+ | for (unsigned i = 0; i != VECTOR_SIZE; ++i){ | ||
+ | sum[i] = m1[i] + m2[i]; | ||
+ | }; | ||
+ | |||
+ | return sum; | ||
+ | } | ||
+ | |||
+ | vector <float> operator-(const vector <float>& m1, const vector <float>& m2){ | ||
+ | |||
+ | /* Returns the difference between two vectors. | ||
+ | Inputs: | ||
+ | m1: vector | ||
+ | m2: vector | ||
+ | Output: vector, m1 - m2, difference between two vectors m1 and m2. | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m1.size(); | ||
+ | vector <float> difference (VECTOR_SIZE); | ||
+ | |||
+ | for (unsigned i = 0; i != VECTOR_SIZE; ++i){ | ||
+ | difference[i] = m1[i] - m2[i]; | ||
+ | }; | ||
+ | |||
+ | return difference; | ||
+ | } | ||
+ | |||
+ | vector <float> operator*(const vector <float>& m1, const vector <float>& m2){ | ||
+ | |||
+ | /* Returns the product of two vectors (elementwise multiplication). | ||
+ | Inputs: | ||
+ | m1: vector | ||
+ | m2: vector | ||
+ | Output: vector, m1 * m2, product of two vectors m1 and m2 | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m1.size(); | ||
+ | vector <float> product (VECTOR_SIZE); | ||
+ | |||
+ | for (unsigned i = 0; i != VECTOR_SIZE; ++i){ | ||
+ | product[i] = m1[i] * m2[i]; | ||
+ | }; | ||
+ | |||
+ | return product; | ||
+ | } | ||
+ | |||
+ | vector <float> operator*(const float m1, const vector <float>& m2){ | ||
+ | |||
+ | /* Returns the product of a float and a vectors (elementwise multiplication). | ||
+ | Inputs: | ||
+ | m1: float | ||
+ | m2: vector | ||
+ | Output: vector, m1 * m2, product of two vectors m1 and m2 | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m2.size(); | ||
+ | vector <float> product (VECTOR_SIZE); | ||
+ | |||
+ | for (unsigned i = 0; i != VECTOR_SIZE; ++i){ | ||
+ | product[i] = m1 * m2[i]; | ||
+ | }; | ||
+ | |||
+ | return product; | ||
+ | } | ||
+ | |||
+ | vector <float> operator/(const vector <float>& m2, const float m1){ | ||
+ | |||
+ | /* Returns the product of a float and a vectors (elementwise multiplication). | ||
+ | Inputs: | ||
+ | m1: float | ||
+ | m2: vector | ||
+ | Output: vector, m1 * m2, product of two vectors m1 and m2 | ||
+ | */ | ||
+ | |||
+ | const unsigned long VECTOR_SIZE = m2.size(); | ||
+ | vector <float> product (VECTOR_SIZE); | ||
+ | |||
+ | for (unsigned i = 0; i != VECTOR_SIZE; ++i){ | ||
+ | product[i] = m2[i] / m1; | ||
+ | }; | ||
+ | |||
+ | return product; | ||
+ | } | ||
+ | |||
+ | vector <float> transpose (float *m, const int C, const int R) { | ||
+ | |||
+ | /* Returns a transpose matrix of input matrix. | ||
+ | Inputs: | ||
+ | m: vector, input matrix | ||
+ | C: int, number of columns in the input matrix | ||
+ | R: int, number of rows in the input matrix | ||
+ | Output: vector, transpose matrix mT of input matrix m | ||
+ | */ | ||
+ | |||
+ | vector <float> mT (C*R); | ||
+ | |||
+ | for(unsigned n = 0; n != C*R; n++) { | ||
+ | unsigned i = n/C; | ||
+ | unsigned j = n%C; | ||
+ | mT[n] = m[R*j + i]; | ||
+ | } | ||
+ | |||
+ | return mT; | ||
+ | } | ||
+ | |||
+ | vector <float> dot (const vector <float>& m1, const vector <float>& m2, const int m1_rows, const int m1_columns, const int m2_columns) { | ||
+ | |||
+ | /* Returns the product of two matrices: m1 x m2. | ||
+ | Inputs: | ||
+ | m1: vector, left matrix of size m1_rows x m1_columns | ||
+ | m2: vector, right matrix of size m1_columns x m2_columns (the number of rows in the right matrix | ||
+ | must be equal to the number of the columns in the left one) | ||
+ | m1_rows: int, number of rows in the left matrix m1 | ||
+ | m1_columns: int, number of columns in the left matrix m1 | ||
+ | m2_columns: int, number of columns in the right matrix m2 | ||
+ | Output: vector, m1 * m2, product of two vectors m1 and m2, a matrix of size m1_rows x m2_columns | ||
+ | */ | ||
+ | |||
+ | |||
+ | //use cblas | ||
+ | vector <float> output (m1_rows*m2_columns); | ||
+ | cblas_sgemm(CblasRowMajor,CblasNoTrans, CblasNoTrans, m1_rows, m2_columns, m1_columns, 1.0, m1.data(), m1_columns, m2.data(), m2_columns, 0.0, output.data(), m2_columns); | ||
+ | return output; | ||
+ | } | ||
+ | |||
+ | vector<string> split(const string &s, char delim) { | ||
+ | stringstream ss(s); | ||
+ | string item; | ||
+ | vector<string> tokens; | ||
+ | while (getline(ss, item, delim)) { | ||
+ | tokens.push_back(item); | ||
+ | } | ||
+ | return tokens; | ||
+ | } | ||
+ | |||
+ | void displayPrediction(vector<float> y_prediction, vector<float> y_output){ | ||
+ | cout << "Predictions:" << "\n"; | ||
+ | vector<float> predict = print ( y_prediction, 10, 10 ); | ||
+ | cout << "Ground truth:" << "\n"; | ||
+ | vector<float> output = print ( y_output, 10, 10 ); | ||
+ | int accuracy = 0; | ||
+ | for(int i=0; i< output.size(); i++){ | ||
+ | if(predict[i]==output[i]) | ||
+ | accuracy++; | ||
+ | } | ||
+ | cout<<" Accuracy = " << accuracy << "/" << output.size() << std::endl; | ||
+ | } | ||
+ | |||
+ | //Train the model in this function- | ||
+ | //initialize the training data | ||
+ | //Feed Forward | ||
+ | //Back Propogate | ||
+ | //Adjust Weight | ||
+ | //Display epoch data - change this to something meaningful | ||
+ | //@params -x training data y labels | ||
+ | void train(vector<float> x_input, vector<float> y_output, vector<float>& weight1, vector<float>& weight2, vector<float>& weight3){ | ||
+ | int BATCH_SIZE = 256; | ||
+ | float lr = .01/BATCH_SIZE; | ||
+ | for(unsigned i = 0; i< 10000; i++){ | ||
+ | // Building batches of input variables (X) and labels (y) | ||
+ | int randindx = rand() % (42000-BATCH_SIZE); | ||
+ | vector<float> b_X; | ||
+ | vector<float> b_y; | ||
+ | for (unsigned j = randindx*784; j < (randindx+BATCH_SIZE)*784; ++j){ | ||
+ | b_X.push_back(x_input[j]); | ||
+ | } | ||
+ | for (unsigned k = randindx*10; k < (randindx+BATCH_SIZE)*10; ++k){ | ||
+ | b_y.push_back(y_output[k]); | ||
+ | } | ||
+ | |||
+ | // Feed forward | ||
+ | vector<float> a1 = relu(dot( b_X, weight1, BATCH_SIZE, 784, 128 )); | ||
+ | vector<float> a2 = relu(dot( a1, weight2, BATCH_SIZE, 128, 64 )); | ||
+ | vector<float> yhat = softmax(dot( a2, weight3, BATCH_SIZE, 64, 10 ), 10); | ||
+ | |||
+ | // Back propagation | ||
+ | vector<float> dyhat = (yhat - b_y); | ||
+ | // dW3 = a2.T * dyhat | ||
+ | vector<float> dW3 = dot(transpose( &a2[0], BATCH_SIZE, 64 ), dyhat, 64, BATCH_SIZE, 10); | ||
+ | // dz2 = dyhat * W3.T * relu'(a2) | ||
+ | vector<float> dz2 = dot(dyhat, transpose( &weight3[0], 64, 10 ), BATCH_SIZE, 10, 64) * reluPrime(a2); | ||
+ | // dW2 = a1.T * dz2 | ||
+ | vector<float> dW2 = dot(transpose( &a1[0], BATCH_SIZE, 128 ), dz2, 128, BATCH_SIZE, 64); | ||
+ | // dz1 = dz2 * W2.T * relu'(a1) | ||
+ | vector<float> dz1 = dot(dz2, transpose( &weight2[0], 128, 64 ), BATCH_SIZE, 64, 128) * reluPrime(a1); | ||
+ | // dW1 = X.T * dz1 | ||
+ | vector<float> dW1 = dot(transpose( &b_X[0], BATCH_SIZE, 784 ), dz1, 784, BATCH_SIZE, 128); | ||
+ | |||
+ | // Updating the parameters | ||
+ | weight3 = weight3 - lr * dW3; | ||
+ | weight2 = weight2 - lr * dW2; | ||
+ | weight1 = weight1 - lr * dW1; | ||
+ | |||
+ | if ((i+1) % 1000 == 0){ | ||
+ | cout << "-----------------------------------------------Epoch " << i+1 << "--------------------------------------------------" <<"\n"; | ||
+ | displayPrediction(yhat, y_output); | ||
+ | /*cout << "Predictions:" << "\n"; | ||
+ | print ( yhat, 10, 10 ); | ||
+ | cout << "Ground truth:" << "\n"; | ||
+ | print ( b_y, 10, 10 );*/ | ||
+ | vector<float> loss_m = yhat - b_y; | ||
+ | float loss = 0.0; | ||
+ | for (unsigned k = 0; k < BATCH_SIZE*10; ++k){ | ||
+ | loss += loss_m[k] * loss_m[k]; | ||
+ | } | ||
+ | cout << " Loss " << loss/BATCH_SIZE <<"\n"; | ||
+ | cout << "--------------------------------------------End of Epoch :(------------------------------------------------" <<"\n"; | ||
+ | }; | ||
+ | } | ||
+ | } | ||
+ | |||
+ | |||
+ | int main(int argc, const char * argv[]) { | ||
+ | |||
+ | string line; | ||
+ | vector<string> line_v; | ||
+ | |||
+ | cout << "Loading data ...\n"; | ||
+ | vector<float> X_train; | ||
+ | vector<float> y_train; | ||
+ | ifstream myfile ("train.txt"); | ||
+ | if (myfile.is_open()) | ||
+ | { | ||
+ | while ( getline (myfile,line) ) | ||
+ | { | ||
+ | line_v = split(line, '\t'); | ||
+ | int digit = strtof((line_v[0]).c_str(),0); | ||
+ | for (unsigned i = 0; i < 10; ++i) { | ||
+ | if (i == digit) | ||
+ | { | ||
+ | y_train.push_back(1.); | ||
+ | } | ||
+ | else y_train.push_back(0.); | ||
+ | } | ||
+ | |||
+ | int size = static_cast<int>(line_v.size()); | ||
+ | for (unsigned i = 1; i < size; ++i) { | ||
+ | X_train.push_back(strtof((line_v[i]).c_str(),0)); | ||
+ | } | ||
+ | } | ||
+ | X_train = X_train/255.0; | ||
+ | myfile.close(); | ||
+ | } | ||
+ | |||
+ | else cout << "Unable to open file" << '\n'; | ||
+ | |||
+ | int xsize = static_cast<int>(X_train.size()); | ||
+ | int ysize = static_cast<int>(y_train.size()); | ||
+ | |||
+ | // Random initialization of the weights | ||
+ | vector <float> W1 = random_vector(784*128); | ||
+ | vector <float> W2 = random_vector(128*64); | ||
+ | vector <float> W3 = random_vector(64*10); | ||
+ | |||
+ | cout << "Training the model ...\n"; | ||
+ | train(X_train, y_train, W1, W2, W3); | ||
+ | return 0; | ||
+ | } | ||
+ | |||
+ | |||
+ | |} | ||
+ | |||
+ | The result given below is the comparison of predictions as made by the trained network with the actual output after 10000 iterations. The ground truth is the actual value of the labels between 0-9 (true for the corresponding digit in the dataset). | ||
+ | As you see, the accuracy of the network is not that great. | ||
+ | |||
+ | -----------------------------------------------Epoch 10000-------------------------------------------------- | ||
+ | Predictions: | ||
+ | 0.000848207 9.07445e-06 0.000145165 0.797735 4.94866e-06 0.19374 1.55013e-06 0.000244941 0.00657041 0.000700498 --> Digit = 3 | ||
+ | 1.36476e-05 1.07548e-07 8.3835e-05 0.000744837 0.299883 9.37717e-05 3.53349e-05 0.00822595 0.00210021 0.688819 --> Digit = 9 | ||
+ | 5.11556e-06 0.000616957 0.000233088 0.87458 2.20579e-05 0.0140489 5.03569e-08 0.000518445 0.0826038 0.0273714 --> Digit = 3 | ||
+ | 0.0178851 3.64621e-08 0.0174107 0.000322792 0.716312 0.00120967 0.189534 0.00303238 0.00613965 0.0481543 --> Digit = 4 | ||
+ | 7.40077e-07 0.96872 0.014224 0.00555447 2.56397e-05 0.000115577 0.000157107 0.00366156 0.00669771 0.000842866 --> Digit = 1 | ||
+ | 7.37584e-05 0.00306397 0.0184482 0.056542 0.000217984 0.0807415 0.000430994 1.09367e-05 0.838792 0.00167921 --> Digit = 8 | ||
+ | 1.23026e-05 1.10682e-09 6.47478e-07 0.000129503 1.28475e-05 1.20242e-05 1.18166e-09 0.953265 2.63176e-05 0.046541 --> Digit = 7 | ||
+ | 0.974183 3.50241e-18 1.99895e-07 3.4534e-07 2.3755e-11 0.0257772 1.96811e-09 6.99407e-09 3.92052e-05 2.28711e-08 --> Digit = 0 | ||
+ | 2.21581e-05 9.26954e-09 0.000182046 0.00336899 3.40876e-05 0.0800376 8.35955e-07 1.2496e-07 0.914781 0.00157335 --> Digit = 8 | ||
+ | 8.59312e-07 4.1739e-05 0.000106891 0.000122639 0.00018295 4.02451e-05 7.21105e-07 0.898311 0.00405182 0.0971408 --> Digit = 7 | ||
+ | |||
+ | Ground truth: | ||
+ | 0 1 0 0 0 0 0 0 0 0 --> Digit = 1 | ||
+ | 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 | ||
+ | 0 1 0 0 0 0 0 0 0 0 --> Digit = 1 | ||
+ | 0 0 0 0 1 0 0 0 0 0 --> Digit = 4 | ||
+ | 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 | ||
+ | 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 | ||
+ | 0 0 0 0 0 0 0 1 0 0 --> Digit = 7 | ||
+ | 0 0 0 1 0 0 0 0 0 0 --> Digit = 3 | ||
+ | 0 0 0 0 0 1 0 0 0 0 --> Digit = 5 | ||
+ | 0 0 0 1 0 0 0 0 0 0 --> Digit = 3 | ||
+ | |||
+ | Accuracy = 2/10 | ||
+ | Loss 0.184251 | ||
+ | --------------------------------------------End of Epoch :(------------------------------------------------ | ||
+ | |||
+ | =====Profiling===== | ||
+ | Here are the results of profiling the program | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Flat profile | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Flat profile: | ||
+ | |||
+ | Each sample counts as 0.01 seconds. | ||
+ | % cumulative self self total | ||
+ | time seconds seconds calls s/call s/call name | ||
+ | 99.29 1075.84 1075.84 displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) | ||
+ | 0.24 1078.44 2.60 3 0.87 1.16 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) | ||
+ | 0.22 1080.88 2.43 split(std::string const&, char) | ||
+ | 0.15 1082.50 1.62 print(std::vector<float, std::allocator<float> > const&, int, int) | ||
+ | 0.10 1083.61 1.11 reluPrime(std::vector<float, std::allocator<float> > const&) | ||
+ | 0.08 1084.48 0.87 519195026 0.00 0.00 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() | ||
+ | 0.02 1084.68 0.20 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() | ||
+ | 0.00 1084.68 0.00 1 0.00 0.00 transpose(float*, int, int) | ||
+ | |||
+ | |||
+ | |} | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Call graph | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Call graph | ||
+ | |||
+ | granularity: each sample hit covers 2 byte(s) for 0.00% of 1084.68 seconds | ||
+ | |||
+ | index % time self children called name | ||
+ | <spontaneous> | ||
+ | [1] 99.2 1075.84 0.00 displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) [1] | ||
+ | ----------------------------------------------- | ||
+ | 14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] | ||
+ | 2.60 0.87 3/3 relu(std::vector<float, std::allocator<float> > const&) [3] | ||
+ | [2] 0.3 2.60 0.87 3+14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] | ||
+ | 0.87 0.00 519195026/519195026 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7] | ||
+ | 14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [3] 0.3 0.00 3.47 relu(std::vector<float, std::allocator<float> > const&) [3] | ||
+ | 2.60 0.87 3/3 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [4] 0.2 2.43 0.00 split(std::string const&, char) [4] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [5] 0.1 1.62 0.00 print(std::vector<float, std::allocator<float> > const&, int, int) [5] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [6] 0.1 1.11 0.00 reluPrime(std::vector<float, std::allocator<float> > const&) [6] | ||
+ | ----------------------------------------------- | ||
+ | 0.87 0.00 519195026/519195026 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] | ||
+ | [7] 0.1 0.87 0.00 519195026 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7] | ||
+ | ----------------------------------------------- | ||
+ | <spontaneous> | ||
+ | [8] 0.0 0.20 0.00 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [8] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) [30] | ||
+ | [16] 0.0 0.00 0.00 1 transpose(float*, int, int) [16] | ||
+ | ----------------------------------------------- | ||
+ | � | ||
+ | Index by function name | ||
+ | |||
+ | [1] displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) | ||
+ | [2] train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) | ||
+ | [7] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() | ||
+ | [5] print(std::vector<float, std::allocator<float> > const&, int, int) | ||
+ | [6] reluPrime(std::vector<float, std::allocator<float> > const&) | ||
+ | [8] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() | ||
+ | [4] split(std::string const&, char) [16] transpose(float*, int, int) | ||
+ | |||
+ | |||
+ | |} | ||
+ | |||
+ | =====Analysis===== | ||
+ | The total execution time of the program is around 3 minutes. The profiling results spot displayPrediction as the function with maximum execution time. However, thats because it displays a matrix using the naive O(n2) for-loop. train() is the next function with the maximum time. This is the hotspot for the program. If this function is made to be the kernel and the functions that it calls as device functions, the program would fasten by a good proportion. | ||
+ | ---- | ||
+ | ====Merge Sort - Edgar Giang==== | ||
+ | |||
+ | =====Intro===== | ||
+ | |||
+ | The topic I chose is sorting algorithms. The algorithm that I'll be profiling and analyzing would be a merge sort algorithm written in c++ which can be found [https://codereview.stackexchange.com/questions/128457/recurrent-merge-sort-without-recursion/ Here]. This code would ask the user to input the size of the array and then fill the array with random elements. The size of the array can be extremely large. From the original creator, the size of the array entered was in the tens of thousands. Theoretically, there would be no limit to how much the user can input however the original creator of this code said it would take the user several hours to sort 10 million elements in the entered array. | ||
+ | |||
+ | |||
+ | =====How to run the program===== | ||
+ | |||
+ | The following command was tested in matrix: | ||
+ | |||
+ | g++ fileName.cpp -pg -o test | ||
+ | |||
+ | ./test | ||
+ | |||
+ | |||
+ | =====Running the program===== | ||
+ | |||
+ | This would be the following output: | ||
+ | |||
+ | Please enter the size of your array: | ||
+ | 200000 | ||
+ | unsorted | ||
+ | sorted | ||
+ | time: 166100 | ||
+ | |||
+ | As you can see the input used was 200,000. The time indicated in the output is in milliseconds so it took 166100 milliseconds to complete. A much larger input would result in the program running for a very long time. | ||
+ | |||
+ | =====Profiling===== | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Flat profile | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Flat profile: | ||
+ | |||
+ | Each sample counts as 0.01 seconds. | ||
+ | % cumulative self self total | ||
+ | time seconds seconds calls ms/call ms/call name | ||
+ | 21.90 0.07 0.07 600021 0.00 0.00 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) | ||
+ | 15.64 0.12 0.05 200006 0.00 0.00 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) | ||
+ | 9.39 0.15 0.03 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) | ||
+ | 9.39 0.18 0.03 400015 0.00 0.00 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) | ||
+ | 6.26 0.20 0.02 14466088 0.00 0.00 std::vector<int, std::allocator<int> >::operator[](unsigned long) | ||
+ | 6.26 0.22 0.02 600021 0.00 0.00 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) | ||
+ | 6.26 0.24 0.02 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) | ||
+ | 6.26 0.26 0.02 400015 0.00 0.00 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | 3.13 0.27 0.01 600021 0.00 0.00 std::vector<int, std::allocator<int> >::begin() const | ||
+ | 3.13 0.28 0.01 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) | ||
+ | 3.13 0.29 0.01 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) | ||
+ | 3.13 0.30 0.01 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) | ||
+ | 3.13 0.31 0.01 400015 0.00 0.00 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) | ||
+ | 3.13 0.32 0.01 400015 0.00 0.00 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 1200042 0.00 0.00 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 1200030 0.00 0.00 std::vector<int, std::allocator<int> >::size() const | ||
+ | 0.00 0.32 0.00 1000038 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() | ||
+ | 0.00 0.32 0.00 600021 0.00 0.00 std::vector<int, std::allocator<int> >::end() const | ||
+ | 0.00 0.32 0.00 600018 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) | ||
+ | 0.00 0.32 0.00 400017 0.00 0.00 __gnu_cxx::new_allocator<int>::~new_allocator() | ||
+ | 0.00 0.32 0.00 400017 0.00 0.00 std::allocator<int>::~allocator() | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::max_size() const | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::allocator<int>::allocator(std::allocator<int> const&) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Iter_base<int*, false>::_S_base(int*) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::vector<int, std::allocator<int> >::~vector() | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy<int*>(int*, int*) | ||
+ | 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) | ||
+ | 0.00 0.32 0.00 400015 0.00 0.00 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) | ||
+ | 0.00 0.32 0.00 400015 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const | ||
+ | 0.00 0.32 0.00 400015 0.00 0.00 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | 0.00 0.32 0.00 400015 0.00 0.00 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | 0.00 0.32 0.00 400012 0.00 0.00 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::capacity() const | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::end() | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::begin() | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) | ||
+ | 0.00 0.32 0.00 2 0.00 0.28 check_sort(std::vector<int, std::allocator<int> >) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 _GLOBAL__sub_I_main | ||
+ | 0.00 0.32 0.00 1 0.00 320.11 merge_sort(std::vector<int, std::allocator<int> >) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 __gnu_cxx::new_allocator<int>::new_allocator() | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 std::allocator<int>::allocator() | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) | ||
+ | 0.00 0.32 0.00 1 0.00 0.00 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | |||
+ | |||
+ | |} | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Call graph | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | Call graph | ||
+ | |||
+ | |||
+ | granularity: each sample hit covers 2 byte(s) for 3.12% of 0.32 seconds | ||
+ | |||
+ | index % time self children called name | ||
+ | <spontaneous> | ||
+ | [1] 100.0 0.00 0.32 main [1] | ||
+ | 0.00 0.32 1/1 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | 0.00 0.00 200000/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] | ||
+ | 0.00 0.00 1/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] | ||
+ | 0.00 0.00 2/400016 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | 0.00 0.00 200001/1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | 0.00 0.00 1/1 std::allocator<int>::allocator() [70] | ||
+ | 0.00 0.00 1/400017 std::allocator<int>::~allocator() [44] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.32 1/1 main [1] | ||
+ | [2] 99.9 0.00 0.32 1 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | 0.05 0.12 200006/200006 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] | ||
+ | 0.01 0.09 200008/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | 0.00 0.03 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.00 0.02 400014/400016 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | 0.00 0.00 2/2 check_sort(std::vector<int, std::allocator<int> >) [25] | ||
+ | 0.00 0.00 400012/400012 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57] | ||
+ | 0.00 0.00 1/1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/400015 main [1] | ||
+ | 0.01 0.09 200006/400015 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] | ||
+ | 0.01 0.09 200008/400015 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [3] 61.5 0.01 0.19 400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | 0.03 0.09 400015/400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] | ||
+ | 0.01 0.05 400015/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] | ||
+ | 0.01 0.00 400015/600021 std::vector<int, std::allocator<int> >::begin() const [23] | ||
+ | 0.00 0.00 400015/400015 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56] | ||
+ | 0.00 0.00 400015/400015 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55] | ||
+ | 0.00 0.00 400015/1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | 0.00 0.00 400015/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] | ||
+ | 0.00 0.00 400015/600021 std::vector<int, std::allocator<int> >::end() const [41] | ||
+ | ----------------------------------------------- | ||
+ | 0.05 0.12 200006/200006 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [4] 52.3 0.05 0.12 200006 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] | ||
+ | 0.01 0.09 200006/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | 0.02 0.00 13866088/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] | ||
+ | ----------------------------------------------- | ||
+ | 0.03 0.09 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [5] 37.5 0.03 0.09 400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] | ||
+ | 0.02 0.07 400015/400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] | ||
+ | ----------------------------------------------- | ||
+ | 0.02 0.07 400015/400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] | ||
+ | [6] 28.1 0.02 0.07 400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] | ||
+ | 0.00 0.07 400015/400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.02 200006/600021 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | 0.01 0.05 400015/600021 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] | ||
+ | [7] 28.1 0.02 0.07 600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] | ||
+ | 0.07 0.00 600021/600021 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.07 400015/400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] | ||
+ | [8] 21.9 0.00 0.07 400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] | ||
+ | 0.00 0.07 400015/400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.06 400015/400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] | ||
+ | [9] 21.9 0.01 0.06 400015 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] | ||
+ | 0.01 0.05 400015/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] | ||
+ | 0.00 0.00 800030/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] | ||
+ | 0.00 0.00 400015/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.07 400015/400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] | ||
+ | [10] 21.9 0.00 0.07 400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] | ||
+ | 0.01 0.06 400015/400015 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] | ||
+ | 0.00 0.00 800030/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] | ||
+ | ----------------------------------------------- | ||
+ | 0.07 0.00 600021/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] | ||
+ | [11] 21.9 0.07 0.00 600021 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/400016 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] | ||
+ | 0.01 0.05 400015/400016 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [12] 18.8 0.01 0.05 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] | ||
+ | 0.03 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] | ||
+ | 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] | ||
+ | ----------------------------------------------- | ||
+ | 0.03 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] | ||
+ | [13] 15.6 0.03 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] | ||
+ | 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.03 200006/200006 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [14] 10.4 0.00 0.03 200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] | ||
+ | 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::begin() const [23] | ||
+ | 0.00 0.00 400012/1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::capacity() const [59] | ||
+ | 0.00 0.00 200006/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::end() [62] | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::begin() [63] | ||
+ | 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::end() const [41] | ||
+ | 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] | ||
+ | [15] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | 0.01 0.02 200006/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] | ||
+ | 0.00 0.00 400012/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] | ||
+ | 0.00 0.00 200006/200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] | ||
+ | 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.03 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | [16] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] | ||
+ | 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | 0.00 0.00 400012/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200000/14466088 main [1] | ||
+ | 0.00 0.00 400000/14466088 check_sort(std::vector<int, std::allocator<int> >) [25] | ||
+ | 0.02 0.00 13866088/14466088 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] | ||
+ | [17] 6.2 0.02 0.00 14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] | ||
+ | ----------------------------------------------- | ||
+ | 0.02 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] | ||
+ | [18] 6.2 0.02 0.00 400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] | ||
+ | 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::max_size() const [46] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] | ||
+ | [19] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] | ||
+ | 0.02 0.00 400016/400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.01 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] | ||
+ | [20] 6.2 0.01 0.01 400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] | ||
+ | 0.01 0.00 400016/400016 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.02 400016/400016 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | [21] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] | ||
+ | 0.01 0.01 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] | ||
+ | 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 2/400016 main [1] | ||
+ | 0.00 0.02 400014/400016 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [22] 6.2 0.00 0.02 400016 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] | ||
+ | 0.00 0.00 400016/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] | ||
+ | 0.00 0.00 400016/400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.01 0.00 400015/600021 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [23] 3.1 0.01 0.00 600021 std::vector<int, std::allocator<int> >::begin() const [23] | ||
+ | 0.00 0.00 600021/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] | ||
+ | ----------------------------------------------- | ||
+ | 0.01 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] | ||
+ | [24] 3.1 0.01 0.00 400016 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 2/2 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [25] 0.2 0.00 0.00 2 check_sort(std::vector<int, std::allocator<int> >) [25] | ||
+ | 0.00 0.00 400000/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] | ||
+ | 0.00 0.00 200001/1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 main [1] | ||
+ | [26] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] | ||
+ | 0.00 0.00 1/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] | ||
+ | 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 600021/1200042 std::vector<int, std::allocator<int> >::begin() const [23] | ||
+ | 0.00 0.00 600021/1200042 std::vector<int, std::allocator<int> >::end() const [41] | ||
+ | [33] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] | ||
+ | [34] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1200042/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] | ||
+ | [35] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1200042/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] | ||
+ | [36] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] | ||
+ | 0.00 0.00 1200042/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400012/1200042 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] | ||
+ | 0.00 0.00 800030/1200042 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] | ||
+ | [37] 0.0 0.00 0.00 1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] | ||
+ | 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400012/1200042 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | 0.00 0.00 800030/1200042 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] | ||
+ | [38] 0.0 0.00 0.00 1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] | ||
+ | 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1200030 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | 0.00 0.00 200001/1200030 main [1] | ||
+ | 0.00 0.00 200001/1200030 check_sort(std::vector<int, std::allocator<int> >) [25] | ||
+ | 0.00 0.00 400012/1200030 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.00 0.00 400015/1200030 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [39] 0.0 0.00 0.00 1200030 std::vector<int, std::allocator<int> >::size() const [39] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1000038 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] | ||
+ | 0.00 0.00 200006/1000038 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.00 0.00 400015/1000038 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | 0.00 0.00 400016/1000038 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | [40] 0.0 0.00 0.00 1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | 0.00 0.00 400015/600021 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [41] 0.0 0.00 0.00 600021 std::vector<int, std::allocator<int> >::end() const [41] | ||
+ | 0.00 0.00 600021/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/600018 std::vector<int, std::allocator<int> >::begin() [63] | ||
+ | 0.00 0.00 200006/600018 std::vector<int, std::allocator<int> >::end() [62] | ||
+ | 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | [42] 0.0 0.00 0.00 600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400017/400017 std::allocator<int>::~allocator() [44] | ||
+ | [43] 0.0 0.00 0.00 400017 __gnu_cxx::new_allocator<int>::~new_allocator() [43] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/400017 main [1] | ||
+ | 0.00 0.00 400016/400017 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] | ||
+ | [44] 0.0 0.00 0.00 400017 std::allocator<int>::~allocator() [44] | ||
+ | 0.00 0.00 400017/400017 __gnu_cxx::new_allocator<int>::~new_allocator() [43] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] | ||
+ | [45] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] | ||
+ | [46] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::max_size() const [46] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] | ||
+ | [47] 0.0 0.00 0.00 400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] | ||
+ | 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] | ||
+ | [48] 0.0 0.00 0.00 400016 std::_Iter_base<int*, false>::_S_base(int*) [48] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 void std::_Destroy<int*>(int*, int*) [53] | ||
+ | [49] 0.0 0.00 0.00 400016 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] | ||
+ | [50] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] | ||
+ | 0.00 0.00 400016/400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] | ||
+ | [51] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] | ||
+ | 0.00 0.00 400016/400017 std::allocator<int>::~allocator() [44] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/400016 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] | ||
+ | 0.00 0.00 400015/400016 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] | ||
+ | [52] 0.0 0.00 0.00 400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] | ||
+ | 0.00 0.00 400016/400016 std::_Iter_base<int*, false>::_S_base(int*) [48] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] | ||
+ | [53] 0.0 0.00 0.00 400016 void std::_Destroy<int*>(int*, int*) [53] | ||
+ | 0.00 0.00 400016/400016 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400016/400016 std::vector<int, std::allocator<int> >::~vector() [22] | ||
+ | [54] 0.0 0.00 0.00 400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] | ||
+ | 0.00 0.00 400016/400016 void std::_Destroy<int*>(int*, int*) [53] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [55] 0.0 0.00 0.00 400015 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] | ||
+ | [56] 0.0 0.00 0.00 400015 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 400012/400012 merge_sort(std::vector<int, std::allocator<int> >) [2] | ||
+ | [57] 0.0 0.00 0.00 400012 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] | ||
+ | [58] 0.0 0.00 0.00 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | [59] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::capacity() const [59] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] | ||
+ | [60] 0.0 0.00 0.00 200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] | ||
+ | 0.00 0.00 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] | ||
+ | [61] 0.0 0.00 0.00 200006 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | [62] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::end() [62] | ||
+ | 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | [63] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::begin() [63] | ||
+ | 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] | ||
+ | [64] 0.0 0.00 0.00 200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] | ||
+ | 0.00 0.00 200006/200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] | ||
+ | [65] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] | ||
+ | 0.00 0.00 200006/200006 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] | ||
+ | [66] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] | ||
+ | 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 __libc_csu_init [90] | ||
+ | [67] 0.0 0.00 0.00 1 _GLOBAL__sub_I_main [67] | ||
+ | 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [68] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 _GLOBAL__sub_I_main [67] | ||
+ | [68] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [68] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 std::allocator<int>::allocator() [70] | ||
+ | [69] 0.0 0.00 0.00 1 __gnu_cxx::new_allocator<int>::new_allocator() [69] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 main [1] | ||
+ | [70] 0.0 0.00 0.00 1 std::allocator<int>::allocator() [70] | ||
+ | 0.00 0.00 1/1 __gnu_cxx::new_allocator<int>::new_allocator() [69] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] | ||
+ | [71] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] | ||
+ | 0.00 0.00 1/1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] | ||
+ | [72] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] | ||
+ | 0.00 0.00 1/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] | ||
+ | 0.00 0.00 1/1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] | ||
+ | [73] 0.0 0.00 0.00 1 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] | ||
+ | [74] 0.0 0.00 0.00 1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] | ||
+ | 0.00 0.00 1/1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] | ||
+ | [75] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] | ||
+ | 0.00 0.00 1/1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] | ||
+ | ----------------------------------------------- | ||
+ | 0.00 0.00 1/1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] | ||
+ | [76] 0.0 0.00 0.00 1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] | ||
+ | 0.00 0.00 1/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] | ||
+ | 0.00 0.00 1/1 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73] | ||
+ | ----------------------------------------------- | ||
+ | � | ||
+ | Index by function name | ||
+ | |||
+ | [67] _GLOBAL__sub_I_main [44] std::allocator<int>::~allocator() [22] std::vector<int, std::allocator<int> >::~vector() | ||
+ | [25] check_sort(std::vector<int, std::allocator<int> >) [35] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [14] std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) | ||
+ | [2] merge_sort(std::vector<int, std::allocator<int> >) [36] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [17] std::vector<int, std::allocator<int> >::operator[](unsigned long) | ||
+ | [68] __static_initialization_and_destruction_0(int, int) [60] std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [73] __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | [4] merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [48] std::_Iter_base<int*, false>::_S_base(int*) [37] std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | [24] __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [11] int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [38] std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) | ||
+ | [18] __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [61] void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | [45] __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [49] void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [52] std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) | ||
+ | [69] __gnu_cxx::new_allocator<int>::new_allocator() [19] std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [7] int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) | ||
+ | [43] __gnu_cxx::new_allocator<int>::~new_allocator() [50] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [15] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | [55] __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [51] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [9] int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | [33] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [20] std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [6] int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | [42] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [13] std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [74] void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | [46] __gnu_cxx::new_allocator<int>::max_size() const [40] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [5] int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) | ||
+ | [34] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [12] std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [75] void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) | ||
+ | [58] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [21] std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [57] unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) | ||
+ | [56] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [8] int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [16] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | [41] std::vector<int, std::allocator<int> >::end() const [71] void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [10] int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) | ||
+ | [39] std::vector<int, std::allocator<int> >::size() const [72] std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [76] int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) | ||
+ | [23] std::vector<int, std::allocator<int> >::begin() const [62] std::vector<int, std::allocator<int> >::end() [65] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) | ||
+ | [59] std::vector<int, std::allocator<int> >::capacity() const [63] std::vector<int, std::allocator<int> >::begin() [66] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) | ||
+ | [47] std::allocator<int>::allocator(std::allocator<int> const&) [3] std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [53] void std::_Destroy<int*>(int*, int*) | ||
+ | [70] std::allocator<int>::allocator() [26] std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [54] void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) | ||
+ | |||
+ | |||
+ | |} | ||
+ | |||
+ | |||
+ | =====Analysis===== | ||
+ | |||
+ | 21.90 0.07 0.07 600021 0.00 0.00 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) | ||
+ | 15.64 0.12 0.05 200006 0.00 0.00 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) | ||
+ | |||
+ | From the flat profile,the hotspots for this program would be the function merge function whih was called 200,006 times and thee int* std::__copy__move. The merge function would take up 52.3% of the time | ||
+ | |||
+ | ---- | ||
=== Assignment 2 === | === Assignment 2 === | ||
+ | Our initial idea was to use the neural network code for our assignment 2. But since the algorithm itself was not very accurate (2/10 correct predictions even after 10,000 training iterations), we decided to paralellize merge sort. Soon we realized that since its Big O classification was n log n, offloading computations to GPU would not be that effective. So, we settled with the cosine transform library, as described below. | ||
+ | |||
+ | ====Cosine Tranformation==== | ||
+ | |||
+ | The Cosine_Transform is a simple C++ library which demonstrates properties of the Discrete cosine Transform for real data. The Discrete Cosine Transform or DCT is used to create jpeg (compressed images). | ||
+ | |||
+ | The formula used here is: | ||
+ | |||
+ | | (√1/n) , if u=0; 0≤v≤n-1 | ||
+ | C(u,v) = | ||
+ | | (√2/n) * cos[((2*v+1)π*u)/2n], if 1≤u≤n-1; 0≤v≤n-1 | ||
+ | |||
+ | Where, u is the row index, v is the column index and n is the total number of elements in a row/column in the computational matrix. | ||
+ | |||
+ | This [https://www.youtube.com/watch?v=tW3Hc0Wrgl0 Link] can be used for better understanding of the above formula. Here is the [https://people.sc.fsu.edu/~jburkardt/cpp_src/cosine_transform/cosine_transform.html source code] used. | ||
+ | |||
+ | =====Profiling===== | ||
+ | The flat profile for the above serial code looks like: | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Flat Profile | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | |||
+ | 1 | ||
+ | 2 | ||
+ | 3 | ||
+ | 4 granularity: each sample hit covers 2 byte(s) for 0.68% of 1.47 seconds | ||
+ | 5 | ||
+ | 6 index % time self children called name | ||
+ | 7 <spontaneous> | ||
+ | 8 [1] 100.0 0.00 1.47 main [1] | ||
+ | 9 0.00 1.47 1/1 cosine_transform_test01(int) [3] | ||
+ | 10 ----------------------------------------------- | ||
+ | 11 1.47 0.00 1/1 cosine_transform_test01(int) [3] | ||
+ | 12 [2] 100.0 1.47 0.00 1 cosine_transform_data(int, double*) [2] | ||
+ | 13 ----------------------------------------------- | ||
+ | 14 0.00 1.47 1/1 main [1] | ||
+ | 15 [3] 100.0 0.00 1.47 1 cosine_transform_test01(int) [3] | ||
+ | 16 1.47 0.00 1/1 cosine_transform_data(int, double*) [2] | ||
+ | 17 0.00 0.00 1/1 r8vec_uniform_01_new(int, int&) [14] | ||
+ | 18 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] | ||
+ | 19 0.00 0.00 1/1 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<std::chrono::_V2::s teady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 10 00000000l> > > const&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) [21] | ||
+ | 20 ----------------------------------------------- | ||
+ | 21 0.00 0.00 1/3 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>: :__cast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] | ||
+ | 22 0.00 0.00 2/3 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<long, std::ratio<1l , 1000000000l>, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&, std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [22] | ||
+ | 23 [10] 0.0 0.00 0.00 3 std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const [10] | ||
+ | 24 ----------------------------------------------- | ||
+ | 25 0.00 0.00 2/2 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<std::chrono::_V2::s teady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 10 00000000l> > > const&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) [21] | ||
+ | 26 [11] 0.0 0.00 0.00 2 std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::time_since_epoch() const [11] | ||
+ | 27 ----------------------------------------------- | ||
+ | 28 0.00 0.00 1/1 __libc_csu_init [28] | ||
+ | 29 [12] 0.0 0.00 0.00 1 _GLOBAL__sub_I__Z20r8vec_uniform_01_newiRi [12] | ||
+ | 30 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [15] | ||
+ | 31 ----------------------------------------------- | ||
+ | 32 0.00 0.00 1/1 cosine_transform_test01(int) [3] | ||
+ | 33 [13] 0.0 0.00 0.00 1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] | ||
+ | 34 0.00 0.00 1/1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duratio n_cast<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] | ||
+ | 35 0.00 0.00 1/1 std::chrono::duration<long, std::ratio<1l, 1000l> >::count() const [16] | ||
+ | 36 ----------------------------------------------- | ||
+ | 37 0.00 0.00 1/1 cosine_transform_test01(int) [3] | ||
+ | 38 [14] 0.0 0.00 0.00 1 r8vec_uniform_01_new(int, int&) [14] | ||
+ | 39 ----------------------------------------------- | ||
+ | 40 0.00 0.00 1/1 _GLOBAL__sub_I__Z20r8vec_uniform_01_newiRi [12] | ||
+ | 41 [15] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [15] | ||
+ | 42 ----------------------------------------------- | ||
+ | 43 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] | ||
+ | 44 [16] 0.0 0.00 0.00 1 std::chrono::duration<long, std::ratio<1l, 1000l> >::count() const [16] | ||
+ | 45 ----------------------------------------------- | ||
+ | 46 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] | ||
+ | 47 [17] 0.0 0.00 0.00 1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duration_ca st<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] | ||
+ | 48 0.00 0.00 1/1 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>: :__cast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] | ||
+ | 49 ----------------------------------------------- | ||
+ | 50 0.00 0.00 1/1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duratio n_cast<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] | ||
+ | 51 [18] 0.0 0.00 0.00 1 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>::__c ast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] | ||
+ | 52 0.00 0.00 1/3 std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const [10] | ||
+ | |||
+ | |} | ||
+ | |||
+ | As is evident, the algorithm is O(n2) currently. Using thread indices on the GPU to replace the for loops could potentially improve performance. | ||
+ | To increase the efficiency of the program we transformed the '''cosine_transform_data''' function into a kernel named '''cosTransformKernel''' which offloads the compute intense calculation of the program to the GPU. | ||
+ | |||
+ | |||
+ | =====Kernel Version 1===== | ||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Modified Code | ||
+ | |- | ||
+ | | | ||
+ | |||
+ | # include <iostream> | ||
+ | # include <iomanip> | ||
+ | # include <ctime> | ||
+ | # include <chrono> | ||
+ | # include <cstdlib> | ||
+ | # include <cmath> | ||
+ | #include <cuda_runtime.h> | ||
+ | using namespace std; | ||
+ | using namespace std::chrono; | ||
+ | const double pi = 3.141592653589793; | ||
+ | const int ntpb = 1024; | ||
+ | void cosine_transform_test01 ( int size ); | ||
+ | |||
+ | double *r8vec_uniform_01_new ( int n, int &seed ){ | ||
+ | int i; | ||
+ | const int i4_huge = 2147483647; | ||
+ | int k; | ||
+ | double *r; | ||
+ | if ( seed == 0 ){ | ||
+ | cerr << "\n"; | ||
+ | cerr << "R8VEC_UNIFORM_01_NEW - Fatal error!\n"; | ||
+ | cerr << " Input value of SEED = 0.\n"; | ||
+ | exit ( 1 ); | ||
+ | } | ||
+ | r = new double[n]; | ||
+ | for ( i = 0; i < n; i++ ){ | ||
+ | k = seed / 127773; | ||
+ | seed = 16807 * ( seed - k * 127773 ) - k * 2836; | ||
+ | if ( seed < 0 ){ | ||
+ | seed = seed + i4_huge; | ||
+ | } | ||
+ | r[i] = ( double ) ( seed ) * 4.656612875E-10; | ||
+ | } | ||
+ | return r; | ||
+ | } | ||
+ | |||
+ | double *cosine_transform_data ( int n, double d[] ){ | ||
+ | double angle; | ||
+ | double *c; | ||
+ | int i; | ||
+ | int j; | ||
+ | c = new double[n]; | ||
+ | for ( i = 0; i < n; i++ ){ | ||
+ | c[i] = 0.0; | ||
+ | for ( j = 0; j < n; j++ ){ | ||
+ | angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); | ||
+ | c[i] = c[i] + cos ( angle ) * d[j]; | ||
+ | } | ||
+ | c[i] = c[i] * sqrt ( 2.0 / ( double ) ( n ) ); | ||
+ | } | ||
+ | return c; | ||
+ | } | ||
+ | |||
+ | void reportTime(const char* msg, steady_clock::duration span) { | ||
+ | auto ms = duration_cast<milliseconds>(span); | ||
+ | std::cout << msg << " - took - " << | ||
+ | ms.count() << " millisecs" << std::endl; | ||
+ | } | ||
+ | |||
+ | __global__ void cosTransformKernel(double *a, double *b, int n){ | ||
+ | double angle; | ||
+ | const double pi = 3.141592653589793; | ||
+ | int i = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | for(int j=0; j<n; j++){ | ||
+ | angle = pi * (double) (i*(2*j+1)) / (double)(2*n); | ||
+ | b[i] += cos ( angle ) * a[j]; | ||
+ | } | ||
+ | b[i] *= sqrt( 2.0 / (double) n ); | ||
+ | } | ||
+ | |||
+ | int main (int argc, char* argv[] ){ | ||
+ | if (argc != 2) { | ||
+ | std::cerr << argv[0] << ": invalid number of arguments\n"; | ||
+ | std::cerr << "Usage: " << argv[0] << " size_of_vector\n"; | ||
+ | return 1; | ||
+ | } | ||
+ | int n = std::atoi(argv[1]); | ||
+ | cosine_transform_test01 (n); | ||
+ | return 0; | ||
+ | } | ||
+ | |||
+ | void cosine_transform_test01 ( int size){ | ||
+ | int n = size; | ||
+ | int seed; | ||
+ | double *r; | ||
+ | double *hs; | ||
+ | double *s = new double[n]; | ||
+ | double *d_a; | ||
+ | double *d_b; | ||
+ | //allocate memory on the device for the randomly generated array and for the array in which transform values will be stored | ||
+ | cudaMalloc((void**)&d_a,sizeof(double) * n); | ||
+ | cudaMalloc((void**)&d_b,sizeof(double) * n); | ||
+ | seed = 123456789; | ||
+ | |||
+ | r = r8vec_uniform_01_new ( n, seed ); | ||
+ | |||
+ | //copy randomly generated values from host to device | ||
+ | cudaMemcpy(d_a,r,sizeof(double)*n,cudaMemcpyHostToDevice); | ||
+ | int nblks = (n + ntpb - 1) / ntpb; | ||
+ | steady_clock::time_point ts, te; | ||
+ | ts = steady_clock::now(); | ||
+ | cosTransformKernel<<<nblks,ntpb>>>(d_a,d_b,size); | ||
+ | cudaDeviceSynchronize(); | ||
+ | te = steady_clock::now(); | ||
+ | reportTime("Cosine Transform on device",te-ts); | ||
+ | cudaMemcpy(s,d_b,sizeof(double)*n,cudaMemcpyDeviceToHost); | ||
+ | |||
+ | ts = steady_clock::now(); | ||
+ | hs = cosine_transform_data ( n, r ); | ||
+ | te = steady_clock::now(); | ||
+ | reportTime("Cosine Transform on host",te-ts); | ||
+ | |||
+ | cudaFree(d_a); | ||
+ | cudaFree(d_b); | ||
+ | delete [] r; | ||
+ | delete [] s; | ||
+ | delete [] hs; | ||
+ | } | ||
+ | |||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | The graph for the execution time difference between the device and the host looks like: | ||
+ | |||
+ | [[File:kernel1.png]] | ||
+ | |||
+ | Even though the kernel includes a for-loop the execution time has decreased drastically. Thats because each thread is now responsible for one calculating one element of the final Cos transformed matrix(unit vector). | ||
+ | |||
=== Assignment 3 === | === Assignment 3 === | ||
+ | |||
+ | For optimizing the code better, we thought of removing the iterative loop from the kernel by using threadIdx.y to control calculation of each element's cosine for that position in the supposed matrix. The problem in this was that each thread was in a racing condition to write to the same memory location, to sum up the cosine transformations for all elements of that row. We solved this by using the atomic function. Its prototype is as follows. | ||
+ | double atomicAdd(double* address, double value) | ||
+ | |||
+ | =====Kernel Version 2===== | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | ! Kernel 2 | ||
+ | |- | ||
+ | | | ||
+ | # include <cmath> | ||
+ | # include <cstdlib> | ||
+ | # include <iostream> | ||
+ | # include <iomanip> | ||
+ | # include <ctime> | ||
+ | # include <chrono> | ||
+ | # include <cstdlib> | ||
+ | # include <cmath> | ||
+ | #include <limits> | ||
+ | #include <cuda_runtime.h> | ||
+ | #include <cuda.h> | ||
+ | using namespace std; | ||
+ | using namespace std::chrono; | ||
+ | const double pi = 3.141592653589793; | ||
+ | const unsigned ntpb = 32; | ||
+ | void cosine_transform_test01 ( int size ); | ||
+ | |||
+ | double *r8vec_uniform_01_new ( int n, int &seed ){ | ||
+ | int i; | ||
+ | const int i4_huge = 2147483647; | ||
+ | int k; | ||
+ | double *r; | ||
+ | if ( seed == 0 ){ | ||
+ | cerr << "\n"; | ||
+ | cerr << "R8VEC_UNIFORM_01_NEW - Fatal error!\n"; | ||
+ | cerr << " Input value of SEED = 0.\n"; | ||
+ | exit ( 1 ); | ||
+ | } | ||
+ | r = new double[n]; | ||
+ | for ( i = 0; i < n; i++ ){ | ||
+ | k = seed / 127773; | ||
+ | seed = 16807 * ( seed - k * 127773 ) - k * 2836; | ||
+ | if ( seed < 0 ){ | ||
+ | seed = seed + i4_huge; | ||
+ | } | ||
+ | r[i] = ( double ) ( seed ) * 4.656612875E-10; | ||
+ | } | ||
+ | return r; | ||
+ | } | ||
+ | |||
+ | double *cosine_transform_data ( int n, double d[] ){ | ||
+ | double angle; | ||
+ | double *c; | ||
+ | int i; | ||
+ | int j; | ||
+ | c = new double[n]; | ||
+ | for ( i = 0; i < n; i++ ){ | ||
+ | c[i] = 0.0; | ||
+ | for ( j = 0; j < n; j++ ){ | ||
+ | angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); | ||
+ | c[i] = c[i] + cos ( angle ) * d[j]; | ||
+ | } | ||
+ | c[i] = c[i] * sqrt ( 2.0 / ( double ) ( n ) ); | ||
+ | } | ||
+ | return c; | ||
+ | } | ||
+ | |||
+ | |||
+ | void reportTime(const char* msg, steady_clock::duration span) { | ||
+ | auto ms = duration_cast<milliseconds>(span); | ||
+ | std::cout << msg << " - took - " << | ||
+ | ms.count() << " millisecs" << std::endl; | ||
+ | } | ||
+ | |||
+ | __global__ void cosTransformKernel(double *a, double *b, const int n){ | ||
+ | double angle; | ||
+ | const double pi = 3.141592653589793; | ||
+ | int j = blockIdx.x * blockDim.x + threadIdx.x; | ||
+ | int i = blockIdx.y * blockDim.y + threadIdx.y; | ||
+ | if(i<n && j<n){ | ||
+ | angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); | ||
+ | double value = cos ( angle ) * a[j]; | ||
+ | b[i] = atomicAdd(&b[i], value); | ||
+ | } | ||
+ | //square root of the whole cos transformed row term | ||
+ | if(j==n-1 && i<n){ | ||
+ | b[i] *= sqrt ( 2.0 / ( double ) ( n ) ); | ||
+ | } | ||
+ | } | ||
+ | |||
+ | int main (int argc, char* argv[] ){ | ||
+ | if (argc != 2) { | ||
+ | std::cerr << argv[0] << ": invalid number of arguments\n"; | ||
+ | std::cerr << "Usage: " << argv[0] << " size_of_vector\n"; | ||
+ | return 1; | ||
+ | } | ||
+ | int n = std::atoi(argv[1]); | ||
+ | cosine_transform_test01 (n); | ||
+ | return 0; | ||
+ | } | ||
+ | |||
+ | void cosine_transform_test01 ( int size){ | ||
+ | int n = size; | ||
+ | int seed; | ||
+ | double *r; | ||
+ | double *hs; //host side pointer to store the array returned from host side cosine_transform_data, for comparison purposes | ||
+ | double *s = new double[n]; | ||
+ | //double *t; | ||
+ | double *d_a; | ||
+ | double *d_b; | ||
+ | //allocate memory on the device for the randomly generated array and for the array in which transform values will be stored | ||
+ | cudaMalloc((void**)&d_a,sizeof(double) * n); | ||
+ | cudaMalloc((void**)&d_b,sizeof(double) * n); | ||
+ | seed = 123456789; | ||
+ | |||
+ | r = r8vec_uniform_01_new ( n, seed ); | ||
+ | //copy randomly generated values from host to device | ||
+ | for(int i=0; i<n; i++) | ||
+ | s[i]=0.0; | ||
+ | cudaMemcpy(d_a,r,sizeof(double)*n,cudaMemcpyHostToDevice); | ||
+ | cudaMemcpy(d_b,s,sizeof(double)*n,cudaMemcpyHostToDevice); | ||
+ | int nblks = (n + ntpb - 1) / ntpb; | ||
+ | dim3 grid(nblks,nblks,1); | ||
+ | dim3 block(ntpb,ntpb,1); | ||
+ | steady_clock::time_point ts, te; | ||
+ | ts = steady_clock::now(); | ||
+ | |||
+ | cosTransformKernel<<<grid,block>>>(d_a,d_b,size); | ||
+ | cudaDeviceSynchronize(); | ||
+ | |||
+ | te = steady_clock::now(); | ||
+ | reportTime("Cosine Transform on device",te-ts); | ||
+ | cudaMemcpy(s,d_b,sizeof(double)*n,cudaMemcpyDeviceToHost); | ||
+ | |||
+ | ts = steady_clock::now(); | ||
+ | hs = cosine_transform_data ( n, r ); | ||
+ | te = steady_clock::now(); | ||
+ | reportTime("Cosine Transform on host",te-ts); | ||
+ | |||
+ | cudaFree(d_a); | ||
+ | cudaFree(d_b); | ||
+ | delete [] r; | ||
+ | delete [] s; | ||
+ | delete [] hs; | ||
+ | //delete [] t; | ||
+ | |||
+ | return; | ||
+ | } | ||
+ | |||
+ | |} | ||
+ | |||
+ | Here is a comparison between the naive and optimized kernel | ||
+ | |||
+ | [[File:kernel2.jpg]] | ||
+ | |||
+ | Evidently, there is some performance boost for the new version. However, each call to atomicAdd by a thread locks the global memory until the old value is read and added to the passed value. This deters faster execution as might be expected. |
Latest revision as of 02:00, 8 April 2019
Algo Holics
Team Members
- Sukhbeer Dhillon, Simple Neural Network
- Gurpreet Singh, Sudoku Puzzle Solver
- Edgar Giang, Merge sort
- Email All
Progress
Assignment 1
Sudoku Puzzle Solver - Gurpreet Singh
Is it a program that solves Sudoku puzzles(9X9) using Bruteforce algorithm. The user can either pass a Sudoku files as an input or enter the values manually. Moreover, the file or the manual entry must strictly have 9 rows and 9 columns in them. Last but not the least, all the cells must be separated by a space and the cells that needs to be solved must have 0 in them as their value.
The original source code can be found at Link
Logic
In this program the Bruteforce algorithm first put 1 in the first cell. Then it moves to the second cell and put 1 in there and check if it satisfies all the rules and conditions. If it don't, then the algorithm will increment it's value to 2 and then check again. The value can change from 0-9 to find the correct value for a cell. If none of the value from the range of 0-9 satisfies the cell, then the program will iterate back and change the value of the first cell to 2 and then try the whole process again. In this way it will solve the puzzle.
Compiling the program
Enter the following commands:
g++ -std=c++0x -pg solver.cpp checks.cpp checksolution.cpp -o a a fileName
-pg directs the compiler to include the executable code required for profiling.
-o directs the compiler to name the executable a.
If we run the sample-puzzle-1 (level- easy) file, which has the following text inside it:
0 6 0 0 0 0 9 7 2 0 5 0 0 0 2 0 0 3 0 7 0 3 9 0 5 0 0 2 0 0 0 0 5 4 0 8 0 0 0 0 0 0 0 0 0 3 0 1 8 0 0 0 0 6 0 0 4 0 2 3 0 8 0 7 0 0 9 0 0 0 2 0 9 2 5 0 0 0 0 4 0
The output will be:
1 6 3 4 5 8 9 7 2 4 5 9 7 1 2 8 6 3 8 7 2 3 9 6 5 1 4 2 9 7 1 6 5 4 3 8 5 8 6 2 3 4 1 9 7 3 4 1 8 7 9 2 5 6 6 1 4 5 2 3 7 8 9 7 3 8 9 4 1 6 2 5 9 2 5 6 8 7 3 4 1
Analysis
To analyze the call graph, enter the following command:
gprof -q -b a> a.clg
-q directs the profiler (gprof) to output a call graph.
-b directs the profiler to omit detailed explanations of the column headings from the output.
The call graph for the above execution looks like:
Call Graph |
---|
Call graph granularity: each sample hit covers 2 byte(s) no time propagated index % time self children called name 0.00 0.00 4539/4539 placeNum(int, int) [10] [8] 0.0 0.00 0.00 4539 checkRow(int, int) [8] ----------------------------------------------- 0.00 0.00 1620/1620 placeNum(int, int) [10] [9] 0.0 0.00 0.00 1620 checkColumn(int, int) [9] ----------------------------------------------- 0.00 0.00 1120/1120 solveSudoku() [16] [10] 0.0 0.00 0.00 1120 placeNum(int, int) [10] 0.00 0.00 4539/4539 checkRow(int, int) [8] 0.00 0.00 1620/1620 checkColumn(int, int) [9] 0.00 0.00 698/698 checkSquare(int, int, int) [11] ----------------------------------------------- 0.00 0.00 698/698 placeNum(int, int) [10] [11] 0.0 0.00 0.00 698 checkSquare(int, int, int) [11] ----------------------------------------------- 0.00 0.00 476/476 solveSudoku() [16] [12] 0.0 0.00 0.00 476 goBack(int&, int&) [12] ----------------------------------------------- 0.00 0.00 2/2 main [6] [13] 0.0 0.00 0.00 2 print(int (*) [9]) [13] ----------------------------------------------- 0.00 0.00 1/1 __libc_csu_init [30] [14] 0.0 0.00 0.00 1 _GLOBAL__sub_I_sudoku [14] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [18] ----------------------------------------------- 0.00 0.00 1/1 __libc_csu_init [30] [15] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [15] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [19] ----------------------------------------------- 0.00 0.00 1/1 main [6] [16] 0.0 0.00 0.00 1 solveSudoku() [16] 0.00 0.00 1120/1120 placeNum(int, int) [10] 0.00 0.00 476/476 goBack(int&, int&) [12] ----------------------------------------------- 0.00 0.00 1/1 main [6] [17] 0.0 0.00 0.00 1 storePositions() [17] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [14] [18] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [18] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_temp [15] [19] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [19] ----------------------------------------------- Index by function name [14] _GLOBAL__sub_I_sudoku [16] solveSudoku() [13] print(int (*) [9]) [15] _GLOBAL__sub_I_temp [17] storePositions() [12] goBack(int&, int&) [9] checkColumn(int, int) [18] __static_initialization_and_destruction_0(int, int) [8] checkRow(int, int) [11] checkSquare(int, int, int) [19] __static_initialization_and_destruction_0(int, int) [10] placeNum(int, int) |
From the above Call graph we can see that the program took no time in finding the solution and the maximum number of calls were made to the checkRow, checkColumn and checkSquare function. However, to get a better understanding of the program let's try a harder Sudoku puzzle.
If we run the sample-puzzle-2-hard (Level- hard) file, which has the following text inside it:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 8 5 0 0 1 0 2 0 0 0 0 0 0 0 5 0 7 0 0 0 0 0 4 0 0 0 1 0 0 0 9 0 0 0 0 0 0 0 5 0 0 0 0 0 0 7 3 0 0 2 0 1 0 0 0 0 0 0 0 0 4 0 0 0 9
The output will be:
9 8 7 6 5 4 3 2 1 2 4 6 1 7 3 9 8 5 3 5 1 9 2 8 7 4 6 1 2 8 5 3 7 6 9 4 6 3 4 8 9 2 1 5 7 7 9 5 4 6 1 8 3 2 5 1 9 2 8 6 4 7 3 4 7 2 3 1 9 5 6 8 8 6 3 7 4 5 2 1 9
The Call graph for the following looks like:
Call Graph |
---|
Call graph granularity: each sample hit covers 2 byte(s) for 0.04% of 26.79 seconds index % time self children called name <spontaneous> [1] 100.0 0.00 26.78 main [1] 0.68 26.09 1/1 solveSudoku() [2] 0.01 0.00 1/1 storePositions() [9] 0.00 0.00 2/2 print(int (*) [9]) [17] ----------------------------------------------- 0.68 26.09 1/1 main [1] [2] 99.9 0.68 26.09 1 solveSudoku() [2] 3.64 21.56 157353814/157353814 placeNum(int, int) [3] 0.89 0.00 69175252/69175252 goBack(int&, int&) [7] ----------------------------------------------- 3.64 21.56 157353814/157353814 solveSudoku() [2] [3] 94.1 3.64 21.56 157353814 placeNum(int, int) [3] 13.31 0.00 622577597/622577597 checkRow(int, int) [4] 5.04 0.00 223365661/223365661 checkColumn(int, int) [5] 3.21 0.00 100608583/100608583 checkSquare(int, int, int) [6] ----------------------------------------------- 13.31 0.00 622577597/622577597 placeNum(int, int) [3] [4] 49.7 13.31 0.00 622577597 checkRow(int, int) [4] ----------------------------------------------- 5.04 0.00 223365661/223365661 placeNum(int, int) [3] [5] 18.8 5.04 0.00 223365661 checkColumn(int, int) [5] ----------------------------------------------- 3.21 0.00 100608583/100608583 placeNum(int, int) [3] [6] 12.0 3.21 0.00 100608583 checkSquare(int, int, int) [6] ----------------------------------------------- 0.89 0.00 69175252/69175252 solveSudoku() [2] [7] 3.3 0.89 0.00 69175252 goBack(int&, int&) [7] ----------------------------------------------- 0.01 0.00 1/1 __libc_csu_init [10] [8] 0.0 0.01 0.00 1 _GLOBAL__sub_I_sudoku [8] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [19] ----------------------------------------------- 0.01 0.00 1/1 main [1] [9] 0.0 0.01 0.00 1 storePositions() [9] ----------------------------------------------- <spontaneous> [10] 0.0 0.00 0.01 __libc_csu_init [10] 0.01 0.00 1/1 _GLOBAL__sub_I_sudoku [8] 0.00 0.00 1/1 _GLOBAL__sub_I_temp [18] ----------------------------------------------- 0.00 0.00 2/2 main [1] [17] 0.0 0.00 0.00 2 print(int (*) [9]) [17] ----------------------------------------------- 0.00 0.00 1/1 __libc_csu_init [10] [18] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [18] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [20] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [8] [19] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [19] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_temp [18] [20] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [20] ----------------------------------------------- Index by function name [8] _GLOBAL__sub_I_sudoku [2] solveSudoku() [17] print(int (*) [9]) [18] _GLOBAL__sub_I_temp [9] storePositions() [7] goBack(int&, int&) [5] checkColumn(int, int) [19] __static_initialization_and_destruction_0(int, int) [4] checkRow(int, int) [6] checkSquare(int, int, int) [20] __static_initialization_and_destruction_0(int, int) [3] placeNum(int, int)
|
From the above Call graph we can see that for a harder Sudoku puzzle, the time increased significantly. Moreover, it can also be seen that almost 50% of the time is consumed by the checkRow function, 18.8% by checkColumn and finally 12% by the checkSquare function. Thousand of calls were made to these 3 functions, if we parallelizing these functions then the efficiency of the program can be increased significantly.
Simple Artificial Neural Network - Sukhbeer Singh
Introduction
I am very interested in neural networks and I started learning about them recently. I think this is a good opportunity to build on my knowledge of a NN while also parallelising it. For that purpose, I have selected a very basic neural network which feeds forward with ReLu and softmax and back-propagates on a sample batch from MNIST handwritten digits dataset. In each iteration, the weights are adjusted to train the network for better predictions. The code performs matrix-multiplication(dot-product) each time that the activation vector and delta vector are calculated for the next node and the previous node respectively.
Source Code
Here is the source code used.
I changed the source code to put the network's training in a separate function called train. Also, I used cblas library to do matrix-matrix multiplication inside dot function, instead of using three nested for loops.
nn.cpp |
---|
// To compile: g++ -o nn nn.cpp -std=c++11 -lgslcblas // To run: ./nn // Created by Sergei Bugrov on 4/20/18. // Copyright © 2017 Sergei Bugrov. All rights reserved. // Download dataset from: https://drive.google.com/file/d/1tVyvg6c1Eo5ojtiz0R17YEzcUe5cN285/view // Updated By - Sukhbeer Singh Dhillon, March 23rd 2019 #include <iostream> #include <iomanip> #include <cstdlib> #include <vector> #include <math.h> #include <fstream> #include <sstream> #include <string> #include <random> #include <algorithm> extern "C"{ #include<gsl/gsl_cblas.h> } using namespace std; vector<float> print ( const vector <float>& m, int n_rows, int n_columns ) { /* "Couts" the input vector as n_rows x n_columns matrix. Inputs: m: vector, matrix of size n_rows x n_columns n_rows: int, number of rows in the left matrix m1 n_columns: int, number of columns in the left matrix m1 */ vector<float> outputDigits; for( int i = 0; i != n_rows; ++i ) { float digitPredicted = 0.0; int index = 0; for( int j = 0; j != n_columns; ++j ) { float currentValue = m[i * n_columns + j]; cout << currentValue << " "; if(currentValue > digitPredicted){ digitPredicted = currentValue; index = j; } } outputDigits.push_back(index); cout << " --> Digit = " << index <<"\n"; } cout << endl; return outputDigits; } int argmax ( const vector <float>& m ) { return distance(m.begin(), max_element(m.begin(), m.end())); } vector <float> relu(const vector <float>& z){ int size = z.size(); vector <float> output; for( int i = 0; i < size; ++i ) { if (z[i] < 0){ output.push_back(0.0); } else output.push_back(z[i]); } return output; } vector <float> reluPrime (const vector <float>& z) { int size = z.size(); vector <float> output; for( int i = 0; i < size; ++i ) { if (z[i] <= 0){ output.push_back(0.0); } else output.push_back(1.0); } return output; } static vector<float> random_vector(const int size){ random_device rd; mt19937 gen(rd()); uniform_real_distribution<> distribution(0.0, 0.05); static default_random_engine generator; vector<float> data(size); generate(data.begin(), data.end(), [&]() { return distribution(generator); }); return data; } vector <float> softmax (const vector <float>& z, const int dim) { const int zsize = static_cast<int>(z.size()); vector <float> out; for (unsigned i = 0; i != zsize; i += dim) { vector <float> foo; for (unsigned j = 0; j != dim; ++j) { foo.push_back(z[i + j]); } float max_foo = *max_element(foo.begin(), foo.end()); for (unsigned j = 0; j != dim; ++j) { foo[j] = exp(foo[j] - max_foo); } float sum_of_elems = 0.0; for (unsigned j = 0; j != dim; ++j) { sum_of_elems = sum_of_elems + foo[j]; } for (unsigned j = 0; j != dim; ++j) { out.push_back(foo[j]/sum_of_elems); } } return out; } vector <float> sigmoid_d (const vector <float>& m1) { /* Returns the value of the sigmoid function derivative f'(x) = f(x)(1 - f(x)), where f(x) is sigmoid function. Input: m1, a vector. Output: x(1 - x) for every element of the input matrix m1. */ const unsigned long VECTOR_SIZE = m1.size(); vector <float> output (VECTOR_SIZE); for( unsigned i = 0; i != VECTOR_SIZE; ++i ) { output[ i ] = m1[ i ] * (1 - m1[ i ]); } return output; } vector <float> sigmoid (const vector <float>& m1) { /* Returns the value of the sigmoid function f(x) = 1/(1 + e^-x). Input: m1, a vector. Output: 1/(1 + e^-x) for every element of the input matrix m1. */ const unsigned long VECTOR_SIZE = m1.size(); vector <float> output (VECTOR_SIZE); for( unsigned i = 0; i != VECTOR_SIZE; ++i ) { output[ i ] = 1 / (1 + exp(-m1[ i ])); } return output; } vector <float> operator+(const vector <float>& m1, const vector <float>& m2){ /* Returns the elementwise sum of two vectors. Inputs: m1: a vector m2: a vector Output: a vector, sum of the vectors m1 and m2. */ const unsigned long VECTOR_SIZE = m1.size(); vector <float> sum (VECTOR_SIZE); for (unsigned i = 0; i != VECTOR_SIZE; ++i){ sum[i] = m1[i] + m2[i]; }; return sum; } vector <float> operator-(const vector <float>& m1, const vector <float>& m2){ /* Returns the difference between two vectors. Inputs: m1: vector m2: vector Output: vector, m1 - m2, difference between two vectors m1 and m2. */ const unsigned long VECTOR_SIZE = m1.size(); vector <float> difference (VECTOR_SIZE); for (unsigned i = 0; i != VECTOR_SIZE; ++i){ difference[i] = m1[i] - m2[i]; }; return difference; } vector <float> operator*(const vector <float>& m1, const vector <float>& m2){ /* Returns the product of two vectors (elementwise multiplication). Inputs: m1: vector m2: vector Output: vector, m1 * m2, product of two vectors m1 and m2 */ const unsigned long VECTOR_SIZE = m1.size(); vector <float> product (VECTOR_SIZE); for (unsigned i = 0; i != VECTOR_SIZE; ++i){ product[i] = m1[i] * m2[i]; }; return product; } vector <float> operator*(const float m1, const vector <float>& m2){ /* Returns the product of a float and a vectors (elementwise multiplication). Inputs: m1: float m2: vector Output: vector, m1 * m2, product of two vectors m1 and m2 */ const unsigned long VECTOR_SIZE = m2.size(); vector <float> product (VECTOR_SIZE); for (unsigned i = 0; i != VECTOR_SIZE; ++i){ product[i] = m1 * m2[i]; }; return product; } vector <float> operator/(const vector <float>& m2, const float m1){ /* Returns the product of a float and a vectors (elementwise multiplication). Inputs: m1: float m2: vector Output: vector, m1 * m2, product of two vectors m1 and m2 */ const unsigned long VECTOR_SIZE = m2.size(); vector <float> product (VECTOR_SIZE); for (unsigned i = 0; i != VECTOR_SIZE; ++i){ product[i] = m2[i] / m1; }; return product; } vector <float> transpose (float *m, const int C, const int R) { /* Returns a transpose matrix of input matrix. Inputs: m: vector, input matrix C: int, number of columns in the input matrix R: int, number of rows in the input matrix Output: vector, transpose matrix mT of input matrix m */ vector <float> mT (C*R); for(unsigned n = 0; n != C*R; n++) { unsigned i = n/C; unsigned j = n%C; mT[n] = m[R*j + i]; } return mT; } vector <float> dot (const vector <float>& m1, const vector <float>& m2, const int m1_rows, const int m1_columns, const int m2_columns) { /* Returns the product of two matrices: m1 x m2. Inputs: m1: vector, left matrix of size m1_rows x m1_columns m2: vector, right matrix of size m1_columns x m2_columns (the number of rows in the right matrix must be equal to the number of the columns in the left one) m1_rows: int, number of rows in the left matrix m1 m1_columns: int, number of columns in the left matrix m1 m2_columns: int, number of columns in the right matrix m2 Output: vector, m1 * m2, product of two vectors m1 and m2, a matrix of size m1_rows x m2_columns */
//use cblas vector <float> output (m1_rows*m2_columns); cblas_sgemm(CblasRowMajor,CblasNoTrans, CblasNoTrans, m1_rows, m2_columns, m1_columns, 1.0, m1.data(), m1_columns, m2.data(), m2_columns, 0.0, output.data(), m2_columns); return output; } vector<string> split(const string &s, char delim) { stringstream ss(s); string item; vector<string> tokens; while (getline(ss, item, delim)) { tokens.push_back(item); } return tokens; } void displayPrediction(vector<float> y_prediction, vector<float> y_output){ cout << "Predictions:" << "\n"; vector<float> predict = print ( y_prediction, 10, 10 ); cout << "Ground truth:" << "\n"; vector<float> output = print ( y_output, 10, 10 ); int accuracy = 0; for(int i=0; i< output.size(); i++){ if(predict[i]==output[i]) accuracy++; } cout<<" Accuracy = " << accuracy << "/" << output.size() << std::endl; } //Train the model in this function- //initialize the training data //Feed Forward //Back Propogate //Adjust Weight //Display epoch data - change this to something meaningful //@params -x training data y labels void train(vector<float> x_input, vector<float> y_output, vector<float>& weight1, vector<float>& weight2, vector<float>& weight3){ int BATCH_SIZE = 256; float lr = .01/BATCH_SIZE; for(unsigned i = 0; i< 10000; i++){ // Building batches of input variables (X) and labels (y) int randindx = rand() % (42000-BATCH_SIZE); vector<float> b_X; vector<float> b_y; for (unsigned j = randindx*784; j < (randindx+BATCH_SIZE)*784; ++j){ b_X.push_back(x_input[j]); } for (unsigned k = randindx*10; k < (randindx+BATCH_SIZE)*10; ++k){ b_y.push_back(y_output[k]); } // Feed forward vector<float> a1 = relu(dot( b_X, weight1, BATCH_SIZE, 784, 128 )); vector<float> a2 = relu(dot( a1, weight2, BATCH_SIZE, 128, 64 )); vector<float> yhat = softmax(dot( a2, weight3, BATCH_SIZE, 64, 10 ), 10); // Back propagation vector<float> dyhat = (yhat - b_y); // dW3 = a2.T * dyhat vector<float> dW3 = dot(transpose( &a2[0], BATCH_SIZE, 64 ), dyhat, 64, BATCH_SIZE, 10); // dz2 = dyhat * W3.T * relu'(a2) vector<float> dz2 = dot(dyhat, transpose( &weight3[0], 64, 10 ), BATCH_SIZE, 10, 64) * reluPrime(a2); // dW2 = a1.T * dz2 vector<float> dW2 = dot(transpose( &a1[0], BATCH_SIZE, 128 ), dz2, 128, BATCH_SIZE, 64); // dz1 = dz2 * W2.T * relu'(a1) vector<float> dz1 = dot(dz2, transpose( &weight2[0], 128, 64 ), BATCH_SIZE, 64, 128) * reluPrime(a1); // dW1 = X.T * dz1 vector<float> dW1 = dot(transpose( &b_X[0], BATCH_SIZE, 784 ), dz1, 784, BATCH_SIZE, 128); // Updating the parameters weight3 = weight3 - lr * dW3; weight2 = weight2 - lr * dW2; weight1 = weight1 - lr * dW1; if ((i+1) % 1000 == 0){ cout << "-----------------------------------------------Epoch " << i+1 << "--------------------------------------------------" <<"\n"; displayPrediction(yhat, y_output); /*cout << "Predictions:" << "\n"; print ( yhat, 10, 10 ); cout << "Ground truth:" << "\n"; print ( b_y, 10, 10 );*/ vector<float> loss_m = yhat - b_y; float loss = 0.0; for (unsigned k = 0; k < BATCH_SIZE*10; ++k){ loss += loss_m[k] * loss_m[k]; } cout << " Loss " << loss/BATCH_SIZE <<"\n"; cout << "--------------------------------------------End of Epoch :(------------------------------------------------" <<"\n"; }; } }
string line; vector<string> line_v; cout << "Loading data ...\n"; vector<float> X_train; vector<float> y_train; ifstream myfile ("train.txt"); if (myfile.is_open()) { while ( getline (myfile,line) ) { line_v = split(line, '\t'); int digit = strtof((line_v[0]).c_str(),0); for (unsigned i = 0; i < 10; ++i) { if (i == digit) { y_train.push_back(1.); } else y_train.push_back(0.); } int size = static_cast<int>(line_v.size()); for (unsigned i = 1; i < size; ++i) { X_train.push_back(strtof((line_v[i]).c_str(),0)); } } X_train = X_train/255.0; myfile.close(); } else cout << "Unable to open file" << '\n'; int xsize = static_cast<int>(X_train.size()); int ysize = static_cast<int>(y_train.size()); // Random initialization of the weights vector <float> W1 = random_vector(784*128); vector <float> W2 = random_vector(128*64); vector <float> W3 = random_vector(64*10); cout << "Training the model ...\n"; train(X_train, y_train, W1, W2, W3); return 0; }
|
The result given below is the comparison of predictions as made by the trained network with the actual output after 10000 iterations. The ground truth is the actual value of the labels between 0-9 (true for the corresponding digit in the dataset). As you see, the accuracy of the network is not that great.
-----------------------------------------------Epoch 10000-------------------------------------------------- Predictions: 0.000848207 9.07445e-06 0.000145165 0.797735 4.94866e-06 0.19374 1.55013e-06 0.000244941 0.00657041 0.000700498 --> Digit = 3 1.36476e-05 1.07548e-07 8.3835e-05 0.000744837 0.299883 9.37717e-05 3.53349e-05 0.00822595 0.00210021 0.688819 --> Digit = 9 5.11556e-06 0.000616957 0.000233088 0.87458 2.20579e-05 0.0140489 5.03569e-08 0.000518445 0.0826038 0.0273714 --> Digit = 3 0.0178851 3.64621e-08 0.0174107 0.000322792 0.716312 0.00120967 0.189534 0.00303238 0.00613965 0.0481543 --> Digit = 4 7.40077e-07 0.96872 0.014224 0.00555447 2.56397e-05 0.000115577 0.000157107 0.00366156 0.00669771 0.000842866 --> Digit = 1 7.37584e-05 0.00306397 0.0184482 0.056542 0.000217984 0.0807415 0.000430994 1.09367e-05 0.838792 0.00167921 --> Digit = 8 1.23026e-05 1.10682e-09 6.47478e-07 0.000129503 1.28475e-05 1.20242e-05 1.18166e-09 0.953265 2.63176e-05 0.046541 --> Digit = 7 0.974183 3.50241e-18 1.99895e-07 3.4534e-07 2.3755e-11 0.0257772 1.96811e-09 6.99407e-09 3.92052e-05 2.28711e-08 --> Digit = 0 2.21581e-05 9.26954e-09 0.000182046 0.00336899 3.40876e-05 0.0800376 8.35955e-07 1.2496e-07 0.914781 0.00157335 --> Digit = 8 8.59312e-07 4.1739e-05 0.000106891 0.000122639 0.00018295 4.02451e-05 7.21105e-07 0.898311 0.00405182 0.0971408 --> Digit = 7
Ground truth: 0 1 0 0 0 0 0 0 0 0 --> Digit = 1 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 0 1 0 0 0 0 0 0 0 0 --> Digit = 1 0 0 0 0 1 0 0 0 0 0 --> Digit = 4 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 1 0 0 0 0 0 0 0 0 0 --> Digit = 0 0 0 0 0 0 0 0 1 0 0 --> Digit = 7 0 0 0 1 0 0 0 0 0 0 --> Digit = 3 0 0 0 0 0 1 0 0 0 0 --> Digit = 5 0 0 0 1 0 0 0 0 0 0 --> Digit = 3
Accuracy = 2/10 Loss 0.184251 --------------------------------------------End of Epoch :(------------------------------------------------
Profiling
Here are the results of profiling the program
Flat profile |
---|
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 99.29 1075.84 1075.84 displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) 0.24 1078.44 2.60 3 0.87 1.16 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) 0.22 1080.88 2.43 split(std::string const&, char) 0.15 1082.50 1.62 print(std::vector<float, std::allocator<float> > const&, int, int) 0.10 1083.61 1.11 reluPrime(std::vector<float, std::allocator<float> > const&) 0.08 1084.48 0.87 519195026 0.00 0.00 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() 0.02 1084.68 0.20 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() 0.00 1084.68 0.00 1 0.00 0.00 transpose(float*, int, int)
|
Call graph |
---|
Call graph granularity: each sample hit covers 2 byte(s) for 0.00% of 1084.68 seconds index % time self children called name <spontaneous> [1] 99.2 1075.84 0.00 displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) [1] 14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] 2.60 0.87 3/3 relu(std::vector<float, std::allocator<float> > const&) [3] [2] 0.3 2.60 0.87 3+14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] 0.87 0.00 519195026/519195026 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7] 14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] <spontaneous> [3] 0.3 0.00 3.47 relu(std::vector<float, std::allocator<float> > const&) [3] 2.60 0.87 3/3 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] <spontaneous> [4] 0.2 2.43 0.00 split(std::string const&, char) [4] <spontaneous> [5] 0.1 1.62 0.00 print(std::vector<float, std::allocator<float> > const&, int, int) [5] <spontaneous> [6] 0.1 1.11 0.00 reluPrime(std::vector<float, std::allocator<float> > const&) [6] 0.87 0.00 519195026/519195026 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2] [7] 0.1 0.87 0.00 519195026 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7] <spontaneous> [8] 0.0 0.20 0.00 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [8] 0.00 0.00 1/1 std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) [30] [16] 0.0 0.00 0.00 1 transpose(float*, int, int) [16] � Index by function name [1] displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) [2] train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [7] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [5] print(std::vector<float, std::allocator<float> > const&, int, int) [6] reluPrime(std::vector<float, std::allocator<float> > const&) [8] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [4] split(std::string const&, char) [16] transpose(float*, int, int)
|
Analysis
The total execution time of the program is around 3 minutes. The profiling results spot displayPrediction as the function with maximum execution time. However, thats because it displays a matrix using the naive O(n2) for-loop. train() is the next function with the maximum time. This is the hotspot for the program. If this function is made to be the kernel and the functions that it calls as device functions, the program would fasten by a good proportion.
Merge Sort - Edgar Giang
Intro
The topic I chose is sorting algorithms. The algorithm that I'll be profiling and analyzing would be a merge sort algorithm written in c++ which can be found Here. This code would ask the user to input the size of the array and then fill the array with random elements. The size of the array can be extremely large. From the original creator, the size of the array entered was in the tens of thousands. Theoretically, there would be no limit to how much the user can input however the original creator of this code said it would take the user several hours to sort 10 million elements in the entered array.
How to run the program
The following command was tested in matrix:
g++ fileName.cpp -pg -o test ./test
Running the program
This would be the following output:
Please enter the size of your array: 200000 unsorted sorted time: 166100
As you can see the input used was 200,000. The time indicated in the output is in milliseconds so it took 166100 milliseconds to complete. A much larger input would result in the program running for a very long time.
Profiling
Flat profile |
---|
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 21.90 0.07 0.07 600021 0.00 0.00 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) 15.64 0.12 0.05 200006 0.00 0.00 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) 9.39 0.15 0.03 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) 9.39 0.18 0.03 400015 0.00 0.00 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) 6.26 0.20 0.02 14466088 0.00 0.00 std::vector<int, std::allocator<int> >::operator[](unsigned long) 6.26 0.22 0.02 600021 0.00 0.00 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) 6.26 0.24 0.02 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) 6.26 0.26 0.02 400015 0.00 0.00 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) 3.13 0.27 0.01 600021 0.00 0.00 std::vector<int, std::allocator<int> >::begin() const 3.13 0.28 0.01 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) 3.13 0.29 0.01 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) 3.13 0.30 0.01 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) 3.13 0.31 0.01 400015 0.00 0.00 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) 3.13 0.32 0.01 400015 0.00 0.00 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) 0.00 0.32 0.00 1200042 0.00 0.00 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) 0.00 0.32 0.00 1200042 0.00 0.00 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const 0.00 0.32 0.00 1200042 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 1200042 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 1200042 0.00 0.00 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 1200042 0.00 0.00 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 1200030 0.00 0.00 std::vector<int, std::allocator<int> >::size() const 0.00 0.32 0.00 1000038 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() 0.00 0.32 0.00 600021 0.00 0.00 std::vector<int, std::allocator<int> >::end() const 0.00 0.32 0.00 600018 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) 0.00 0.32 0.00 400017 0.00 0.00 __gnu_cxx::new_allocator<int>::~new_allocator() 0.00 0.32 0.00 400017 0.00 0.00 std::allocator<int>::~allocator() 0.00 0.32 0.00 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) 0.00 0.32 0.00 400016 0.00 0.00 __gnu_cxx::new_allocator<int>::max_size() const 0.00 0.32 0.00 400016 0.00 0.00 std::allocator<int>::allocator(std::allocator<int> const&) 0.00 0.32 0.00 400016 0.00 0.00 std::_Iter_base<int*, false>::_S_base(int*) 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() 0.00 0.32 0.00 400016 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() 0.00 0.32 0.00 400016 0.00 0.00 std::vector<int, std::allocator<int> >::~vector() 0.00 0.32 0.00 400016 0.00 0.00 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy<int*>(int*, int*) 0.00 0.32 0.00 400016 0.00 0.00 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) 0.00 0.32 0.00 400015 0.00 0.00 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) 0.00 0.32 0.00 400015 0.00 0.00 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const 0.00 0.32 0.00 400015 0.00 0.00 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) 0.00 0.32 0.00 400015 0.00 0.00 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) 0.00 0.32 0.00 400012 0.00 0.00 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::capacity() const 0.00 0.32 0.00 200006 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::end() 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::begin() 0.00 0.32 0.00 200006 0.00 0.00 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) 0.00 0.32 0.00 200006 0.00 0.00 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) 0.00 0.32 0.00 200006 0.00 0.00 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) 0.00 0.32 0.00 2 0.00 0.28 check_sort(std::vector<int, std::allocator<int> >) 0.00 0.32 0.00 1 0.00 0.00 _GLOBAL__sub_I_main 0.00 0.32 0.00 1 0.00 320.11 merge_sort(std::vector<int, std::allocator<int> >) 0.00 0.32 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 0.32 0.00 1 0.00 0.00 __gnu_cxx::new_allocator<int>::new_allocator() 0.00 0.32 0.00 1 0.00 0.00 std::allocator<int>::allocator() 0.00 0.32 0.00 1 0.00 0.00 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) 0.00 0.32 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) 0.00 0.32 0.00 1 0.00 0.00 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) 0.00 0.32 0.00 1 0.00 0.00 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) 0.00 0.32 0.00 1 0.00 0.00 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) 0.00 0.32 0.00 1 0.00 0.00 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) 0.00 0.32 0.00 1 0.00 0.00 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)
|
Call graph |
---|
Call graph
index % time self children called name <spontaneous> [1] 100.0 0.00 0.32 main [1] 0.00 0.32 1/1 merge_sort(std::vector<int, std::allocator<int> >) [2] 0.00 0.00 200000/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] 0.00 0.00 1/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] 0.00 0.00 2/400016 std::vector<int, std::allocator<int> >::~vector() [22] 0.00 0.00 200001/1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 1/1 std::allocator<int>::allocator() [70] 0.00 0.00 1/400017 std::allocator<int>::~allocator() [44] 0.00 0.32 1/1 main [1] [2] 99.9 0.00 0.32 1 merge_sort(std::vector<int, std::allocator<int> >) [2] 0.05 0.12 200006/200006 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] 0.01 0.09 200008/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] 0.00 0.03 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.00 0.02 400014/400016 std::vector<int, std::allocator<int> >::~vector() [22] 0.00 0.00 2/2 check_sort(std::vector<int, std::allocator<int> >) [25] 0.00 0.00 400012/400012 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57] 0.00 0.00 1/1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 1/400015 main [1] 0.01 0.09 200006/400015 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] 0.01 0.09 200008/400015 merge_sort(std::vector<int, std::allocator<int> >) [2] [3] 61.5 0.01 0.19 400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] 0.03 0.09 400015/400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] 0.01 0.05 400015/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] 0.01 0.00 400015/600021 std::vector<int, std::allocator<int> >::begin() const [23] 0.00 0.00 400015/400015 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56] 0.00 0.00 400015/400015 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55] 0.00 0.00 400015/1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 400015/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] 0.00 0.00 400015/600021 std::vector<int, std::allocator<int> >::end() const [41] 0.05 0.12 200006/200006 merge_sort(std::vector<int, std::allocator<int> >) [2] [4] 52.3 0.05 0.12 200006 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] 0.01 0.09 200006/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] 0.02 0.00 13866088/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] 0.03 0.09 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [5] 37.5 0.03 0.09 400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] 0.02 0.07 400015/400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] 0.02 0.07 400015/400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5] [6] 28.1 0.02 0.07 400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] 0.00 0.07 400015/400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] 0.01 0.02 200006/600021 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] 0.01 0.05 400015/600021 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] [7] 28.1 0.02 0.07 600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] 0.07 0.00 600021/600021 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11] 0.00 0.07 400015/400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6] [8] 21.9 0.00 0.07 400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] 0.00 0.07 400015/400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] 0.01 0.06 400015/400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] [9] 21.9 0.01 0.06 400015 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] 0.01 0.05 400015/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] 0.00 0.00 800030/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] 0.00 0.00 400015/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] 0.00 0.07 400015/400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8] [10] 21.9 0.00 0.07 400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] 0.01 0.06 400015/400015 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] 0.00 0.00 800030/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] 0.07 0.00 600021/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] [11] 21.9 0.07 0.00 600021 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11] 0.00 0.00 1/400016 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] 0.01 0.05 400015/400016 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [12] 18.8 0.01 0.05 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] 0.03 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] 0.03 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] [13] 15.6 0.03 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] 0.00 0.03 200006/200006 merge_sort(std::vector<int, std::allocator<int> >) [2] [14] 10.4 0.00 0.03 200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::begin() const [23] 0.00 0.00 400012/1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::capacity() const [59] 0.00 0.00 200006/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::end() [62] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::begin() [63] 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::end() const [41] 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] [15] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] 0.01 0.02 200006/600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7] 0.00 0.00 400012/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] 0.00 0.00 200006/200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] 0.00 0.03 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] [16] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] 0.00 0.03 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] 0.00 0.00 400012/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] 0.00 0.00 200000/14466088 main [1] 0.00 0.00 400000/14466088 check_sort(std::vector<int, std::allocator<int> >) [25] 0.02 0.00 13866088/14466088 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4] [17] 6.2 0.02 0.00 14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] 0.02 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] [18] 6.2 0.02 0.00 400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::max_size() const [46] 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13] [19] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19] 0.02 0.00 400016/400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] 0.01 0.01 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] [20] 6.2 0.01 0.01 400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] 0.01 0.00 400016/400016 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24] 0.00 0.02 400016/400016 std::vector<int, std::allocator<int> >::~vector() [22] [21] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] 0.01 0.01 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] 0.00 0.00 2/400016 main [1] 0.00 0.02 400014/400016 merge_sort(std::vector<int, std::allocator<int> >) [2] [22] 6.2 0.00 0.02 400016 std::vector<int, std::allocator<int> >::~vector() [22] 0.00 0.02 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] 0.00 0.00 400016/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] 0.00 0.00 400016/400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.01 0.00 400015/600021 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [23] 3.1 0.01 0.00 600021 std::vector<int, std::allocator<int> >::begin() const [23] 0.00 0.00 600021/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] 0.01 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20] [24] 3.1 0.01 0.00 400016 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24] 0.00 0.00 2/2 merge_sort(std::vector<int, std::allocator<int> >) [2] [25] 0.2 0.00 0.00 2 check_sort(std::vector<int, std::allocator<int> >) [25] 0.00 0.00 400000/14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17] 0.00 0.00 200001/1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 1/1 main [1] [26] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] 0.00 0.00 1/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] 0.00 0.00 600021/1200042 std::vector<int, std::allocator<int> >::begin() const [23] 0.00 0.00 600021/1200042 std::vector<int, std::allocator<int> >::end() const [41] [33] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] [34] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34] 0.00 0.00 1200042/1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] [35] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35] 0.00 0.00 1200042/1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] [36] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] 0.00 0.00 1200042/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34] 0.00 0.00 400012/1200042 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16] 0.00 0.00 800030/1200042 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10] [37] 0.0 0.00 0.00 1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37] 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35] 0.00 0.00 400012/1200042 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] 0.00 0.00 800030/1200042 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] [38] 0.0 0.00 0.00 1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38] 0.00 0.00 1200042/1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36] 0.00 0.00 1/1200030 merge_sort(std::vector<int, std::allocator<int> >) [2] 0.00 0.00 200001/1200030 main [1] 0.00 0.00 200001/1200030 check_sort(std::vector<int, std::allocator<int> >) [25] 0.00 0.00 400012/1200030 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.00 0.00 400015/1200030 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [39] 0.0 0.00 0.00 1200030 std::vector<int, std::allocator<int> >::size() const [39] 0.00 0.00 1/1000038 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] 0.00 0.00 200006/1000038 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.00 0.00 400015/1000038 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] 0.00 0.00 400016/1000038 std::vector<int, std::allocator<int> >::~vector() [22] [40] 0.0 0.00 0.00 1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] 0.00 0.00 200006/600021 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] 0.00 0.00 400015/600021 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [41] 0.0 0.00 0.00 600021 std::vector<int, std::allocator<int> >::end() const [41] 0.00 0.00 600021/1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33] 0.00 0.00 200006/600018 std::vector<int, std::allocator<int> >::begin() [63] 0.00 0.00 200006/600018 std::vector<int, std::allocator<int> >::end() [62] 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] [42] 0.0 0.00 0.00 600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] 0.00 0.00 400017/400017 std::allocator<int>::~allocator() [44] [43] 0.0 0.00 0.00 400017 __gnu_cxx::new_allocator<int>::~new_allocator() [43] 0.00 0.00 1/400017 main [1] 0.00 0.00 400016/400017 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] [44] 0.0 0.00 0.00 400017 std::allocator<int>::~allocator() [44] 0.00 0.00 400017/400017 __gnu_cxx::new_allocator<int>::~new_allocator() [43] 0.00 0.00 400016/400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] [45] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45] 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18] [46] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::max_size() const [46] 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] [47] 0.0 0.00 0.00 400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] 0.00 0.00 400016/400016 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45] 0.00 0.00 400016/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] [48] 0.0 0.00 0.00 400016 std::_Iter_base<int*, false>::_S_base(int*) [48] 0.00 0.00 400016/400016 void std::_Destroy<int*>(int*, int*) [53] [49] 0.0 0.00 0.00 400016 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49] 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12] [50] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50] 0.00 0.00 400016/400016 std::allocator<int>::allocator(std::allocator<int> const&) [47] 0.00 0.00 400016/400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21] [51] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51] 0.00 0.00 400016/400017 std::allocator<int>::~allocator() [44] 0.00 0.00 1/400016 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] 0.00 0.00 400015/400016 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9] [52] 0.0 0.00 0.00 400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] 0.00 0.00 400016/400016 std::_Iter_base<int*, false>::_S_base(int*) [48] 0.00 0.00 400016/400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] [53] 0.0 0.00 0.00 400016 void std::_Destroy<int*>(int*, int*) [53] 0.00 0.00 400016/400016 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49] 0.00 0.00 400016/400016 std::vector<int, std::allocator<int> >::~vector() [22] [54] 0.0 0.00 0.00 400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54] 0.00 0.00 400016/400016 void std::_Destroy<int*>(int*, int*) [53] 0.00 0.00 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [55] 0.0 0.00 0.00 400015 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55] 0.00 0.00 400015/400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3] [56] 0.0 0.00 0.00 400015 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56] 0.00 0.00 400012/400012 merge_sort(std::vector<int, std::allocator<int> >) [2] [57] 0.0 0.00 0.00 400012 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57] 0.00 0.00 200006/200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] [58] 0.0 0.00 0.00 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] [59] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::capacity() const [59] 0.00 0.00 200006/200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] [60] 0.0 0.00 0.00 200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] 0.00 0.00 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58] 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] [61] 0.0 0.00 0.00 200006 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] [62] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::end() [62] 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] [63] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::begin() [63] 0.00 0.00 200006/600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42] 0.00 0.00 200006/200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15] [64] 0.0 0.00 0.00 200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] 0.00 0.00 200006/200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60] 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] [65] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] 0.00 0.00 200006/200006 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61] 0.00 0.00 200006/200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14] [66] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66] 0.00 0.00 200006/200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65] 0.00 0.00 1/1 __libc_csu_init [90] [67] 0.0 0.00 0.00 1 _GLOBAL__sub_I_main [67] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [68] 0.00 0.00 1/1 _GLOBAL__sub_I_main [67] [68] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [68] 0.00 0.00 1/1 std::allocator<int>::allocator() [70] [69] 0.0 0.00 0.00 1 __gnu_cxx::new_allocator<int>::new_allocator() [69] 0.00 0.00 1/1 main [1] [70] 0.0 0.00 0.00 1 std::allocator<int>::allocator() [70] 0.00 0.00 1/1 __gnu_cxx::new_allocator<int>::new_allocator() [69] 0.00 0.00 1/1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] [71] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] 0.00 0.00 1/1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26] [72] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] 0.00 0.00 1/1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40] 0.00 0.00 1/1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] 0.00 0.00 1/1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] [73] 0.0 0.00 0.00 1 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73] 0.00 0.00 1/1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] [74] 0.0 0.00 0.00 1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] 0.00 0.00 1/1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] 0.00 0.00 1/1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72] [75] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75] 0.00 0.00 1/1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74] 0.00 0.00 1/1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71] [76] 0.0 0.00 0.00 1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76] 0.00 0.00 1/400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52] 0.00 0.00 1/1 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73] � Index by function name [67] _GLOBAL__sub_I_main [44] std::allocator<int>::~allocator() [22] std::vector<int, std::allocator<int> >::~vector() [25] check_sort(std::vector<int, std::allocator<int> >) [35] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [14] std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [2] merge_sort(std::vector<int, std::allocator<int> >) [36] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [17] std::vector<int, std::allocator<int> >::operator[](unsigned long) [68] __static_initialization_and_destruction_0(int, int) [60] std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [73] __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [4] merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [48] std::_Iter_base<int*, false>::_S_base(int*) [37] std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [24] __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [11] int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [38] std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [18] __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [61] void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [45] __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [49] void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [52] std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [69] __gnu_cxx::new_allocator<int>::new_allocator() [19] std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [7] int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [43] __gnu_cxx::new_allocator<int>::~new_allocator() [50] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [15] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [55] __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [51] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [9] int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [33] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [20] std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [6] int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [42] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [13] std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [74] void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [46] __gnu_cxx::new_allocator<int>::max_size() const [40] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [5] int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [34] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [12] std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [75] void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [58] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [21] std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [57] unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [56] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [8] int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [16] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [41] std::vector<int, std::allocator<int> >::end() const [71] void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [10] int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [39] std::vector<int, std::allocator<int> >::size() const [72] std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [76] int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [23] std::vector<int, std::allocator<int> >::begin() const [62] std::vector<int, std::allocator<int> >::end() [65] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [59] std::vector<int, std::allocator<int> >::capacity() const [63] std::vector<int, std::allocator<int> >::begin() [66] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [47] std::allocator<int>::allocator(std::allocator<int> const&) [3] std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [53] void std::_Destroy<int*>(int*, int*) [70] std::allocator<int>::allocator() [26] std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [54] void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&)
|
Analysis
21.90 0.07 0.07 600021 0.00 0.00 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) 15.64 0.12 0.05 200006 0.00 0.00 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int)
From the flat profile,the hotspots for this program would be the function merge function whih was called 200,006 times and thee int* std::__copy__move. The merge function would take up 52.3% of the time
Assignment 2
Our initial idea was to use the neural network code for our assignment 2. But since the algorithm itself was not very accurate (2/10 correct predictions even after 10,000 training iterations), we decided to paralellize merge sort. Soon we realized that since its Big O classification was n log n, offloading computations to GPU would not be that effective. So, we settled with the cosine transform library, as described below.
Cosine Tranformation
The Cosine_Transform is a simple C++ library which demonstrates properties of the Discrete cosine Transform for real data. The Discrete Cosine Transform or DCT is used to create jpeg (compressed images).
The formula used here is:
| (√1/n) , if u=0; 0≤v≤n-1 C(u,v) = | (√2/n) * cos[((2*v+1)π*u)/2n], if 1≤u≤n-1; 0≤v≤n-1
Where, u is the row index, v is the column index and n is the total number of elements in a row/column in the computational matrix.
This Link can be used for better understanding of the above formula. Here is the source code used.
Profiling
The flat profile for the above serial code looks like:
Flat Profile |
---|
1 2 3 4 granularity: each sample hit covers 2 byte(s) for 0.68% of 1.47 seconds 5 6 index % time self children called name 7 <spontaneous> 8 [1] 100.0 0.00 1.47 main [1] 9 0.00 1.47 1/1 cosine_transform_test01(int) [3] 10 ----------------------------------------------- 11 1.47 0.00 1/1 cosine_transform_test01(int) [3] 12 [2] 100.0 1.47 0.00 1 cosine_transform_data(int, double*) [2] 13 ----------------------------------------------- 14 0.00 1.47 1/1 main [1] 15 [3] 100.0 0.00 1.47 1 cosine_transform_test01(int) [3] 16 1.47 0.00 1/1 cosine_transform_data(int, double*) [2] 17 0.00 0.00 1/1 r8vec_uniform_01_new(int, int&) [14] 18 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] 19 0.00 0.00 1/1 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<std::chrono::_V2::s teady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 10 00000000l> > > const&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) [21] 20 ----------------------------------------------- 21 0.00 0.00 1/3 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>: :__cast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] 22 0.00 0.00 2/3 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<long, std::ratio<1l , 1000000000l>, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&, std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [22] 23 [10] 0.0 0.00 0.00 3 std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const [10] 24 ----------------------------------------------- 25 0.00 0.00 2/2 std::common_type<std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::type std::chrono::operator-<std::chrono::_V2::s teady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 10 00000000l> > > const&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) [21] 26 [11] 0.0 0.00 0.00 2 std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::time_since_epoch() const [11] 27 ----------------------------------------------- 28 0.00 0.00 1/1 __libc_csu_init [28] 29 [12] 0.0 0.00 0.00 1 _GLOBAL__sub_I__Z20r8vec_uniform_01_newiRi [12] 30 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [15] 31 ----------------------------------------------- 32 0.00 0.00 1/1 cosine_transform_test01(int) [3] 33 [13] 0.0 0.00 0.00 1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] 34 0.00 0.00 1/1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duratio n_cast<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] 35 0.00 0.00 1/1 std::chrono::duration<long, std::ratio<1l, 1000l> >::count() const [16] 36 ----------------------------------------------- 37 0.00 0.00 1/1 cosine_transform_test01(int) [3] 38 [14] 0.0 0.00 0.00 1 r8vec_uniform_01_new(int, int&) [14] 39 ----------------------------------------------- 40 0.00 0.00 1/1 _GLOBAL__sub_I__Z20r8vec_uniform_01_newiRi [12] 41 [15] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [15] 42 ----------------------------------------------- 43 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] 44 [16] 0.0 0.00 0.00 1 std::chrono::duration<long, std::ratio<1l, 1000l> >::count() const [16] 45 ----------------------------------------------- 46 0.00 0.00 1/1 reportTime(char const*, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) [13] 47 [17] 0.0 0.00 0.00 1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duration_ca st<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] 48 0.00 0.00 1/1 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>: :__cast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] 49 ----------------------------------------------- 50 0.00 0.00 1/1 std::enable_if<std::chrono::__is_duration<std::chrono::duration<long, std::ratio<1l, 1000l> > >::value, std::chrono::duration<long, std::ratio<1l, 1000l> > >::type std::chrono::duratio n_cast<std::chrono::duration<long, std::ratio<1l, 1000l> >, long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [17] 51 [18] 0.0 0.00 0.00 1 std::chrono::duration<long, std::ratio<1l, 1000l> > std::chrono::__duration_cast_impl<std::chrono::duration<long, std::ratio<1l, 1000l> >, std::ratio<1l, 1000000l>, long, true, false>::__c ast<long, std::ratio<1l, 1000000000l> >(std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&) [18] 52 0.00 0.00 1/3 std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const [10] |
As is evident, the algorithm is O(n2) currently. Using thread indices on the GPU to replace the for loops could potentially improve performance. To increase the efficiency of the program we transformed the cosine_transform_data function into a kernel named cosTransformKernel which offloads the compute intense calculation of the program to the GPU.
Kernel Version 1
Modified Code |
---|
# include <iostream> # include <iomanip> # include <ctime> # include <chrono> # include <cstdlib> # include <cmath> #include <cuda_runtime.h> using namespace std; using namespace std::chrono; const double pi = 3.141592653589793; const int ntpb = 1024; void cosine_transform_test01 ( int size ); double *r8vec_uniform_01_new ( int n, int &seed ){ int i; const int i4_huge = 2147483647; int k; double *r; if ( seed == 0 ){ cerr << "\n"; cerr << "R8VEC_UNIFORM_01_NEW - Fatal error!\n"; cerr << " Input value of SEED = 0.\n"; exit ( 1 ); } r = new double[n]; for ( i = 0; i < n; i++ ){ k = seed / 127773; seed = 16807 * ( seed - k * 127773 ) - k * 2836; if ( seed < 0 ){ seed = seed + i4_huge; } r[i] = ( double ) ( seed ) * 4.656612875E-10; } return r; } double *cosine_transform_data ( int n, double d[] ){ double angle; double *c; int i; int j; c = new double[n]; for ( i = 0; i < n; i++ ){ c[i] = 0.0; for ( j = 0; j < n; j++ ){ angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); c[i] = c[i] + cos ( angle ) * d[j]; } c[i] = c[i] * sqrt ( 2.0 / ( double ) ( n ) ); } return c; } void reportTime(const char* msg, steady_clock::duration span) { auto ms = duration_cast<milliseconds>(span); std::cout << msg << " - took - " << ms.count() << " millisecs" << std::endl; } __global__ void cosTransformKernel(double *a, double *b, int n){ double angle; const double pi = 3.141592653589793; int i = blockIdx.x * blockDim.x + threadIdx.x; for(int j=0; j<n; j++){ angle = pi * (double) (i*(2*j+1)) / (double)(2*n); b[i] += cos ( angle ) * a[j]; } b[i] *= sqrt( 2.0 / (double) n ); } int main (int argc, char* argv[] ){ if (argc != 2) { std::cerr << argv[0] << ": invalid number of arguments\n"; std::cerr << "Usage: " << argv[0] << " size_of_vector\n"; return 1; } int n = std::atoi(argv[1]); cosine_transform_test01 (n); return 0; } void cosine_transform_test01 ( int size){ int n = size; int seed; double *r; double *hs; double *s = new double[n]; double *d_a; double *d_b; //allocate memory on the device for the randomly generated array and for the array in which transform values will be stored cudaMalloc((void**)&d_a,sizeof(double) * n); cudaMalloc((void**)&d_b,sizeof(double) * n); seed = 123456789; r = r8vec_uniform_01_new ( n, seed ); //copy randomly generated values from host to device cudaMemcpy(d_a,r,sizeof(double)*n,cudaMemcpyHostToDevice); int nblks = (n + ntpb - 1) / ntpb; steady_clock::time_point ts, te; ts = steady_clock::now(); cosTransformKernel<<<nblks,ntpb>>>(d_a,d_b,size); cudaDeviceSynchronize(); te = steady_clock::now(); reportTime("Cosine Transform on device",te-ts); cudaMemcpy(s,d_b,sizeof(double)*n,cudaMemcpyDeviceToHost); ts = steady_clock::now(); hs = cosine_transform_data ( n, r ); te = steady_clock::now(); reportTime("Cosine Transform on host",te-ts); cudaFree(d_a); cudaFree(d_b); delete [] r; delete [] s; delete [] hs; } |
The graph for the execution time difference between the device and the host looks like:
Even though the kernel includes a for-loop the execution time has decreased drastically. Thats because each thread is now responsible for one calculating one element of the final Cos transformed matrix(unit vector).
Assignment 3
For optimizing the code better, we thought of removing the iterative loop from the kernel by using threadIdx.y to control calculation of each element's cosine for that position in the supposed matrix. The problem in this was that each thread was in a racing condition to write to the same memory location, to sum up the cosine transformations for all elements of that row. We solved this by using the atomic function. Its prototype is as follows. double atomicAdd(double* address, double value)
Kernel Version 2
Kernel 2 |
---|
# include <cmath> # include <cstdlib> # include <iostream> # include <iomanip> # include <ctime> # include <chrono> # include <cstdlib> # include <cmath> #include <limits> #include <cuda_runtime.h> #include <cuda.h> using namespace std; using namespace std::chrono; const double pi = 3.141592653589793; const unsigned ntpb = 32; void cosine_transform_test01 ( int size ); double *r8vec_uniform_01_new ( int n, int &seed ){ int i; const int i4_huge = 2147483647; int k; double *r; if ( seed == 0 ){ cerr << "\n"; cerr << "R8VEC_UNIFORM_01_NEW - Fatal error!\n"; cerr << " Input value of SEED = 0.\n"; exit ( 1 ); } r = new double[n]; for ( i = 0; i < n; i++ ){ k = seed / 127773; seed = 16807 * ( seed - k * 127773 ) - k * 2836; if ( seed < 0 ){ seed = seed + i4_huge; } r[i] = ( double ) ( seed ) * 4.656612875E-10; } return r; } double *cosine_transform_data ( int n, double d[] ){ double angle; double *c; int i; int j; c = new double[n]; for ( i = 0; i < n; i++ ){ c[i] = 0.0; for ( j = 0; j < n; j++ ){ angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); c[i] = c[i] + cos ( angle ) * d[j]; } c[i] = c[i] * sqrt ( 2.0 / ( double ) ( n ) ); } return c; }
void reportTime(const char* msg, steady_clock::duration span) { auto ms = duration_cast<milliseconds>(span); std::cout << msg << " - took - " << ms.count() << " millisecs" << std::endl; } __global__ void cosTransformKernel(double *a, double *b, const int n){ double angle; const double pi = 3.141592653589793; int j = blockIdx.x * blockDim.x + threadIdx.x; int i = blockIdx.y * blockDim.y + threadIdx.y; if(i<n && j<n){ angle = pi * ( double ) ( i * ( 2 * j + 1 ) ) / ( double ) ( 2 * n ); double value = cos ( angle ) * a[j]; b[i] = atomicAdd(&b[i], value); } //square root of the whole cos transformed row term if(j==n-1 && i<n){ b[i] *= sqrt ( 2.0 / ( double ) ( n ) ); } } int main (int argc, char* argv[] ){ if (argc != 2) { std::cerr << argv[0] << ": invalid number of arguments\n"; std::cerr << "Usage: " << argv[0] << " size_of_vector\n"; return 1; } int n = std::atoi(argv[1]); cosine_transform_test01 (n); return 0; } void cosine_transform_test01 ( int size){ int n = size; int seed; double *r; double *hs; //host side pointer to store the array returned from host side cosine_transform_data, for comparison purposes double *s = new double[n]; //double *t; double *d_a; double *d_b; //allocate memory on the device for the randomly generated array and for the array in which transform values will be stored cudaMalloc((void**)&d_a,sizeof(double) * n); cudaMalloc((void**)&d_b,sizeof(double) * n); seed = 123456789; r = r8vec_uniform_01_new ( n, seed ); //copy randomly generated values from host to device for(int i=0; i<n; i++) s[i]=0.0; cudaMemcpy(d_a,r,sizeof(double)*n,cudaMemcpyHostToDevice); cudaMemcpy(d_b,s,sizeof(double)*n,cudaMemcpyHostToDevice); int nblks = (n + ntpb - 1) / ntpb; dim3 grid(nblks,nblks,1); dim3 block(ntpb,ntpb,1); steady_clock::time_point ts, te; ts = steady_clock::now(); cosTransformKernel<<<grid,block>>>(d_a,d_b,size); cudaDeviceSynchronize(); te = steady_clock::now(); reportTime("Cosine Transform on device",te-ts); cudaMemcpy(s,d_b,sizeof(double)*n,cudaMemcpyDeviceToHost); ts = steady_clock::now(); hs = cosine_transform_data ( n, r ); te = steady_clock::now(); reportTime("Cosine Transform on host",te-ts); cudaFree(d_a); cudaFree(d_b); delete [] r; delete [] s; delete [] hs; //delete [] t; return; } |
Here is a comparison between the naive and optimized kernel
Evidently, there is some performance boost for the new version. However, each call to atomicAdd by a thread locks the global memory until the old value is read and added to the passed value. This deters faster execution as might be expected.