Difference between revisions of "Algo holics"

From CDOT Wiki
Jump to: navigation, search
(Profiling results)
(Profiling results)
Line 1,330: Line 1,330:
 
====Profiling results====
 
====Profiling results====
  
[[File:serialvParallel.jpg]]
+
[[File:serialvParallel.jpg|750px]]
  
 
=== Assignment 3 ===
 
=== Assignment 3 ===

Revision as of 05:23, 1 April 2019


GPU610/DPS915 | Student List | Group and Project Index | Student Resources | Glossary

Algo Holics

Team Members

  1. Sukhbeer Dhillon, Simple Neural Network
  2. Gurpreet Singh, Sudoku Puzzle Solver
  3. Edgar Giang, Merge sort
  4. Email All

Progress

Assignment 1

Sudoku Puzzle Solver - Gurpreet Singh

Is it a program that solves Sudoku puzzles(9X9) using Bruteforce algorithm. The user can either pass a Sudoku files as an input or enter the values manually. Moreover, the file or the manual entry must strictly have 9 rows and 9 columns in them. Last but not the least, all the cells must be separated by a space and the cells that needs to be solved must have 0 in them as their value.

The original source code can be found at Link

Logic

In this program the Bruteforce algorithm first put 1 in the first cell. Then it moves to the second cell and put 1 in there and check if it satisfies all the rules and conditions. If it don't, then the algorithm will increment it's value to 2 and then check again. The value can change from 0-9 to find the correct value for a cell. If none of the value from the range of 0-9 satisfies the cell, then the program will iterate back and change the value of the first cell to 2 and then try the whole process again. In this way it will solve the puzzle.

Compiling the program

Enter the following commands:

  g++ -std=c++0x -pg solver.cpp checks.cpp checksolution.cpp -o a
  a fileName

-pg directs the compiler to include the executable code required for profiling.

-o directs the compiler to name the executable a.


If we run the sample-puzzle-1 (level- easy) file, which has the following text inside it:

       0 6 0 0 0 0 9 7 2
       0 5 0 0 0 2 0 0 3
       0 7 0 3 9 0 5 0 0
       2 0 0 0 0 5 4 0 8
       0 0 0 0 0 0 0 0 0
       3 0 1 8 0 0 0 0 6
       0 0 4 0 2 3 0 8 0
       7 0 0 9 0 0 0 2 0
       9 2 5 0 0 0 0 4 0

The output will be:

       1 6 3 4 5 8 9 7 2
       4 5 9 7 1 2 8 6 3
       8 7 2 3 9 6 5 1 4
       2 9 7 1 6 5 4 3 8
       5 8 6 2 3 4 1 9 7
       3 4 1 8 7 9 2 5 6
       6 1 4 5 2 3 7 8 9
       7 3 8 9 4 1 6 2 5
       9 2 5 6 8 7 3 4 1


Analysis

To analyze the call graph, enter the following command:

    gprof -q -b a> a.clg

-q directs the profiler (gprof) to output a call graph.

-b directs the profiler to omit detailed explanations of the column headings from the output.


The call graph for the above execution looks like:

Flat profile
Call graph
granularity: each sample hit covers 2 byte(s) no time propagated
index % time    self  children    called     name
               0.00    0.00    4539/4539        placeNum(int, int) [10]
[8]      0.0    0.00    0.00    4539         checkRow(int, int) [8]
-----------------------------------------------
               0.00    0.00    1620/1620        placeNum(int, int) [10]
[9]      0.0    0.00    0.00    1620         checkColumn(int, int) [9]
-----------------------------------------------
               0.00    0.00    1120/1120        solveSudoku() [16]
[10]     0.0    0.00    0.00    1120         placeNum(int, int) [10]
               0.00    0.00    4539/4539        checkRow(int, int) [8]
               0.00    0.00    1620/1620        checkColumn(int, int) [9]
               0.00    0.00     698/698         checkSquare(int, int, int) [11]
-----------------------------------------------
               0.00    0.00     698/698         placeNum(int, int) [10]
[11]     0.0    0.00    0.00     698         checkSquare(int, int, int) [11]
-----------------------------------------------
               0.00    0.00     476/476         solveSudoku() [16]
[12]     0.0    0.00    0.00     476         goBack(int&, int&) [12]
-----------------------------------------------
               0.00    0.00       2/2           main [6]
[13]     0.0    0.00    0.00       2         print(int (*) [9]) [13]
-----------------------------------------------
               0.00    0.00       1/1           __libc_csu_init [30]
[14]     0.0    0.00    0.00       1         _GLOBAL__sub_I_sudoku [14]
               0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [18]
-----------------------------------------------
               0.00    0.00       1/1           __libc_csu_init [30]
[15]     0.0    0.00    0.00       1         _GLOBAL__sub_I_temp [15]
               0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
               0.00    0.00       1/1           main [6]
[16]     0.0    0.00    0.00       1         solveSudoku() [16]
               0.00    0.00    1120/1120        placeNum(int, int) [10]
               0.00    0.00     476/476         goBack(int&, int&) [12]
-----------------------------------------------
               0.00    0.00       1/1           main [6]
[17]     0.0    0.00    0.00       1         storePositions() [17]
-----------------------------------------------
               0.00    0.00       1/1           _GLOBAL__sub_I_sudoku [14]
[18]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [18]
-----------------------------------------------
               0.00    0.00       1/1           _GLOBAL__sub_I_temp [15]
[19]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
Index by function name
 [14] _GLOBAL__sub_I_sudoku  [16] solveSudoku()          [13] print(int (*) [9])
 [15] _GLOBAL__sub_I_temp    [17] storePositions()       [12] goBack(int&, int&)
  [9] checkColumn(int, int)  [18] __static_initialization_and_destruction_0(int, int) [8] checkRow(int, int)
 [11] checkSquare(int, int, int) [19] __static_initialization_and_destruction_0(int, int) [10] placeNum(int, int)

From the above Call graph we can see that the program took no time in finding the solution and the maximum number of calls were made to the checkRow, checkColumn and checkSquare function. However, to get a better understanding of the program let's try a harder Sudoku puzzle.


If we run the sample-puzzle-2-hard (Level- hard) file, which has the following text inside it:

      0 0 0 0 0 0 0 0 0
      0 0 0 0 0 3 0 8 5
      0 0 1 0 2 0 0 0 0
      0 0 0 5 0 7 0 0 0
      0 0 4 0 0 0 1 0 0
      0 9 0 0 0 0 0 0 0
      5 0 0 0 0 0 0 7 3
      0 0 2 0 1 0 0 0 0
      0 0 0 0 4 0 0 0 9

The output will be:

      9 8 7 6 5 4 3 2 1
      2 4 6 1 7 3 9 8 5
      3 5 1 9 2 8 7 4 6
      1 2 8 5 3 7 6 9 4
      6 3 4 8 9 2 1 5 7
      7 9 5 4 6 1 8 3 2
      5 1 9 2 8 6 4 7 3
      4 7 2 3 1 9 5 6 8
      8 6 3 7 4 5 2 1 9

The Call graph for the following looks like:

Flat profile
Call graph
granularity: each sample hit covers 2 byte(s) for 0.04% of 26.79 seconds
index % time    self  children    called     name
                                                <spontaneous>
[1]    100.0    0.00   26.78                 main [1]
               0.68   26.09       1/1           solveSudoku() [2]
               0.01    0.00       1/1           storePositions() [9]
               0.00    0.00       2/2           print(int (*) [9]) [17]
-----------------------------------------------
               0.68   26.09       1/1           main [1]
[2]     99.9    0.68   26.09       1         solveSudoku() [2]
               3.64   21.56 157353814/157353814     placeNum(int, int) [3]
               0.89    0.00 69175252/69175252     goBack(int&, int&) [7]
-----------------------------------------------
               3.64   21.56 157353814/157353814     solveSudoku() [2]
[3]     94.1    3.64   21.56 157353814         placeNum(int, int) [3]
              13.31    0.00 622577597/622577597     checkRow(int, int) [4]
               5.04    0.00 223365661/223365661     checkColumn(int, int) [5]
               3.21    0.00 100608583/100608583     checkSquare(int, int, int) [6]
-----------------------------------------------
              13.31    0.00 622577597/622577597     placeNum(int, int) [3]
[4]     49.7   13.31    0.00 622577597         checkRow(int, int) [4]
-----------------------------------------------
               5.04    0.00 223365661/223365661     placeNum(int, int) [3]
[5]     18.8    5.04    0.00 223365661         checkColumn(int, int) [5]
-----------------------------------------------
               3.21    0.00 100608583/100608583     placeNum(int, int) [3]
[6]     12.0    3.21    0.00 100608583         checkSquare(int, int, int) [6]
-----------------------------------------------
               0.89    0.00 69175252/69175252     solveSudoku() [2]
[7]      3.3    0.89    0.00 69175252         goBack(int&, int&) [7]
-----------------------------------------------
               0.01    0.00       1/1           __libc_csu_init [10]
[8]      0.0    0.01    0.00       1         _GLOBAL__sub_I_sudoku [8]
               0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
               0.01    0.00       1/1           main [1]
[9]      0.0    0.01    0.00       1         storePositions() [9]
-----------------------------------------------
                                                <spontaneous>
[10]     0.0    0.00    0.01                 __libc_csu_init [10]
               0.01    0.00       1/1           _GLOBAL__sub_I_sudoku [8]
               0.00    0.00       1/1           _GLOBAL__sub_I_temp [18]
-----------------------------------------------
               0.00    0.00       2/2           main [1]
[17]     0.0    0.00    0.00       2         print(int (*) [9]) [17]
-----------------------------------------------
               0.00    0.00       1/1           __libc_csu_init [10]
[18]     0.0    0.00    0.00       1         _GLOBAL__sub_I_temp [18]
               0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [20]
-----------------------------------------------
               0.00    0.00       1/1           _GLOBAL__sub_I_sudoku [8]
[19]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [19]
-----------------------------------------------
               0.00    0.00       1/1           _GLOBAL__sub_I_temp [18]
[20]     0.0    0.00    0.00       1         __static_initialization_and_destruction_0(int, int) [20]
-----------------------------------------------
Index by function name
  [8] _GLOBAL__sub_I_sudoku   [2] solveSudoku()          [17] print(int (*) [9])
 [18] _GLOBAL__sub_I_temp     [9] storePositions()        [7] goBack(int&, int&)
  [5] checkColumn(int, int)  [19] __static_initialization_and_destruction_0(int, int) [4] checkRow(int, int)
  [6] checkSquare(int, int, int) [20] __static_initialization_and_destruction_0(int, int) [3] placeNum(int, int)


From the above Call graph we can see that for a harder Sudoku puzzle, the time increased significantly. Moreover, it can also be seen that almost 50% of the time is consumed by the checkRow function, 18.8% by checkColumn and finally 12% by the checkSquare function. Thousand of calls were made to these 3 functions, if we parallelizing these functions then the efficiency of the program can be increased significantly.


Simple Artificial Neural Network - Sukhbeer Singh

Introduction

I am very interested in neural networks and I started learning about them recently. I think this is a good opportunity to build on my knowledge of a NN while also parallelising it. For that purpose, I have selected a very basic neural network which feeds forward with ReLu and softmax and back-propagates on a sample batch from MNIST handwritten digits dataset. In each iteration, the weights are adjusted to train the network for better predictions. The code performs matrix-multiplication(dot-product) each time that the activation vector and delta vector are calculated for the next node and the previous node respectively.

Source Code

Here is the source code used.

I changed the source code to put the network's training in a separate function called train. Also, I used cblas library to do matrix-matrix multiplication inside dot function, instead of using three nested for loops.

nn.cpp
//  To compile: g++ -o nn nn.cpp -std=c++11 -lgslcblas
//  To run: ./nn
//  Created by Sergei Bugrov on 4/20/18.
//  Copyright © 2017 Sergei Bugrov. All rights reserved.
//  Download dataset from: https://drive.google.com/file/d/1tVyvg6c1Eo5ojtiz0R17YEzcUe5cN285/view
//  Updated By - Sukhbeer Singh Dhillon, March 23rd 2019
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <vector>
#include <math.h>
#include <fstream>
#include <sstream>
#include <string>
#include <random>
#include <algorithm>
extern "C"{

#include<gsl/gsl_cblas.h>

}
using namespace std;
vector<float> print ( const vector <float>& m, int n_rows, int n_columns ) {   
   /*  "Couts" the input vector as n_rows x n_columns matrix.
    Inputs:
    m: vector, matrix of size n_rows x n_columns
    n_rows: int, number of rows in the left matrix m1
    n_columns: int, number of columns in the left matrix m1
    */
   vector<float> outputDigits;
   for( int i = 0; i != n_rows; ++i ) {
   float digitPredicted = 0.0;
   int index = 0;
       for( int j = 0; j != n_columns; ++j ) {

float currentValue = m[i * n_columns + j];

           cout << currentValue  << " ";

if(currentValue > digitPredicted){ digitPredicted = currentValue; index = j; }

       }

outputDigits.push_back(index);

       cout << " --> Digit = " << index <<"\n";
   }
   cout << endl;
   return outputDigits;
}
int argmax ( const vector <float>& m ) {
   return distance(m.begin(), max_element(m.begin(), m.end()));
}
vector <float> relu(const vector <float>& z){
   int size = z.size();
   vector <float> output;
   for( int i = 0; i < size; ++i ) {
       if (z[i] < 0){
           output.push_back(0.0);
       }
       else output.push_back(z[i]);
   }
   return output;
}
vector <float> reluPrime (const vector <float>& z) {
   int size = z.size();
   vector <float> output;
   for( int i = 0; i < size; ++i ) {
       if (z[i] <= 0){
           output.push_back(0.0);
       }
       else output.push_back(1.0);
   }
   return output;
} 
static vector<float> random_vector(const int size){
   random_device rd;
   mt19937 gen(rd());
   uniform_real_distribution<> distribution(0.0, 0.05);
   static default_random_engine generator;
   vector<float> data(size);
   generate(data.begin(), data.end(), [&]() { return distribution(generator); });
   return data;
}
vector <float> softmax (const vector <float>& z, const int dim) {   
   const int zsize = static_cast<int>(z.size());
   vector <float> out;   
   for (unsigned i = 0; i != zsize; i += dim) {
       vector <float> foo;
       for (unsigned j = 0; j != dim; ++j) {
           foo.push_back(z[i + j]);
       }      
       float max_foo = *max_element(foo.begin(), foo.end());
       for (unsigned j = 0; j != dim; ++j) {
           foo[j] = exp(foo[j] - max_foo);
       }      
       float sum_of_elems = 0.0;
       for (unsigned j = 0; j != dim; ++j) {
           sum_of_elems = sum_of_elems + foo[j];
       }
       
       for (unsigned j = 0; j != dim; ++j) {
           out.push_back(foo[j]/sum_of_elems);
       }
   }
   return out;
}
vector <float> sigmoid_d (const vector <float>& m1) {
   
   /*  Returns the value of the sigmoid function derivative f'(x) = f(x)(1 - f(x)),
    where f(x) is sigmoid function.
    Input: m1, a vector.
    Output: x(1 - x) for every element of the input matrix m1.
    */
   
   const unsigned long VECTOR_SIZE = m1.size();
   vector <float> output (VECTOR_SIZE);
   
   
   for( unsigned i = 0; i != VECTOR_SIZE; ++i ) {
       output[ i ] = m1[ i ] * (1 - m1[ i ]);
   }
   
   return output;

}

vector <float> sigmoid (const vector <float>& m1) {

   /*  Returns the value of the sigmoid function f(x) = 1/(1 + e^-x).
    Input: m1, a vector.
    Output: 1/(1 + e^-x) for every element of the input matrix m1.
    */
   
   const unsigned long VECTOR_SIZE = m1.size();
   vector <float> output (VECTOR_SIZE);
   
   
   for( unsigned i = 0; i != VECTOR_SIZE; ++i ) {
       output[ i ] = 1 / (1 + exp(-m1[ i ]));
   }
   
   return output;

}

vector <float> operator+(const vector <float>& m1, const vector <float>& m2){

   /*  Returns the elementwise sum of two vectors.
    Inputs:
    m1: a vector
    m2: a vector
    Output: a vector, sum of the vectors m1 and m2.
    */
   
   const unsigned long VECTOR_SIZE = m1.size();
   vector <float> sum (VECTOR_SIZE);
   
   for (unsigned i = 0; i != VECTOR_SIZE; ++i){
       sum[i] = m1[i] + m2[i];
   };
   
   return sum;

}

vector <float> operator-(const vector <float>& m1, const vector <float>& m2){

   /*  Returns the difference between two vectors.
    Inputs:
    m1: vector
    m2: vector
    Output: vector, m1 - m2, difference between two vectors m1 and m2.
    */
   
   const unsigned long VECTOR_SIZE = m1.size();
   vector <float> difference (VECTOR_SIZE);
   
   for (unsigned i = 0; i != VECTOR_SIZE; ++i){
       difference[i] = m1[i] - m2[i];
   };
   
   return difference;

}

vector <float> operator*(const vector <float>& m1, const vector <float>& m2){

   /*  Returns the product of two vectors (elementwise multiplication).
    Inputs:
    m1: vector
    m2: vector
    Output: vector, m1 * m2, product of two vectors m1 and m2
    */
   
   const unsigned long VECTOR_SIZE = m1.size();
   vector <float> product (VECTOR_SIZE);
   
   for (unsigned i = 0; i != VECTOR_SIZE; ++i){
       product[i] = m1[i] * m2[i];
   };
   
   return product;

}

vector <float> operator*(const float m1, const vector <float>& m2){

   /*  Returns the product of a float and a vectors (elementwise multiplication).
    Inputs:
    m1: float
    m2: vector
    Output: vector, m1 * m2, product of two vectors m1 and m2
    */
   
   const unsigned long VECTOR_SIZE = m2.size();
   vector <float> product (VECTOR_SIZE);
   
   for (unsigned i = 0; i != VECTOR_SIZE; ++i){
       product[i] = m1 * m2[i];
   };
   
   return product;

}

vector <float> operator/(const vector <float>& m2, const float m1){

   /*  Returns the product of a float and a vectors (elementwise multiplication).
    Inputs:
    m1: float
    m2: vector
    Output: vector, m1 * m2, product of two vectors m1 and m2
    */
   
   const unsigned long VECTOR_SIZE = m2.size();
   vector <float> product (VECTOR_SIZE);
   
   for (unsigned i = 0; i != VECTOR_SIZE; ++i){
       product[i] = m2[i] / m1;
   };
   
   return product;

}

vector <float> transpose (float *m, const int C, const int R) {

   /*  Returns a transpose matrix of input matrix.
    Inputs:
    m: vector, input matrix
    C: int, number of columns in the input matrix
    R: int, number of rows in the input matrix
    Output: vector, transpose matrix mT of input matrix m
    */
   
   vector <float> mT (C*R);
   
   for(unsigned n = 0; n != C*R; n++) {
       unsigned i = n/C;
       unsigned j = n%C;
       mT[n] = m[R*j + i];
   }
   
   return mT;

}

vector <float> dot (const vector <float>& m1, const vector <float>& m2, const int m1_rows, const int m1_columns, const int m2_columns) {

   /*  Returns the product of two matrices: m1 x m2.
    Inputs:
    m1: vector, left matrix of size m1_rows x m1_columns
    m2: vector, right matrix of size m1_columns x m2_columns (the number of rows in the right matrix
    must be equal to the number of the columns in the left one)
    m1_rows: int, number of rows in the left matrix m1
    m1_columns: int, number of columns in the left matrix m1
    m2_columns: int, number of columns in the right matrix m2
    Output: vector, m1 * m2, product of two vectors m1 and m2, a matrix of size m1_rows x m2_columns
    */


  //use cblas 
   vector <float> output (m1_rows*m2_columns);
  cblas_sgemm(CblasRowMajor,CblasNoTrans, CblasNoTrans, m1_rows, m2_columns, m1_columns, 1.0, m1.data(), m1_columns, m2.data(), m2_columns, 0.0, output.data(), m2_columns); 
   return output;

}

vector<string> split(const string &s, char delim) {

   stringstream ss(s);
   string item;
   vector<string> tokens;
   while (getline(ss, item, delim)) {
       tokens.push_back(item);
   }
   return tokens;

}

void displayPrediction(vector<float> y_prediction, vector<float> y_output){

           cout << "Predictions:" << "\n";
           vector<float> predict = print ( y_prediction, 10, 10 );
           cout << "Ground truth:" << "\n";
           vector<float> output = print ( y_output, 10, 10 );

int accuracy = 0; for(int i=0; i< output.size(); i++){ if(predict[i]==output[i]) accuracy++; } cout<<" Accuracy = " << accuracy << "/" << output.size() << std::endl; }

//Train the model in this function- //initialize the training data //Feed Forward //Back Propogate //Adjust Weight //Display epoch data - change this to something meaningful //@params -x training data y labels void train(vector<float> x_input, vector<float> y_output, vector<float>& weight1, vector<float>& weight2, vector<float>& weight3){

   int BATCH_SIZE = 256;
   float lr = .01/BATCH_SIZE;
   for(unsigned i = 0; i< 10000; i++){
      	// Building batches of input variables (X) and labels (y)
      	int randindx = rand() % (42000-BATCH_SIZE);
      	vector<float> b_X;
      	vector<float> b_y;
      	for (unsigned j = randindx*784; j < (randindx+BATCH_SIZE)*784; ++j){

b_X.push_back(x_input[j]);

       }
       for (unsigned k = randindx*10; k < (randindx+BATCH_SIZE)*10; ++k){
       	b_y.push_back(y_output[k]);
       }
       // Feed forward
       vector<float> a1 = relu(dot( b_X, weight1, BATCH_SIZE, 784, 128 ));
       vector<float> a2 = relu(dot( a1, weight2, BATCH_SIZE, 128, 64 ));
       vector<float> yhat = softmax(dot( a2, weight3, BATCH_SIZE, 64, 10 ), 10);
       
       // Back propagation
       vector<float> dyhat = (yhat - b_y);
       // dW3 = a2.T * dyhat
       vector<float> dW3 = dot(transpose( &a2[0], BATCH_SIZE, 64 ), dyhat, 64, BATCH_SIZE, 10);
       // dz2 = dyhat * W3.T * relu'(a2)
       vector<float> dz2 = dot(dyhat, transpose( &weight3[0], 64, 10 ), BATCH_SIZE, 10, 64) * reluPrime(a2);
       // dW2 = a1.T * dz2
       vector<float> dW2 = dot(transpose( &a1[0], BATCH_SIZE, 128 ), dz2, 128, BATCH_SIZE, 64);
       // dz1 = dz2 * W2.T * relu'(a1)
       vector<float> dz1 = dot(dz2, transpose( &weight2[0], 128, 64 ), BATCH_SIZE, 64, 128) * reluPrime(a1);
       // dW1 = X.T * dz1
       vector<float> dW1 = dot(transpose( &b_X[0], BATCH_SIZE, 784 ), dz1, 784, BATCH_SIZE, 128);
       
       // Updating the parameters
       weight3 = weight3 - lr * dW3;
       weight2 = weight2 - lr * dW2;	
       weight1 = weight1 - lr * dW1;
   
       if ((i+1) % 1000 == 0){
           cout << "-----------------------------------------------Epoch " << i+1 << "--------------------------------------------------" <<"\n";
           displayPrediction(yhat, y_output);

/*cout << "Predictions:" << "\n";

           print ( yhat, 10, 10 );
           cout << "Ground truth:" << "\n";
           print ( b_y, 10, 10 );*/
           vector<float> loss_m = yhat - b_y;
           float loss = 0.0;
           for (unsigned k = 0; k < BATCH_SIZE*10; ++k){
               loss += loss_m[k] * loss_m[k];
           }
           cout << "                                            Loss " << loss/BATCH_SIZE <<"\n";
           cout << "--------------------------------------------End of Epoch :(------------------------------------------------" <<"\n";
       };
   }

}


int main(int argc, const char * argv[]) {

   string line;
   vector<string> line_v;
   cout << "Loading data ...\n";
   vector<float> X_train;
   vector<float> y_train;
   ifstream myfile ("train.txt");
   if (myfile.is_open())
   {
       while ( getline (myfile,line) )
       {
           line_v = split(line, '\t');
           int digit = strtof((line_v[0]).c_str(),0);
           for (unsigned i = 0; i < 10; ++i) {
               if (i == digit)
               {
                   y_train.push_back(1.);
               }
               else y_train.push_back(0.);
           }
           
           int size = static_cast<int>(line_v.size());
           for (unsigned i = 1; i < size; ++i) {
               X_train.push_back(strtof((line_v[i]).c_str(),0));
           }
       }
       X_train = X_train/255.0;
       myfile.close();
   }
   
   else cout << "Unable to open file" << '\n';
   
   int xsize = static_cast<int>(X_train.size());
   int ysize = static_cast<int>(y_train.size());
   
   // Random initialization of the weights
   vector <float> W1 = random_vector(784*128);
   vector <float> W2 = random_vector(128*64);
   vector <float> W3 = random_vector(64*10);
   cout << "Training the model ...\n";
   train(X_train, y_train, W1, W2, W3);    
   return 0;

}


The result given below is the comparison of predictions as made by the trained network with the actual output after 10000 iterations. The ground truth is the actual value of the labels between 0-9 (true for the corresponding digit in the dataset). As you see, the accuracy of the network is not that great.

-----------------------------------------------Epoch 10000--------------------------------------------------
Predictions:
0.000848207 9.07445e-06 0.000145165 0.797735    4.94866e-06 0.19374     1.55013e-06 0.000244941 0.00657041  0.000700498  --> Digit = 3
1.36476e-05 1.07548e-07 8.3835e-05  0.000744837 0.299883    9.37717e-05 3.53349e-05 0.00822595  0.00210021  0.688819     --> Digit = 9
5.11556e-06 0.000616957 0.000233088 0.87458     2.20579e-05 0.0140489   5.03569e-08 0.000518445 0.0826038   0.0273714    --> Digit = 3
0.0178851   3.64621e-08 0.0174107   0.000322792 0.716312    0.00120967  0.189534    0.00303238  0.00613965  0.0481543    --> Digit = 4
7.40077e-07 0.96872     0.014224    0.00555447  2.56397e-05 0.000115577 0.000157107 0.00366156  0.00669771  0.000842866  --> Digit = 1
7.37584e-05 0.00306397  0.0184482   0.056542    0.000217984 0.0807415   0.000430994 1.09367e-05 0.838792    0.00167921   --> Digit = 8
1.23026e-05 1.10682e-09 6.47478e-07 0.000129503 1.28475e-05 1.20242e-05 1.18166e-09 0.953265    2.63176e-05 0.046541     --> Digit = 7
0.974183    3.50241e-18 1.99895e-07 3.4534e-07  2.3755e-11  0.0257772   1.96811e-09 6.99407e-09 3.92052e-05 2.28711e-08  --> Digit = 0
2.21581e-05 9.26954e-09 0.000182046 0.00336899  3.40876e-05 0.0800376   8.35955e-07 1.2496e-07  0.914781    0.00157335   --> Digit = 8
8.59312e-07 4.1739e-05  0.000106891 0.000122639 0.00018295  4.02451e-05 7.21105e-07 0.898311    0.00405182  0.0971408    --> Digit = 7
Ground truth:
0 1 0 0 0 0 0 0 0 0  --> Digit = 1
1 0 0 0 0 0 0 0 0 0  --> Digit = 0
0 1 0 0 0 0 0 0 0 0  --> Digit = 1
0 0 0 0 1 0 0 0 0 0  --> Digit = 4 
1 0 0 0 0 0 0 0 0 0  --> Digit = 0
1 0 0 0 0 0 0 0 0 0  --> Digit = 0
0 0 0 0 0 0 0 1 0 0  --> Digit = 7
0 0 0 1 0 0 0 0 0 0  --> Digit = 3
0 0 0 0 0 1 0 0 0 0  --> Digit = 5
0 0 0 1 0 0 0 0 0 0  --> Digit = 3
                                            Accuracy = 2/10
                                            Loss 0.184251
--------------------------------------------End of Epoch :(------------------------------------------------
Profiling

Here are the results of profiling the program

Flat profile

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
99.29   1075.84  1075.84                             displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >)
 0.24   1078.44     2.60        3     0.87     1.16  train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&)
 0.22   1080.88     2.43                             split(std::string const&, char)
 0.15   1082.50     1.62                             print(std::vector<float, std::allocator<float> > const&, int, int)
 0.10   1083.61     1.11                             reluPrime(std::vector<float, std::allocator<float> > const&)
 0.08   1084.48     0.87 519195026    0.00     0.00  std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf()
 0.02   1084.68     0.20                             std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf()
 0.00   1084.68     0.00        1     0.00     0.00  transpose(float*, int, int)


Call graph

Call graph

granularity: each sample hit covers 2 byte(s) for 0.00% of 1084.68 seconds

index % time self children called name

                                                <spontaneous>

[1] 99.2 1075.84 0.00 displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) [1]


                            14012000             train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2]
               2.60    0.87       3/3           relu(std::vector<float, std::allocator<float> > const&) [3]

[2] 0.3 2.60 0.87 3+14012000 train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2]

               0.87    0.00 519195026/519195026     std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7]
                            14012000             train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2]

                                                <spontaneous>

[3] 0.3 0.00 3.47 relu(std::vector<float, std::allocator<float> > const&) [3]

               2.60    0.87       3/3           train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2]

                                                <spontaneous>

[4] 0.2 2.43 0.00 split(std::string const&, char) [4]


                                                <spontaneous>

[5] 0.1 1.62 0.00 print(std::vector<float, std::allocator<float> > const&, int, int) [5]


                                                <spontaneous>

[6] 0.1 1.11 0.00 reluPrime(std::vector<float, std::allocator<float> > const&) [6]


               0.87    0.00 519195026/519195026     train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) [2]

[7] 0.1 0.87 0.00 519195026 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [7]


                                                <spontaneous>

[8] 0.0 0.20 0.00 std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() [8]


               0.00    0.00       1/1           std::vector<float, std::allocator<float> >::vector(unsigned long, std::allocator<float> const&) [30]

[16] 0.0 0.00 0.00 1 transpose(float*, int, int) [16]


� Index by function name

  [1] displayPrediction(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >) 
  [2] train(std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&, std::vector<float, std::allocator<float> >&) 
  [7] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf()
  [5] print(std::vector<float, std::allocator<float> > const&, int, int) 
  [6] reluPrime(std::vector<float, std::allocator<float> > const&) 
  [8] std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf()
  [4] split(std::string const&, char) [16] transpose(float*, int, int)


Analysis

The total execution time of the program is around 3 minutes. The profiling results spot displayPrediction as the function with maximum execution time. However, thats because it displays a matrix using the naive O(n2) for-loop. train() is the next function with the maximum time. This is the hotspot for the program. If this function is made to be the kernel and the functions that it calls as device functions, the program would fasten by a good proportion.


Merge Sort - Edgar Giang

Intro

The topic I chose is sorting algorithms. The algorithm that I'll be profiling and analyzing would be a merge sort algorithm written in c++ which can be found Here. This code would ask the user to input the size of the array and then fill the array with random elements. The size of the array can be extremely large. From the original creator, the size of the array entered was in the tens of thousands. Theoretically, there would be no limit to how much the user can input however the original creator of this code said it would take the user several hours to sort 10 million elements in the entered array.


How to run the program

The following command was tested in matrix:

  g++ fileName.cpp -pg -o test
  
  ./test


Running the program

This would be the following output:

  Please enter the size of your array:
  200000
  unsorted
  sorted
  time: 166100

As you can see the input used was 200,000. The time indicated in the output is in milliseconds so it took 166100 milliseconds to complete. A much larger input would result in the program running for a very long time.

Profiling
Flat profile

Flat profile:

Each sample counts as 0.01 seconds.

 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
21.90      0.07     0.07   600021     0.00     0.00  int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*)
15.64      0.12     0.05   200006     0.00     0.00  merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int)
 9.39      0.15     0.03   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long)
 9.39      0.18     0.03   400015     0.00     0.00  int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&)
 6.26      0.20     0.02 14466088     0.00     0.00  std::vector<int, std::allocator<int> >::operator[](unsigned long)
 6.26      0.22     0.02   600021     0.00     0.00  int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*)
 6.26      0.24     0.02   400016     0.00     0.00  __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*)
 6.26      0.26     0.02   400015     0.00     0.00  int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 3.13      0.27     0.01   600021     0.00     0.00  std::vector<int, std::allocator<int> >::begin() const
 3.13      0.28     0.01   400016     0.00     0.00  __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long)
 3.13      0.29     0.01   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long)
 3.13      0.30     0.01   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&)
 3.13      0.31     0.01   400015     0.00     0.00  std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&)
 3.13      0.32     0.01   400015     0.00     0.00  int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 0.00      0.32     0.00  1200042     0.00     0.00  __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&)
 0.00      0.32     0.00  1200042     0.00     0.00  __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const
 0.00      0.32     0.00  1200042     0.00     0.00  std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00  1200042     0.00     0.00  std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00  1200042     0.00     0.00  std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00  1200042     0.00     0.00  std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00  1200030     0.00     0.00  std::vector<int, std::allocator<int> >::size() const
 0.00      0.32     0.00  1000038     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator()
 0.00      0.32     0.00   600021     0.00     0.00  std::vector<int, std::allocator<int> >::end() const
 0.00      0.32     0.00   600018     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
 0.00      0.32     0.00   400017     0.00     0.00  __gnu_cxx::new_allocator<int>::~new_allocator()
 0.00      0.32     0.00   400017     0.00     0.00  std::allocator<int>::~allocator()
 0.00      0.32     0.00   400016     0.00     0.00  __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&)
 0.00      0.32     0.00   400016     0.00     0.00  __gnu_cxx::new_allocator<int>::max_size() const
 0.00      0.32     0.00   400016     0.00     0.00  std::allocator<int>::allocator(std::allocator<int> const&)
 0.00      0.32     0.00   400016     0.00     0.00  std::_Iter_base<int*, false>::_S_base(int*)
 0.00      0.32     0.00   400016     0.00     0.00  void std::_Destroy_aux<true>::__destroy<int*>(int*, int*)
 0.00      0.32     0.00   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long)
 0.00      0.32     0.00   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&)
 0.00      0.32     0.00   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl()
 0.00      0.32     0.00   400016     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::~_Vector_base()
 0.00      0.32     0.00   400016     0.00     0.00  std::vector<int, std::allocator<int> >::~vector()
 0.00      0.32     0.00   400016     0.00     0.00  std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*)
 0.00      0.32     0.00   400016     0.00     0.00  void std::_Destroy<int*>(int*, int*)
 0.00      0.32     0.00   400016     0.00     0.00  void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&)
 0.00      0.32     0.00   400015     0.00     0.00  __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&)
 0.00      0.32     0.00   400015     0.00     0.00  std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const
 0.00      0.32     0.00   400015     0.00     0.00  int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 0.00      0.32     0.00   400015     0.00     0.00  int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 0.00      0.32     0.00   400012     0.00     0.00  unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&)
 0.00      0.32     0.00   200006     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const
 0.00      0.32     0.00   200006     0.00     0.00  std::vector<int, std::allocator<int> >::capacity() const
 0.00      0.32     0.00   200006     0.00     0.00  std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  std::vector<int, std::allocator<int> >::end()
 0.00      0.32     0.00   200006     0.00     0.00  std::vector<int, std::allocator<int> >::begin()
 0.00      0.32     0.00   200006     0.00     0.00  std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&)
 0.00      0.32     0.00   200006     0.00     0.00  std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 0.00      0.32     0.00   200006     0.00     0.00  void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&)
 0.00      0.32     0.00        2     0.00     0.28  check_sort(std::vector<int, std::allocator<int> >)
 0.00      0.32     0.00        1     0.00     0.00  _GLOBAL__sub_I_main
 0.00      0.32     0.00        1     0.00   320.11  merge_sort(std::vector<int, std::allocator<int> >)
 0.00      0.32     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
 0.00      0.32     0.00        1     0.00     0.00  __gnu_cxx::new_allocator<int>::new_allocator()
 0.00      0.32     0.00        1     0.00     0.00  std::allocator<int>::allocator()
 0.00      0.32     0.00        1     0.00     0.00  void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)
 0.00      0.32     0.00        1     0.00     0.00  std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&)
 0.00      0.32     0.00        1     0.00     0.00  std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&)
 0.00      0.32     0.00        1     0.00     0.00  __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&)
 0.00      0.32     0.00        1     0.00     0.00  void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)
 0.00      0.32     0.00        1     0.00     0.00  void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&)
 0.00      0.32     0.00        1     0.00     0.00  int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)


Call graph

Call graph


granularity: each sample hit covers 2 byte(s) for 3.12% of 0.32 seconds

index % time self children called name

                                                <spontaneous>

[1] 100.0 0.00 0.32 main [1]

               0.00    0.32       1/1           merge_sort(std::vector<int, std::allocator<int> >) [2]
               0.00    0.00  200000/14466088     std::vector<int, std::allocator<int> >::operator[](unsigned long) [17]
               0.00    0.00       1/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]
               0.00    0.00       1/1           std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26]
               0.00    0.00       2/400016      std::vector<int, std::allocator<int> >::~vector() [22]
               0.00    0.00  200001/1200030     std::vector<int, std::allocator<int> >::size() const [39]
               0.00    0.00       1/1           std::allocator<int>::allocator() [70]
               0.00    0.00       1/400017      std::allocator<int>::~allocator() [44]

               0.00    0.32       1/1           main [1]

[2] 99.9 0.00 0.32 1 merge_sort(std::vector<int, std::allocator<int> >) [2]

               0.05    0.12  200006/200006      merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4]
               0.01    0.09  200008/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]
               0.00    0.03  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]
               0.00    0.02  400014/400016      std::vector<int, std::allocator<int> >::~vector() [22]
               0.00    0.00       2/2           check_sort(std::vector<int, std::allocator<int> >) [25]
               0.00    0.00  400012/400012      unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57]
               0.00    0.00       1/1200030     std::vector<int, std::allocator<int> >::size() const [39]

               0.00    0.00       1/400015      main [1]
               0.01    0.09  200006/400015      merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4]
               0.01    0.09  200008/400015      merge_sort(std::vector<int, std::allocator<int> >) [2]

[3] 61.5 0.01 0.19 400015 std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

               0.03    0.09  400015/400015      int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5]
               0.01    0.05  400015/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12]
               0.01    0.00  400015/600021      std::vector<int, std::allocator<int> >::begin() const [23]
               0.00    0.00  400015/400015      std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56]
               0.00    0.00  400015/400015      __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55]
               0.00    0.00  400015/1200030     std::vector<int, std::allocator<int> >::size() const [39]
               0.00    0.00  400015/1000038     std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40]
               0.00    0.00  400015/600021      std::vector<int, std::allocator<int> >::end() const [41]

               0.05    0.12  200006/200006      merge_sort(std::vector<int, std::allocator<int> >) [2]

[4] 52.3 0.05 0.12 200006 merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4]

               0.01    0.09  200006/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]
               0.02    0.00 13866088/14466088     std::vector<int, std::allocator<int> >::operator[](unsigned long) [17]

               0.03    0.09  400015/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[5] 37.5 0.03 0.09 400015 int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5]

               0.02    0.07  400015/400015      int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6]

               0.02    0.07  400015/400015      int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&) [5]

[6] 28.1 0.02 0.07 400015 int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6]

               0.00    0.07  400015/400015      int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8]

               0.01    0.02  200006/600021      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]
               0.01    0.05  400015/600021      int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9]

[7] 28.1 0.02 0.07 600021 int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7]

               0.07    0.00  600021/600021      int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11]

               0.00    0.07  400015/400015      int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [6]

[8] 21.9 0.00 0.07 400015 int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8]

               0.00    0.07  400015/400015      int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10]

               0.01    0.06  400015/400015      int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10]

[9] 21.9 0.01 0.06 400015 int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9]

               0.01    0.05  400015/600021      int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7]
               0.00    0.00  800030/1200042     std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38]
               0.00    0.00  400015/400016      std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52]

               0.00    0.07  400015/400015      int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [8]

[10] 21.9 0.00 0.07 400015 int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10]

               0.01    0.06  400015/400015      int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9]
               0.00    0.00  800030/1200042     std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37]

               0.07    0.00  600021/600021      int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7]

[11] 21.9 0.07 0.00 600021 int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [11]


               0.00    0.00       1/400016      std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26]
               0.01    0.05  400015/400016      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[12] 18.8 0.01 0.05 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12]

               0.03    0.02  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13]
               0.00    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50]

               0.03    0.02  400016/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12]

[13] 15.6 0.03 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13]

               0.00    0.02  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19]

               0.00    0.03  200006/200006      merge_sort(std::vector<int, std::allocator<int> >) [2]

[14] 10.4 0.00 0.03 200006 std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

               0.00    0.03  200006/200006      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16]
               0.00    0.00  200006/600021      std::vector<int, std::allocator<int> >::begin() const [23]
               0.00    0.00  400012/1200030     std::vector<int, std::allocator<int> >::size() const [39]
               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::capacity() const [59]
               0.00    0.00  200006/1000038     std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40]
               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::end() [62]
               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::begin() [63]
               0.00    0.00  200006/600021      std::vector<int, std::allocator<int> >::end() const [41]
               0.00    0.00  200006/200006      void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66]

               0.00    0.03  200006/200006      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16]

[15] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]

               0.01    0.02  200006/600021      int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*) [7]
               0.00    0.00  400012/1200042     std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38]
               0.00    0.00  200006/200006      std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64]
               0.00    0.00  200006/600018      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42]

               0.00    0.03  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

[16] 9.4 0.00 0.03 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16]

               0.00    0.03  200006/200006      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]
               0.00    0.00  400012/1200042     std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37]

               0.00    0.00  200000/14466088     main [1]
               0.00    0.00  400000/14466088     check_sort(std::vector<int, std::allocator<int> >) [25]
               0.02    0.00 13866088/14466088     merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [4]

[17] 6.2 0.02 0.00 14466088 std::vector<int, std::allocator<int> >::operator[](unsigned long) [17]


               0.02    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19]

[18] 6.2 0.02 0.00 400016 __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18]

               0.00    0.00  400016/400016      __gnu_cxx::new_allocator<int>::max_size() const [46]

               0.00    0.02  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [13]

[19] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [19]

               0.02    0.00  400016/400016      __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18]

               0.01    0.01  400016/400016      std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21]

[20] 6.2 0.01 0.01 400016 std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20]

               0.01    0.00  400016/400016      __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24]

               0.00    0.02  400016/400016      std::vector<int, std::allocator<int> >::~vector() [22]

[21] 6.2 0.00 0.02 400016 std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21]

               0.01    0.01  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20]
               0.00    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51]

               0.00    0.00       2/400016      main [1]
               0.00    0.02  400014/400016      merge_sort(std::vector<int, std::allocator<int> >) [2]

[22] 6.2 0.00 0.02 400016 std::vector<int, std::allocator<int> >::~vector() [22]

               0.00    0.02  400016/400016      std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21]
               0.00    0.00  400016/1000038     std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40]
               0.00    0.00  400016/400016      void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54]

               0.00    0.00  200006/600021      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]
               0.01    0.00  400015/600021      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[23] 3.1 0.01 0.00 600021 std::vector<int, std::allocator<int> >::begin() const [23]

               0.00    0.00  600021/1200042     __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33]

               0.01    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [20]

[24] 3.1 0.01 0.00 400016 __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [24]


               0.00    0.00       2/2           merge_sort(std::vector<int, std::allocator<int> >) [2]

[25] 0.2 0.00 0.00 2 check_sort(std::vector<int, std::allocator<int> >) [25]

               0.00    0.00  400000/14466088     std::vector<int, std::allocator<int> >::operator[](unsigned long) [17]
               0.00    0.00  200001/1200030     std::vector<int, std::allocator<int> >::size() const [39]

               0.00    0.00       1/1           main [1]

[26] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26]

               0.00    0.00       1/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12]
               0.00    0.00       1/1           std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72]

               0.00    0.00  600021/1200042     std::vector<int, std::allocator<int> >::begin() const [23]
               0.00    0.00  600021/1200042     std::vector<int, std::allocator<int> >::end() const [41]

[33] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33]


               0.00    0.00 1200042/1200042     std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36]

[34] 0.0 0.00 0.00 1200042 __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34]


               0.00    0.00 1200042/1200042     std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37]

[35] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35]


               0.00    0.00 1200042/1200042     std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38]

[36] 0.0 0.00 0.00 1200042 std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36]

               0.00    0.00 1200042/1200042     __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [34]

               0.00    0.00  400012/1200042     __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [16]
               0.00    0.00  800030/1200042     int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [10]

[37] 0.0 0.00 0.00 1200042 std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [37]

               0.00    0.00 1200042/1200042     std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [35]

               0.00    0.00  400012/1200042     __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]
               0.00    0.00  800030/1200042     int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9]

[38] 0.0 0.00 0.00 1200042 std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [38]

               0.00    0.00 1200042/1200042     std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [36]

               0.00    0.00       1/1200030     merge_sort(std::vector<int, std::allocator<int> >) [2]
               0.00    0.00  200001/1200030     main [1]
               0.00    0.00  200001/1200030     check_sort(std::vector<int, std::allocator<int> >) [25]
               0.00    0.00  400012/1200030     std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]
               0.00    0.00  400015/1200030     std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[39] 0.0 0.00 0.00 1200030 std::vector<int, std::allocator<int> >::size() const [39]


               0.00    0.00       1/1000038     std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72]
               0.00    0.00  200006/1000038     std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]
               0.00    0.00  400015/1000038     std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]
               0.00    0.00  400016/1000038     std::vector<int, std::allocator<int> >::~vector() [22]

[40] 0.0 0.00 0.00 1000038 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40]


               0.00    0.00  200006/600021      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]
               0.00    0.00  400015/600021      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[41] 0.0 0.00 0.00 600021 std::vector<int, std::allocator<int> >::end() const [41]

               0.00    0.00  600021/1200042     __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [33]

               0.00    0.00  200006/600018      std::vector<int, std::allocator<int> >::begin() [63]
               0.00    0.00  200006/600018      std::vector<int, std::allocator<int> >::end() [62]
               0.00    0.00  200006/600018      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]

[42] 0.0 0.00 0.00 600018 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42]


               0.00    0.00  400017/400017      std::allocator<int>::~allocator() [44]

[43] 0.0 0.00 0.00 400017 __gnu_cxx::new_allocator<int>::~new_allocator() [43]


               0.00    0.00       1/400017      main [1]
               0.00    0.00  400016/400017      std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51]

[44] 0.0 0.00 0.00 400017 std::allocator<int>::~allocator() [44]

               0.00    0.00  400017/400017      __gnu_cxx::new_allocator<int>::~new_allocator() [43]

               0.00    0.00  400016/400016      std::allocator<int>::allocator(std::allocator<int> const&) [47]

[45] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45]


               0.00    0.00  400016/400016      __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [18]

[46] 0.0 0.00 0.00 400016 __gnu_cxx::new_allocator<int>::max_size() const [46]


               0.00    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50]

[47] 0.0 0.00 0.00 400016 std::allocator<int>::allocator(std::allocator<int> const&) [47]

               0.00    0.00  400016/400016      __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [45]

               0.00    0.00  400016/400016      std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52]

[48] 0.0 0.00 0.00 400016 std::_Iter_base<int*, false>::_S_base(int*) [48]


               0.00    0.00  400016/400016      void std::_Destroy<int*>(int*, int*) [53]

[49] 0.0 0.00 0.00 400016 void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49]


               0.00    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [12]

[50] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [50]

               0.00    0.00  400016/400016      std::allocator<int>::allocator(std::allocator<int> const&) [47]

               0.00    0.00  400016/400016      std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [21]

[51] 0.0 0.00 0.00 400016 std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [51]

               0.00    0.00  400016/400017      std::allocator<int>::~allocator() [44]

               0.00    0.00       1/400016      int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76]
               0.00    0.00  400015/400016      int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [9]

[52] 0.0 0.00 0.00 400016 std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52]

               0.00    0.00  400016/400016      std::_Iter_base<int*, false>::_S_base(int*) [48]

               0.00    0.00  400016/400016      void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54]

[53] 0.0 0.00 0.00 400016 void std::_Destroy<int*>(int*, int*) [53]

               0.00    0.00  400016/400016      void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [49]

               0.00    0.00  400016/400016      std::vector<int, std::allocator<int> >::~vector() [22]

[54] 0.0 0.00 0.00 400016 void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&) [54]

               0.00    0.00  400016/400016      void std::_Destroy<int*>(int*, int*) [53]

               0.00    0.00  400015/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[55] 0.0 0.00 0.00 400015 __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [55]


               0.00    0.00  400015/400015      std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [3]

[56] 0.0 0.00 0.00 400015 std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [56]


               0.00    0.00  400012/400012      merge_sort(std::vector<int, std::allocator<int> >) [2]

[57] 0.0 0.00 0.00 400012 unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&) [57]


               0.00    0.00  200006/200006      std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60]

[58] 0.0 0.00 0.00 200006 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58]


               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

[59] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::capacity() const [59]


               0.00    0.00  200006/200006      std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64]

[60] 0.0 0.00 0.00 200006 std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60]

               0.00    0.00  200006/200006      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [58]

               0.00    0.00  200006/200006      void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65]

[61] 0.0 0.00 0.00 200006 void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61]


               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

[62] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::end() [62]

               0.00    0.00  200006/600018      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42]

               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

[63] 0.0 0.00 0.00 200006 std::vector<int, std::allocator<int> >::begin() [63]

               0.00    0.00  200006/600018      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [42]

               0.00    0.00  200006/200006      __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [15]

[64] 0.0 0.00 0.00 200006 std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64]

               0.00    0.00  200006/200006      std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [60]

               0.00    0.00  200006/200006      void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66]

[65] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65]

               0.00    0.00  200006/200006      void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [61]

               0.00    0.00  200006/200006      std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) [14]

[66] 0.0 0.00 0.00 200006 void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&) [66]

               0.00    0.00  200006/200006      void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [65]

               0.00    0.00       1/1           __libc_csu_init [90]

[67] 0.0 0.00 0.00 1 _GLOBAL__sub_I_main [67]

               0.00    0.00       1/1           __static_initialization_and_destruction_0(int, int) [68]

               0.00    0.00       1/1           _GLOBAL__sub_I_main [67]

[68] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [68]


               0.00    0.00       1/1           std::allocator<int>::allocator() [70]

[69] 0.0 0.00 0.00 1 __gnu_cxx::new_allocator<int>::new_allocator() [69]


               0.00    0.00       1/1           main [1]

[70] 0.0 0.00 0.00 1 std::allocator<int>::allocator() [70]

               0.00    0.00       1/1           __gnu_cxx::new_allocator<int>::new_allocator() [69]

               0.00    0.00       1/1           void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74]

[71] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71]

               0.00    0.00       1/1           int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76]

               0.00    0.00       1/1           std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [26]

[72] 0.0 0.00 0.00 1 std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72]

               0.00    0.00       1/1000038     std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [40]
               0.00    0.00       1/1           void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75]

               0.00    0.00       1/1           int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76]

[73] 0.0 0.00 0.00 1 __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73]


               0.00    0.00       1/1           void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75]

[74] 0.0 0.00 0.00 1 void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74]

               0.00    0.00       1/1           void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71]

               0.00    0.00       1/1           std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [72]

[75] 0.0 0.00 0.00 1 void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&) [75]

               0.00    0.00       1/1           void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [74]

               0.00    0.00       1/1           void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [71]

[76] 0.0 0.00 0.00 1 int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [76]

               0.00    0.00       1/400016      std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*) [52]
               0.00    0.00       1/1           __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&) [73]

� Index by function name

 [67] _GLOBAL__sub_I_main    [44] std::allocator<int>::~allocator() [22] std::vector<int, std::allocator<int> >::~vector()
 [25] check_sort(std::vector<int, std::allocator<int> >) [35] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, false>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [14] std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&)
  [2] merge_sort(std::vector<int, std::allocator<int> >) [36] std::_Iter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) [17] std::vector<int, std::allocator<int> >::operator[](unsigned long)
 [68] __static_initialization_and_destruction_0(int, int) [60] std::_Iter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, true>::_S_base(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [73] __gnu_cxx::__enable_if<std::__is_scalar<int>::__value, int*>::__type std::__fill_n_a<int*, unsigned long, int>(int*, unsigned long, int const&)
  [4] merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int) [48] std::_Iter_base<int*, false>::_S_base(int*) [37] std::_Miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 [24] __gnu_cxx::new_allocator<int>::deallocate(int*, unsigned long) [11] int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*) [38] std::_Niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >)
 [18] __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) [61] void std::_Destroy_aux<true>::__destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >) [64] std::_Niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 [45] __gnu_cxx::new_allocator<int>::new_allocator(__gnu_cxx::new_allocator<int> const&) [49] void std::_Destroy_aux<true>::__destroy<int*>(int*, int*) [52] std::_Niter_base<int*>::iterator_type std::__niter_base<int*>(int*)
 [69] __gnu_cxx::new_allocator<int>::new_allocator() [19] std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) [7] int* std::__copy_move_a<false, int const*, int*>(int const*, int const*, int*)
 [43] __gnu_cxx::new_allocator<int>::~new_allocator() [50] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::_Vector_impl(std::allocator<int> const&) [15] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 [55] __gnu_cxx::__alloc_traits<std::allocator<int> >::_S_select_on_copy(std::allocator<int> const&) [51] std::_Vector_base<int, std::allocator<int> >::_Vector_impl::~_Vector_impl() [9] int* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 [33] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&) [20] std::_Vector_base<int, std::allocator<int> >::_M_deallocate(int*, unsigned long) [6] int* std::uninitialized_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 [42] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&) [13] std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) [74] void std::uninitialized_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)
 [46] __gnu_cxx::new_allocator<int>::max_size() const [40] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() [5] int* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, int>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*, std::allocator<int>&)
 [34] __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::base() const [12] std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) [75] void std::__uninitialized_fill_n_a<int*, unsigned long, int, int>(int*, unsigned long, int const&, std::allocator<int>&)
 [58] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::base() const [21] std::_Vector_base<int, std::allocator<int> >::~_Vector_base() [57] unsigned int const& std::min<unsigned int>(unsigned int const&, unsigned int const&)
 [56] std::_Vector_base<int, std::allocator<int> >::_M_get_Tp_allocator() const [8] int* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*) [16] __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 [41] std::vector<int, std::allocator<int> >::end() const [71] void std::__uninitialized_fill_n<true>::__uninit_fill_n<int*, unsigned long, int>(int*, unsigned long, int const&) [10] int* std::copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*>(__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, int*)
 [39] std::vector<int, std::allocator<int> >::size() const [72] std::vector<int, std::allocator<int> >::_M_fill_initialize(unsigned long, int const&) [76] int* std::fill_n<int*, unsigned long, int>(int*, unsigned long, int const&)
 [23] std::vector<int, std::allocator<int> >::begin() const [62] std::vector<int, std::allocator<int> >::end() [65] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
 [59] std::vector<int, std::allocator<int> >::capacity() const [63] std::vector<int, std::allocator<int> >::begin() [66] void std::_Destroy<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, std::allocator<int>&)
 [47] std::allocator<int>::allocator(std::allocator<int> const&) [3] std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) [53] void std::_Destroy<int*>(int*, int*)
 [70] std::allocator<int>::allocator() [26] std::vector<int, std::allocator<int> >::vector(unsigned long, int const&, std::allocator<int> const&) [54] void std::_Destroy<int*, int>(int*, int*, std::allocator<int>&)



Analysis
 21.90      0.07     0.07   600021     0.00     0.00  int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*)
 15.64      0.12     0.05   200006     0.00     0.00  merge(std::vector<int, std::allocator<int> >, unsigned int, unsigned int, unsigned int)

From the flat profile,the hotspots for this program would be the function merge function whih was called 200,006 times and thee int* std::__copy__move. The merge function would take up 52.3% of the time


Assignment 2

After carefully weighing the projects from assignment 1, we first decided to parallelize the neural network. However, on further inspection of the source code, it was found that the network had a very low accuracy even after ten thousand iteration. To improve the algorithm first and then parallelize did not sound very pragmatic. So we decided with the next most interesting topic for the group, which was merge sort. Here is how the code was changed to its parallel version.

CUDA enabled functions

The main function was changed to perform the copying of data from host to device, launch the kernel, copy back results from the device to host and release all memory, on host and device.

 cudaMalloc((void**)&d_A, (size) * sizeof(long));
 cudaMalloc((void**)&d_B, (size) * sizeof(long));
 cudaMemcpy(d_A, h_A, size * sizeof(long), cudaMemcpyHostToDevice);
 cudaMalloc((void**)&d_ntpb, sizeof(dim3));
 cudaMalloc((void**)&d_nbpg, sizeof(dim3));
 cudaMemcpy(d_ntpb, &ntpb, sizeof(dim3), cudaMemcpyHostToDevice);
 cudaMemcpy(d_nbpg, &nbpg, sizeof(dim3), cudaMemcpyHostToDevice);
 long *d_actual = d_A;
 long *d_swap = d_B;
 long totalThreads = ntpb.x * nbpg.x;
 float start_time = clock();
 for (int width = 2; width < (size << 1); width <<= 1) {
      long slices = size / (totalThreads * width) + 1;
      merge_sort << <nbpg, ntpb >> > (d_actual, d_swap, size, width, slices, d_ntpb, d_nbpg);
      cudaDeviceSynchronize();
      d_actual = d_actual == d_A ? d_B : d_A;
      d_swap = d_swap == d_A ? d_B : d_A;
 }
 float end_time = clock() - start_time;
 //copy back the device array to host array
 cudaMemcpy(h_C, d_actual, size * sizeof(long), cudaMemcpyDeviceToHost);

The serial code did not use recusion, which is usually utilized for merge sort. Instead, it did a bottoms-up merge, wherein the array was sliced into pieces and each piece was sorted. The width of each slice changed with each iteration. This was done using two functions int the serial code: merge(...) and merge_sort(...). The former function was responsible for slicing the array into different widths. The latter received this slices and sorted them by creating a local STL map. In the CUDA code for this, it seemed advisable to make merge_sort as the kernel. This can be justified because each thread now gets a slice of the array to sort, which means more work is done concurrently than before. Also, this enabled removal of the two nested for-loops from the serial code. Here is how the new merge_sort function looks like:

__global__ void merge_sort(long *sourceA, long *destA, long size, long width, long slices, dim3 *threads, dim3 *blocks) {
       int index = threadIdx.x + blockIdx.x * blockDim.x; //may need to change this
       int beg = width * index * slices, middle, end;
       for (long slice = 0; slice < slices; slice++) { //basically removed the two for loops
           if (beg < size) {
               middle = min(beg + (width >> 1), size);
               end = min(beg + width, size);
               merge(sourceA, destA, beg, middle, end);
               beg += width;
           }
       }
}

As previously, the merge_sort() function calls the merge function. In the serial code, an STL map is utilized to store the sliced array's values that are later used as values to be assigned to the original array based on the index comparison or value comparison. Now, the merge function is strictly a device function that is called from the kernel for each slice to be sorted.

__device__ void merge(long *sourceA, long *destA, long b, long c, long e) {
 long i1 = b;
 long i2 = c;
 for (long k = b; k < e; k++) {
     if (i1 < c && (i2 >= e || sourceA[i1] < sourceA[i2])) {
         destA[k] = sourceA[i1];
         i1++;
     }
     else {
         destA[k] = sourceA[i2];
         i2++;
     }
  }
}

Profiling results

SerialvParallel.jpg

Assignment 3